Commit Graph

2102 Commits

Author SHA1 Message Date
Kelvin Cheung 379e38a763 CPUFREQ: Loongson1: Replace kzalloc() with kcalloc()
This patch replaces kzalloc() with kcalloc() when allocating
frequency table, and remove unnecessary 'out of memory' message.

Signed-off-by: Kelvin Cheung <keguang.zhang@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13053/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-05-13 14:02:08 +02:00
Kelvin Cheung 6a1d55ccd8 CPUFREQ: Loongson1: Rename the file to loongson1-cpufreq.c
This patch renames the file to loongson1-cpufreq.c,
and also includes some minor updates.

Signed-off-by: Kelvin Cheung <keguang.zhang@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13052/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-05-13 14:02:08 +02:00
Rafael J. Wysocki 1aa7a6e2b8 intel_pstate: Clean up get_target_pstate_use_performance()
The comments and the core_busy variable name in
get_target_pstate_use_performance() are totally confusing,
so modify them to reflect what's going on.

The results of the computations should be the same as before.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:58:38 +02:00
Rafael J. Wysocki 8edb0a6e48 intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying the same division again, so make it do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:58:37 +02:00
Rafael J. Wysocki a1c9787dc3 intel_pstate: Clarify average performance computation
The core_pct_busy field of struct sample actually contains the
average performace during the last sampling period (in percent)
and not the utilization of the core as suggested by its name
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that storing this value as percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:58:37 +02:00
Chen Yu 4578ee7e1d intel_pstate: Avoid unnecessary synchronize_sched() during initialization
Currently, in intel_pstate_clear_update_util_hook(), after
clearing the utilization update hook, we leverage
synchronize_sched() to deal with synchronization, which
is a little bit time-costly because synchronize_sched()
has to wait for all the CPUs to go through a grace period.

Actually, the synchronize_sched() is not necessary if the utilization
update hook has not been set for the given CPU yet, so make the driver
check if that's the case and avoid the synchronize_sched() call then.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371
Tested-by: Tian Ye <yex.tian@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw : Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:56:34 +02:00
Rafael J. Wysocki c29af6f1a4 Merge branch 'pm-cpufreq-sched' into pm-cpufreq 2016-05-11 22:48:20 +02:00
Arnd Bergmann cfe9492fdf cpufreq: schedutil: Make default depend on CONFIG_SMP
CPU_FREQ_GOV_SCHEDUTIL gained a dependency on SMP, so now we
get a warning if it gets selected by CPU_FREQ_DEFAULT_GOV_SCHEDUTIL
without SMP:

warning: (CPU_FREQ_DEFAULT_GOV_SCHEDUTIL) selects CPU_FREQ_GOV_SCHEDUTIL which has unmet direct dependencies (CPU_FREQ && SMP)

This adds another dependency to avoid the problem.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: bf7cdff194 (cpufreq: schedutil: Make it depend on CONFIG_SMP)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:47:44 +02:00
Akshay Adiga 0bc10b93f2 cpufreq: powernv: del_timer_sync when global and local pstate are equal
When global and local pstate are equal in a powernv_target_index() call,
we don't queue a timer. But we may have timer already queued for future.
This could cause the timer to fire one additional time for no use.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 02:28:00 +02:00
Akshay Adiga 1fd3ff2874 cpufreq: powernv: Move smp_call_function_any() out of irq safe block
Fix a WARN_ON caused by smp_call_function_any() when irq is disabled,
because of changes made in the patch ('cpufreq: powernv: Ramp-down
 global pstate slower than local-pstate')
https://patchwork.ozlabs.org/patch/612058/

 WARNING: CPU: 0 PID: 4 at kernel/smp.c:291
smp_call_function_single+0x170/0x180

 Call Trace:
 [c0000007f648f9f0] [c0000007f648fa90] 0xc0000007f648fa90 (unreliable)
 [c0000007f648fa30] [c0000000001430e0] smp_call_function_any+0x170/0x1c0
 [c0000007f648fa90] [c0000000007b4b00]
powernv_cpufreq_target_index+0xe0/0x250
 [c0000007f648fb00] [c0000000007ac9dc]
__cpufreq_driver_target+0x20c/0x3d0
 [c0000007f648fbc0] [c0000000007b1b4c] od_dbs_timer+0xcc/0x260
 [c0000007f648fc10] [c0000000007b3024] dbs_work_handler+0x54/0xa0
 [c0000007f648fc50] [c0000000000c49a8] process_one_work+0x1d8/0x590
 [c0000007f648fce0] [c0000000000c4e08] worker_thread+0xa8/0x660
 [c0000007f648fd80] [c0000000000cca88] kthread+0x108/0x130
 [c0000007f648fe30] [c0000000000095e8] ret_from_kernel_thread+0x5c/0x74

- Calling smp_call_function_any() with interrupt disabled (through
 spin_lock_irqsave) could cause a deadlock, as smp_call_function_any()
 relies on the IPI to complete. This is detected in the
 smp_call_function_any() call and hence the WARN_ON.

- As the spinlock (gpstates->lock) is only used to synchronize access of
 global_pstate_info  between timer irq handler and target_index calls. And
 the timer irq handler just try_locks() hence it would not cause a
 deadlock. Hence could do without making spinlocks irq safe.

- As the smp_call_function_any() is a blocking call and does not access
 global_pstates_info, it could reduce the critcal section by moving
 smp_call_function_any() after giving up the lock.

Reported-by: Abdul Haleem <abdhalee@linux.vnet.linux.com>
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 02:28:00 +02:00
Rafael J. Wysocki f96fd0c86f intel_pstate: Clean up intel_pstate_get()
intel_pstate_get() contains a local variable that's initialized but
never used and it can be written in fewer lines of code, so clean
it up.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-05-09 22:55:36 +02:00
Rafael J. Wysocki d87de8f38a Merge branch 'pm-cpufreq-sched' into pm-cpufreq 2016-05-09 16:00:10 +02:00
Rafael J. Wysocki bf7cdff194 cpufreq: schedutil: Make it depend on CONFIG_SMP
Make the schedutil cpufreq governor depend on CONFIG_SMP, because
the scheduler-provided utilization numbers used by it are only
available with CONFIG_SMP set.

Fixes: 9bdcb44e39 (cpufreq: schedutil: New governor based on scheduler utilization data)
Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-06 22:33:33 +02:00
Rafael J. Wysocki da43af961b Merge cpufreq fixes going into v4.6.
* pm-cpufreq-fixes:
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform
  cpufreq: intel_pstate: Fix processing for turbo activation ratio
2016-05-06 22:01:14 +02:00
Rafael J. Wysocki 9485e4ca0b cpufreq: governor: Fix handling of special cases in dbs_update()
As reported in KBZ 69821:

"With CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequcency 800MHz
 even if usage goes to 100%, frequency does not scale up, the governor
 in use is ondemand. Neither works conservative. Performance and
 userspace governors work as expected.

 With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL cpu scales up with ondemand
 as expected."

Analysis carried out by Chen Yu leads to the conclusion that the
observed issue is due to idle_time in dbs_update() representing a
negative number in which case the function will return 0 as the load
(unless load is greater than 0 for another CPU sharing the policy),
although that need not be the right choice.

Indeed, idle_time representing a negative number means that during
the last sampling interval the CPU was almost 100% busy on the rough
average, so 100 should be returned as the load in that case.

Modify the code accordingly and rearrange it to clarify the handling
of all of the special cases in it.  While at it, also avoid returning
zero as the load if time_elapsed is 0 (it doesn't really make sense
to return 0 then).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
Tested-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Timo Valtoaho <timo.valtoaho@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-05-06 14:24:23 +02:00
Rafael J. Wysocki 5f2f88e330 Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes'
* pm-opp-fixes:
  PM / OPP: Remove useless check

* pm-cpufreq-fixes:
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform

* pm-cpuidle-fixes:
  ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
2016-05-06 13:16:22 +02:00
Ingo Molnar 1fb48f8e54 Linux 4.6-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXJoi6AAoJEHm+PkMAQRiGYKIIAIcocIV48DpGAHXFuSZbzw5D
 rp9EbE5TormtddPz1J1zcqu9tl5H8tfxS+LvHqRaDXqQkbb0BWKttmEpKTm9mrH8
 kfGNW8uwrEgTMMsar54BypAMMhHz4ITsj3VQX5QLSC5j6wixMcOmQ+IqH0Bwt3wr
 Y5JXDtZRysI1GoMkSU7/qsQBjC7aaBa5VzVUiGvhV8DdvPVFQf73P89G1vzMKqb5
 HRWbH4ieu6/mclLvW9N2QKGMHQntlB+9m2kG9nVWWbBSDxpAotwqQZFh3D52MBUy
 6DH/PNgkVyDhX4vfjua0NrmXdwTfKxLWGxe4dZ8Z+JZP5c6pqWlClIPBCkjHj50=
 =KLSM
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc6' into x86/asm, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05 08:35:00 +02:00
Srinivas Pandruvada e59a8f7ff4 cpufreq: intel_pstate: Ignore _PPC processing under HWP
When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC
is not used. So ignore processing for _PPC limits.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05 01:43:47 +02:00
Sudeep Holla d9975b0b07 cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table
Currently when performing random CPU hot-plugs and suspend-to-ram(S2R)
on systems using arm_big_little cpufreq driver, we get warnings similar
to something like below:

cpu cpu1: _opp_add: duplicate OPPs detected. Existing: freq: 600000000,
	volt: 800000, enabled: 1. New: freq: 600000000, volt: 800000, enabled: 1

This is mainly because the OPPs for the shared cpus are not set. We can
just use dev_pm_opp_of_cpumask_add_table in case the OPPs are obtained
from DT(arm_big_little_dt.c) or use dev_pm_opp_set_sharing_cpus if the
OPPs are obtained by other means like firmware(e.g. scpi-cpufreq.c)

Also now that the generic dev_pm_opp{,_of}_cpumask_remove_table can
handle removal of opp table and entries for all associated CPUs, we can
re-use dev_pm_opp{,_of}_cpumask_remove_table as free_opp_table in
cpufreq_arm_bL_ops.

This patch makes necessary changes to reuse the generic OPP functions for
{init,free}_opp_table and thereby eliminating the warnings.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05 01:40:04 +02:00
Marc Gonzalez d9c99acb63 cpufreq: tango: Use generic platdev driver
Add tango4 compatible string to the list.

Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05 01:36:57 +02:00
Sai Gurrappadi e43e94c1ed cpufreq: Fix GOV_LIMITS handling for the userspace governor
Currently, the userspace governor only updates frequency on GOV_LIMITS
if policy->cur falls outside policy->{min/max}. However, it is also
necessary to update current frequency on GOV_LIMITS to match the user
requested value if it can be achieved within the new policy->{max/min}.

This was previously the behaviour in the governor until commit d1922f0
("cpufreq: Simplify userspace governor") which incorrectly assumed that
policy->cur == user requested frequency via scaling_setspeed. This won't
be true if the user requested frequency falls outside policy->{min/max}.
Ex: a temporary thermal cap throttled the user requested frequency.

Fix this by storing the user requested frequency in a seperate variable.
The governor will then try to achieve this request on every GOV_LIMITS
change.

Fixes: d1922f0256 (cpufreq: Simplify userspace governor)
Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05 01:30:38 +02:00
Rafael J. Wysocki 6d45b719cb intel_pstate: Fix intel_pstate_get()
After commit 8fa520af50 "intel_pstate: Remove freq calculation from
intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
to compute the average frequency, which is problematic for two reasons.

First, intel_pstate_get() may be invoked before the driver reads the
CPU feedback registers for the first time and if that happens,
get_avg_frequency() will attempt to divide by zero.

Second, the get_avg_frequency() call in intel_pstate_get() is racy
with respect to intel_pstate_sample() and it may end up returning
completely meaningless values for this reason.

Moreover, after commit 7349ec0470 "intel_pstate: Move
intel_pstate_calc_busy() into get_target_pstate_use_performance()"
sample.core_pct_busy is never computed on Atom, but it is used in
intel_pstate_adjust_busy_pstate() in that case too.

To address those problems notice that if sample.core_pct_busy
was used in the average frequency computation carried out by
get_avg_frequency(), both the divide by zero problem and the
race with respect to intel_pstate_sample() would be avoided.

Accordingly, move the invocation of intel_pstate_calc_busy() from
get_target_pstate_use_performance() to intel_pstate_update_util(),
which also will take care of the uninitialized sample.core_pct_busy
on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
as per the above.

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
Fixes: 8fa520af50 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
Fixes: 7349ec0470 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-04 14:09:16 +02:00
Rafael J. Wysocki ba41e1bc28 cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
Commit 41cfd64cf4 "Update frequencies of policy->cpus only from
->set_policy()" changed the way the intel_pstate driver's ->set_policy
callback updates the HWP (hardware-managed P-states) settings.
A side effect of it is that if those settings are modified on the
boot CPU during system suspend and wakeup, they will never be
restored during subsequent system resume.

To address this problem, allow cpufreq drivers that don't provide
->target or ->target_index callbacks to use ->suspend and ->resume
callbacks and add a ->resume callback to intel_pstate to restore
the HWP settings on the CPUs that belong to the given policy.

Fixes: 41cfd64cf4 "Update frequencies of policy->cpus only from ->set_policy()"
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-05-02 13:48:15 +02:00
Aneesh Kumar K.V d2adba3fd1 powerpc/mm: Abstraction for switch_mmu_context()
How we switch MMU context differs between hash and radix. For hash we
need to switch the SLB details and for radix we need to switch the PID
SPR.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:04 +10:00
Rafael J. Wysocki 81be193b7e Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
  cpufreq: intel_pstate: Fix processing for turbo activation ratio
  Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
2016-04-29 14:22:25 +02:00
Sudeep Holla 2482bc31ca cpufreq: st: enable selective initialization based on the platform
The sti-cpufreq does unconditional registration of the cpufreq-dt driver
which causes issue on an multi-platform build. For example, on Vexpress
TC2 platform, we get the following error on boot:

cpu cpu0: OPP-v2 not supported
cpu cpu0: Not doing voltage scaling
cpu: dev_pm_opp_of_cpumask_add_table: couldn't find opp table
	for cpu:0, -19
cpu cpu0: dev_pm_opp_get_max_volt_latency: Invalid regulator (-6)
...
arm_big_little: bL_cpufreq_register: Failed registering platform driver:
		vexpress-spc, err: -17

The actual driver fails to initialise as cpufreq-dt is probed
successfully, which is incorrect. This issue can happen to any platform
not using cpufreq-dt in a multi-platform build.

This patch adds a check to do selective initialization of the driver.

Fixes: ab0ea257fc (cpufreq: st: Provide runtime initialised driver for ST's platforms)
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:25:56 +02:00
Viresh Kumar 9f123def55 cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/
Move cpufreq bits for mvebu into drivers/cpufreq/ directory, that's
where they really belong to.

Compiled tested only.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:22:43 +02:00
Viresh Kumar eb96924acd cpufreq: dt: Kill platform-data
There are no more users of platform-data for cpufreq-dt driver, get rid
of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:22:43 +02:00
Viresh Kumar 1530b9963e cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2
Existing platforms, which do not support operating-points-v2, can
explicitly tell the opp core that some of the CPUs share opp tables,
with help of dev_pm_opp_set_sharing_cpus().

For such platforms, explicitly ask the opp core to provide list of CPUs
sharing the opp table with current cpu device, before falling back to
platform data.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:22:42 +02:00
Rafael J. Wysocki b4f4b4b371 cpufreq: governor: Change confusing struct field and variable names
The name of the prev_cpu_wall field in struct cpu_dbs_info is
confusing, because it doesn't represent wall time, but the previous
update time as returned by get_cpu_idle_time() (that may be the
current value of jiffies_64 in some cases, for example).

Moreover, the names of some related variables in dbs_update() take
that confusion further.

Rename all of those things to make their names reflect the purpose
more accurately.  While at it, drop unnecessary parens from one of
the updated expressions.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Chen Yu <yu.c.chen@intel.com>
2016-04-28 15:10:08 +02:00
Srinivas Pandruvada 2b3ec76505 cpufreq: intel_pstate: Enable PPC enforcement for servers
For platforms which are controlled via remove node manager, enable _PPC by
default. These platforms are mostly categorized as enterprise server or
performance servers. These platforms needs to go through some
certifications tests, which tests control via _PPC.
The relative risk of enabling by default is  low as this is is less likely
that these systems have broken _PSS table.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 01:01:39 +02:00
Srinivas Pandruvada 3be9200d51 cpufreq: intel_pstate: Adjust policy->max
When policy->max is changed via _PPC or sysfs and is more than the max non
turbo frequency, it does not really change resulting performance in some
processors. When policy->max results in a P-State ratio more than the
turbo activation ratio, then processor can choose any P-State up to max
turbo. So the user or _PPC setting has no value, but this can cause
undesirable side effects like:
- Showing reduced max percentage in Intel P-State sysfs
- It can cause reduced max performance under certain boundary conditions:
The requested max scaling frequency either via _PPC or via cpufreq-sysfs,
will be converted into a fixed floating point max percent scale. In
majority of the cases this will result in correct max. But not 100% of the
time. If the _PPC is requested at a point where the calculation lead to a
lower max, this can result in a lower P-State then expected and it will
impact performance.
Example of this condition using a Broadwell laptop with config TDP.

ACPI _PSS table from a Broadwell laptop
2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
1300000 1100000 1000000 900000 800000 600000 500000

The actual results by disabling config TDP so that we can get what is
requested on or below 2300000Khz.

scaling_max_freq        Max Requested P-State   Resultant scaling
max
---------------------------------------- ----------------------
2400000                 18                      2900000 (max
turbo)
2300000                 17                      2300000 (max
physical non turbo)
2200000                 15                      2100000
2100000                 15                      2100000
2000000                 13                      1900000
1900000                 13                      1900000
1800000                 12                      1800000
1700000                 11                      1700000
1600000                 10                      1600000
1500000                 f                       1500000
1400000                 e                       1400000
1300000                 d                       1300000
1200000                 c                       1200000
1100000                 a                       1000000
1000000                 a                       1000000
900000                  9                        900000
800000                  8                        800000
700000                  7                        700000
600000                  6                        600000
500000                  5                        500000
------------------------------------------------------------------

Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz)
in BIOS (not every system will let you adjust this).
The turbo activation ratio will be set to one less than that, which will
be 0x0a (So any request above 1000000KHz should result in turbo region
assuming no thermal limits).
Here _PPC will request max to 1100000KHz (which basically should still
result in turbo as this is more than the turbo activation ratio up to
max allowable turbo frequency), but actual calculation resulted in a max
ceiling P-State which is 0x0a. So under any load condition, this driver
will not request turbo P-States. This will be a huge performance hit.

When config TDP feature is ON, if the _PPC points to a frequency above
turbo activation ratio, the performance can still reach max turbo. In this
case we don't need to treat this as the reduced frequency in set_policy
callback.

In this change when config TDP is active (by checking if the physical max
non turbo ratio is more than the current max non turbo ratio), any request
above current max non turbo is treated as full performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 01:01:39 +02:00
Srinivas Pandruvada 9522a2ff9c cpufreq: intel_pstate: Enforce _PPC limits
Use ACPI _PPC notification to limit max P state driver will request.
ACPI _PPC change notification is sent by BIOS to limit max P state
in several cases:
- Reduce impact of platform thermal condition
- When Config TDP feature is used, a changed _PPC is sent to
follow TDP change
- Remote node managers in server want to control platform power
via baseboard management controller (BMC)

This change registers with ACPI processor performance lib so that
_PPC changes are notified to cpufreq core, which in turns will
result in call to .setpolicy() callback. Also the way _PSS
table identifies a turbo frequency is not compatible to max turbo
frequency in intel_pstate, so the very first entry in _PSS needs
to be adjusted.

This feature can be turned on by using kernel parameters:
intel_pstate=support_acpi_ppc

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 01:01:39 +02:00
Akshay Adiga eaa2c3aeef cpufreq: powernv: Ramp-down global pstate slower than local-pstate
The frequency transition latency from pmin to pmax is observed to be in
few millisecond granurality. And it usually happens to take a performance
penalty during sudden frequency rampup requests.

This patch set solves this problem by using an entity called "global
pstates". The global pstate is a Chip-level entity, so the global entitiy
(Voltage) is managed across the cores. The local pstate is a Core-level
entity, so the local entity (frequency) is managed across threads.

This patch brings down global pstate at a slower rate than the local
pstate. Hence by holding global pstates higher than local pstate makes
the subsequent rampups faster.

A per policy structure is maintained to keep track of the global and
local pstate changes. The global pstate is brought down using a parabolic
equation. The ramp down time to pmin is set to ~5 seconds. To make sure
that the global pstates are dropped at regular interval , a timer is
queued for every 2 seconds during ramp-down phase, which eventually brings
the pstate down to local pstate.

Iozone results show fairly consistent performance boost.
YCSB on redis shows improved Max latencies in most cases.

Iozone write/rewite test were made with filesizes 200704Kb and 401408Kb
with different record sizes . The following table shows IOoperations/sec
with and without patch.

Iozone Results ( in op/sec) ( mean over 3 iterations )
---------------------------------------------------------------------
file size-                      with            without		  %
recordsize-IOtype               patch           patch		change
----------------------------------------------------------------------
200704-1-SeqWrite               1616532         1615425         0.06
200704-1-Rewrite                2423195         2303130         5.21
200704-2-SeqWrite               1628577         1602620         1.61
200704-2-Rewrite                2428264         2312154         5.02
200704-4-SeqWrite               1617605         1617182         0.02
200704-4-Rewrite                2430524         2351238         3.37
200704-8-SeqWrite               1629478         1600436         1.81
200704-8-Rewrite                2415308         2298136         5.09
200704-16-SeqWrite              1619632         1618250         0.08
200704-16-Rewrite               2396650         2352591         1.87
200704-32-SeqWrite              1632544         1598083         2.15
200704-32-Rewrite               2425119         2329743         4.09
200704-64-SeqWrite              1617812         1617235         0.03
200704-64-Rewrite               2402021         2321080         3.48
200704-128-SeqWrite             1631998         1600256         1.98
200704-128-Rewrite              2422389         2304954         5.09
200704-256 SeqWrite             1617065         1616962         0.00
200704-256-Rewrite              2432539         2301980         5.67
200704-512-SeqWrite             1632599         1598656         2.12
200704-512-Rewrite              2429270         2323676         4.54
200704-1024-SeqWrite            1618758         1616156         0.16
200704-1024-Rewrite             2431631         2315889         4.99
401408-1-SeqWrite               1631479         1608132         1.45
401408-1-Rewrite                2501550         2459409         1.71
401408-2-SeqWrite               1617095         1626069         -0.55
401408-2-Rewrite                2507557         2443621         2.61
401408-4-SeqWrite               1629601         1611869         1.10
401408-4-Rewrite                2505909         2462098         1.77
401408-8-SeqWrite               1617110         1626968         -0.60
401408-8-Rewrite                2512244         2456827         2.25
401408-16-SeqWrite              1632609         1609603         1.42
401408-16-Rewrite               2500792         2451405         2.01
401408-32-SeqWrite              1619294         1628167         -0.54
401408-32-Rewrite               2510115         2451292         2.39
401408-64-SeqWrite              1632709         1603746         1.80
401408-64-Rewrite               2506692         2433186         3.02
401408-128-SeqWrite             1619284         1627461         -0.50
401408-128-Rewrite              2518698         2453361         2.66
401408-256-SeqWrite             1634022         1610681         1.44
401408-256-Rewrite              2509987         2446328         2.60
401408-512-SeqWrite             1617524         1628016         -0.64
401408-512-Rewrite              2504409         2442899         2.51
401408-1024-SeqWrite            1629812         1611566         1.13
401408-1024-Rewrite             2507620          2442968        2.64

Tested with YCSB workload (50% update + 50% read) over redis for 1 million
records and 1 million operation. Each test was carried out with target
operations per second and persistence disabled.

Max-latency (in us)( mean over 5 iterations )
---------------------------------------------------------------
op/s    Operation       with patch      without patch   %change
---------------------------------------------------------------
15000   Read            61480.6         50261.4         22.32
15000   cleanup         215.2           293.6           -26.70
15000   update          25666.2         25163.8         2.00

25000   Read            32626.2         89525.4         -63.56
25000   cleanup         292.2           263.0           11.10
25000   update          32293.4         90255.0         -64.22

35000   Read            34783.0         33119.0         5.02
35000   cleanup         321.2           395.8           -18.8
35000   update          36047.0         38747.8         -6.97

40000   Read            38562.2         42357.4         -8.96
40000   cleanup         371.8           384.6           -3.33
40000   update          27861.4         41547.8         -32.94

45000   Read            42271.0         88120.6         -52.03
45000   cleanup         263.6           383.0           -31.17
45000   update          29755.8         81359.0         -63.43

(test without target op/s)
47659   Read            83061.4         136440.6        -39.12
47659   cleanup         195.8           193.8           1.03
47659   update          73429.4         124971.8        -41.24

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 23:56:58 +02:00
Shilpasri G Bhat 2920e9ce8f cpufreq: powernv: Remove flag use-case of policy->driver_data
commit 1b0289848d ("cpufreq: powernv: Add sysfs attributes to show
throttle stats") used policy->driver_data as a flag for one-time creation
of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to
check if the attribute already exists. This is required as
policy->driver_data is used for other purposes in the later patch.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 23:56:58 +02:00
Javier Martinez Canillas 6de0dc4b53 cpufreq: e_powersaver: Use IS_ENABLED() instead of checking for built-in or module
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 22:42:34 +02:00
Srinivas Pandruvada 1becf03545 cpufreq: intel_pstate: Fix processing for turbo activation ratio
When the config TDP level is not nominal (level = 0), the MSR values for
reading level 1 and level 2 ratios contain power in low 14 bits and actual
ratio bits are at bits [23:16]. The current processing for level 1 and
level 2 is wrong as there is no shift done to get actual ratio.

Fixes: 6a35fc2d6c (cpufreq: intel_pstate: get P1 from TAR when available)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 23:39:09 +02:00
Rafael J. Wysocki ba1ca654f3 cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
The way cpufreq_governor_start() initializes j_cdbs->prev_load is
questionable.

First off, j_cdbs->prev_cpu_wall used as a denominator in the
computation may be zero.  The case this happens is when
get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy()
used to return that number is called exactly at the jiffies_64
wrap time.  It is rather hard to trigger that error, but it is not
impossible and it will just crash the kernel then.

Second, j_cdbs->prev_load is computed as the average load during
the entire time since the system started and it may not reflect the
load in the previous sampling period (as it is expected to).
That doesn't play well with the way dbs_update() uses that value.
Namely, if the update time delta (wall_time) happens do be greater
than twice the sampling rate on the first invocation of it, the
initial value of j_cdbs->prev_load (which may be completely off) will
be returned to the caller as the current load (unless it is equal to
zero and unless another CPU sharing the same policy object has a
greater load value).

For this reason, notice that the prev_load field of struct cpu_dbs_info
is only used by dbs_update() and only in that one place, so if
cpufreq_governor_start() is modified to always initialize it to 0,
it will make dbs_update() always compute the actual load first time
it checks the update time delta against the doubled sampling rate
(after initialization) and there won't be any side effects of it.

Consequently, modify cpufreq_governor_start() as described.

Fixes: 18b46abd00 (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-25 16:21:34 +02:00
Viresh Kumar 3920be471c cpufreq: hisilicon: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:24 +02:00
Viresh Kumar 5e4249c6d9 cpufreq: zynq: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:24 +02:00
Viresh Kumar 117d4f59af cpufreq: sunxi: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:24 +02:00
Viresh Kumar a399dc9fc5 cpufreq: shmobile: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Finley Xiao 014400c127 cpufreq: rockchip: Use generic platdev driver
This patch add rockchip's compatible string to the compat list and
remove similar code from platform code for supporting generic platdev
driver.

Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar 7694ca6e1d cpufreq: omap: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar 7ead83f6df cpufreq: imx: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Note that the complete routine imx27_dt_init() is removed as

of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);

has same effect as a NULL .init_machine machine callback pointer.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar a59511d1da cpufreq: berlin: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar e92bb16674 cpufreq: dt: Mark platdev machines array as __initconst
The machines array in cpufreq-dt-platdev is used only once at boot time
and so should be marked with __initconst, so that kernel can free up
memory used for it, if required.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:22 +02:00
Jia Hongtao 394cb8316b cpufreq: qoriq: Fix cooling device registration issue during suspend
Cooling device is registered by ready callback. It's also invoked while
system resuming from sleep (Enabling non-boot cpus). Thus cooling device
may be multiple registered. Matchable unregistration is added to exit
callback to fix this issue.

Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:07:02 +02:00
Jia Hongtao 495c716f17 cpufreq: qoriq: Remove __exit macro from .exit callback
.exit callback (qoriq_cpufreq_cpu_exit()) is also used during suspend.
So __exit macro should be removed or the function will be discarded.

Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:07:02 +02:00
Jia Hongtao 27b8fe8daf cpufreq: qoriq: Don't show cooling device messages if THERMAL_OF undefined
When THERMAL_OF is undefined the cooling device messages should not be
shown. -ENOSYS is returned from of_cpufreq_cooling_register() when
THERMAL_OF is undefined.

Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:00:42 +02:00
Ashwin Chaugule a29a1e7678 cpufreq: ACPI / CPPC: Add module support for cppc_cpufreq driver
Add a function to cleanup at module exit and export
appropriate GPL string to enable moduler support
for the cppc_cpufreq driver.

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 15:59:35 +02:00
Philippe Longepe bdcaa23fb8 cpufreq: intel_pstate: Use average P-State instead of current P-State
The result returned by pid_calc() is subtracted from current_pstate
(which is the P-State requested during the last period) in order to
obtain the target P-State for the current iteration.

However, current_pstate may not reflect the real current P-State of
the CPU. In particular, that P-State may be higher because of the
frequency sharing per module.

The theory is:
 - The load is the percentage of time spent in C0 and is related to
   the average P-State during the same period.
 - The last requested P-State can be completely different than the
   average P-State (because of frequency sharing or throttling).
 - The P-State shift computed by the pid_calc is based on the load
   computed at average P-State, so the shift must be relative to
   this average P-State.

Using the average P-State instead of current P-State improves power
without significant performance penalty in cases when a task migrates
from one core to other core sharing frequency and voltage.

Performance and power comparison with this patch on Cherry Trail
platform using Android:

Benchmark               ?Perf    ?Power
FishTank                10.45%    3.1%
SmartBench-Gaming       -0.1%   -10.4%
SmartBench-Productivity -0.8%   -10.4%
CandyCrush                n/a   -17.4%
AngryBirds                n/a    -5.9%
videoPlayback             n/a   -13.9%
audioPlayback             n/a    -4.9%
IcyRocks-20-50           0.0%   -38.4%
iozone RR               -0.16%  -1.3%
iozone RW                0.74%  -1.3%

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 15:45:11 +02:00
Rafael J. Wysocki 1cbc99dfe5 Merge back cpufreq changes for v4.7. 2016-04-25 15:44:01 +02:00
Rafael J. Wysocki 94862a62df Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Revert commit 0df35026c6 (cpufreq: governor: Fix negative idle_time
when configured with CONFIG_HZ_PERIODIC) that introduced a regression
by causing the ondemand cpufreq governor to misbehave for
CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at
one point and stays there indefinitely).

The revert takes subsequent modifications of the code in question into
account.

Fixes: 0df35026c6 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
Reported-and-tested-by: Timo Valtoaho <timo.valtoaho@gmail.com>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-25 02:39:20 +02:00
Rafael J. Wysocki 395da1259a Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
  cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
  intel_pstate: Avoid getting stuck in high P-states when idle
2016-04-21 20:57:46 +02:00
Ingo Molnar 6666ea558b Linux 4.6-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXFELfAAoJEHm+PkMAQRiGRYIH+wWsUva7TR9arN1ZrURvI17b
 KqyQH8Ov9zJBsIaq/rFXOr5KfNgx7BU9BL9h7QkBy693HXTWf+GTZ1czHM4N12C3
 0ZdHGrLwTHo2zdisiQaFORZSfhSVTUNGXGHXw13bUMgEqatPgkozXEnsvXXNdt1Z
 HtlcuJn3pcj+QIY7qDXZgTLTwgn248hi1AgNag+ntFcWiz21IYaMIi7/mCY9QUIi
 AY+Y3hqFQM7/8cVyThGS5wZPTg1YzdhsLJpoCk0TbS8FvMEnA+ylcTgc15C78bwu
 AxOwM3OCmH4gMsd7Dd/O+i9lE3K6PFrgzdDisYL3P7eHap+EdiLDvVzPDPPx0xg=
 =Q7r3
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc4' into x86/asm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:38:52 +02:00
Rafael J. Wysocki c9d9c929e6 cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
Since governor operations are generally skipped if cpufreq_suspended
is set, cpufreq_start_governor() should do nothing in that case.

That function is called in the cpufreq_online() path, and may also
be called from cpufreq_offline() in some cases, which are invoked
by the nonboot CPUs disabing/enabling code during system suspend
to RAM and resume.  That happens when all devices have been
suspended, so if the cpufreq driver relies on things like I2C to
get the current frequency, it may not be ready to do that then.

To prevent problems from happening for this reason, make
cpufreq_update_current_freq(), which is the only function invoked
by cpufreq_start_governor() that doesn't check cpufreq_suspended
already, return 0 upfront if cpufreq_suspended is set.

Fixes: 3bbf8fe3ae (cpufreq: Always update current frequency before startig governor)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-18 23:47:42 +02:00
Borislav Petkov 93984fbd4e x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org
Cc: linux-pm@vger.kernel.org
Cc: oprofile-list@lists.sf.net
Link: http://lkml.kernel.org/r/1459801503-15600-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Paul Gortmaker 6a0bcab9c6 drivers/cpufreq: make ppc_cbe_cpufreq_pmi driver explicitly non-modular
The Kconfig for this driver is currently:

config CPU_FREQ_CBE_PMI
    bool "CBE frequency scaling using PMI interface"

...meaning that it currently is not being built as a module by
anyone.  Lets remove the modular and unused code here, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Christian Krafft <krafft@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-pm@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11 20:30:43 +10:00
Rafael J. Wysocki ffb810563c intel_pstate: Avoid getting stuck in high P-states when idle
Jörg Otte reports that commit a4675fbc4a (cpufreq: intel_pstate:
Replace timers with utilization update callbacks) caused the CPUs in
his Haswell-based system to stay in the very high frequency region
even if the system is completely idle.

That turns out to be an existing problem in the intel_pstate driver's
P-state selection algorithm for Core processors.  Namely, all
decisions made by that algorithm are based on the average frequency
of the CPU between sampling events and on the P-state requested on
the last invocation, so it may get stuck at a very hight frequency
even if the utilization of the CPU is very low (in fact, it may get
stuck in a inadequate P-state regardless of the CPU utilization).
The only way to kick it out of that limbo is a sufficiently long idle
period (3 times longer than the prescribed sampling interval), but if
that doesn't happen often enough (eg. due to a timing change like
after the above commit), the P-state of the CPU may be inadequate
pretty much all the time.

To address the most egregious manifestations of that issue, reset the
core_busy value used to determine the next P-state to request if the
utilization of the CPU, determined with the help of the MPERF
feedback register and the TSC, is below 1%.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-10 05:59:10 +02:00
Viresh Kumar 8cee1eed8e cpufreq: ACPI: Remove freq_table from acpi_cpufreq_data
The freq-table is stored in struct cpufreq_policy also and there is
absolutely no need of keeping a copy of its reference in struct
acpi_cpufreq_data. Drop it.

Also policy->freq_table can't be NULL in the target() callback, remove
the useless check as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:54:31 +02:00
Viresh Kumar 9b55f55af8 cpufreq: ACPI: policy->driver_data can't be NULL in ->exit()
Its always set by ->init() and so it will always be there in ->exit().
There is no need to have a special check for just that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:41:50 +02:00
Rafael J. Wysocki a794d6138c cpufreq: Rearrange cpufreq_add_dev()
Reorganize the code in cpufreq_add_dev() to avoid using the ret
variable and reduce the indentation level in it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-09 01:37:07 +02:00
Rafael J. Wysocki cd73e9b01f cpufreq: Simplify switch () in cpufreq_cpu_callback()
Merge two switch entries that do the same thing in
cpufreq_cpu_callback().

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-09 01:37:07 +02:00
Joe Perches 1c5864e26c cpufreq: Use consistent prefixing via pr_fmt
Use the more common kernel style adding a define for pr_fmt.

Miscellanea:

o Remove now unused PFX defines

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:35:18 +02:00
Joe Perches b49c22a6ca cpufreq: Convert printk(KERN_<LEVEL> to pr_<level>
Use the more common logging style.

Miscellanea:

o Coalesce formats
o Realign arguments
o Add a missing space between a coalesced format

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:35:18 +02:00
Joe Perches 4836df173a intel_pstate: Use pr_fmt
Prefix the output using the more common kernel style.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:33:42 +02:00
Geliang Tang d2499d05f0 cpufreq: mt8173: use list_for_each_entry*()
Use list_for_each_entry*() instead of list_for_each*() to simplify
the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:27:54 +02:00
Rafael J. Wysocki 22590efb98 intel_pstate: Avoid pointless FRAC_BITS shifts under div_fp()
There are multiple places in intel_pstate where int_tofp() is applied
to both arguments of div_fp(), but this is pointless, because int_tofp()
simply shifts its argument to the left by FRAC_BITS which mathematically
is equivalent to multuplication by 2^FRAC_BITS, so if this is done
to both arguments of a division, the extra factors will cancel each
other during that operation anyway.

Drop the pointless int_tofp() applied to div_fp() arguments throughout
the driver.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:25:58 +02:00
Viresh Kumar 2249c00a0b cpufreq: exynos: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:18:42 +02:00
Viresh Kumar f56aad1d98 cpufreq: dt: Add generic platform-device creation support
Multiple platforms are using the generic cpufreq-dt driver now, and all
of them are required to create a platform device with name "cpufreq-dt",
in order to get the cpufreq-dt probed.

Many of them do it from platform code, others have special drivers just
to do that.

It would be more sensible to do this at a generic place, where all such
platform can mark their entries.

This patch adds a separate file to get this device created. Currently
the compat list of platforms that we support is empty, and will be
filled in as and when we move platforms to use it.

It always compiles as part of the kernel and so doesn't need a
module-exit operation.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:18:42 +02:00
Viresh Kumar f3f24dea2c cpufreq: tegra124: No need of setting platform-data
All CPUs on Tegra platform share clock/voltage lines and there is
absolutely no need of setting platform data for 'cpufreq-dt' platform
device, as that's the default case.

Stop setting platform data for cpufreq-dt device.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Thierry Reding <treding@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:12:09 +02:00
Paul Gortmaker dbbe972c11 cpufreq: ppc_cbe_cpufreq_pmi: make the driver explicitly non-modular
The Kconfig for this driver is currently:

config CPU_FREQ_CBE_PMI
    bool "CBE frequency scaling using PMI interface"

...meaning that it currently is not being built as a module by
anyone.  Lets remove the modular and unused code here, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:11:04 +02:00
Rafael J. Wysocki 4b42fafc1c Merge branch 'pm-cpufreq-sched' into pm-cpufreq 2016-04-09 01:08:02 +02:00
Rafael J. Wysocki 6c9d9c8192 cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit()
Due to differences in the cpufreq core's handling of runtime CPU
offline and nonboot CPUs disabling during system suspend-to-RAM,
fast frequency switching gets disabled after a suspend-to-RAM and
resume cycle on all of the nonboot CPUs.

To prevent that from happening, move the invocation of
cpufreq_disable_fast_switch() from cpufreq_exit_governor() to
sugov_exit(), as the schedutil governor is the only user of fast
frequency switching today anyway.

That simply prevents cpufreq_disable_fast_switch() from being called
without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
event (which happens during system suspend now).

Fixes: b7898fda5b (cpufreq: Support for fast frequency switching)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-08 22:41:36 +02:00
Rafael J. Wysocki fa81e66ec8 Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'acpi-cppc'
* pm-cpufreq:
  cpufreq: dt: Drop stale comment
  cpufreq: intel_pstate: Documenation for structures
  cpufreq: intel_pstate: fix inconsistency in setting policy limits
  intel_pstate: Avoid extra invocation of intel_pstate_sample()
  intel_pstate: Do not set utilization update hook too early

* pm-cpuidle:
  intel_idle: Add KBL support
  intel_idle: Add SKX support
  intel_idle: Clean up all registered devices on exit.
  intel_idle: Propagate hot plug errors.
  intel_idle: Don't overreact to a cpuidle registration failure.
  intel_idle: Setup the timer broadcast only on successful driver load.
  intel_idle: Avoid a double free of the per-CPU data.
  intel_idle: Fix dangling registration on error path.
  intel_idle: Fix deallocation order on the driver exit path.
  intel_idle: Remove redundant initialization calls.
  intel_idle: Fix a helper function's return value.
  intel_idle: remove useless return from void function.

* acpi-cppc:
  mailbox: pcc: Don't access an unmapped memory address space
2016-04-08 21:46:05 +02:00
Viresh Kumar b318556479 cpufreq: dt: Drop stale comment
The comment in file header doesn't hold true anymore, drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-05 03:40:44 +02:00
Srinivas Pandruvada 13ad7701f9 cpufreq: intel_pstate: Documenation for structures
No code change. Only added kernel doc style comments for structures.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-05 03:39:05 +02:00
Srinivas Pandruvada 30a3915385 cpufreq: intel_pstate: fix inconsistency in setting policy limits
When user sets performance policy using cpufreq interface, it is possible
that because of policy->max limits, the actual performance is still
limited. But the current implementation will silently switch the
policy to powersave and start using powersave limits. If user modifies
any limits using intel_pstate sysfs, this is actually changing powersave
limits.

The current implementation tracks limits under powersave and performance
policy using two different variables. When policy->max is less than
policy->cpuinfo.max_freq, only powersave limit variable is used.

This fix causes the performance limits variable to be used always when
the policy is performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-05 03:37:13 +02:00
Rafael J. Wysocki 9bdcb44e39 cpufreq: schedutil: New governor based on scheduler utilization data
Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.

Doing that is possible after commit 34e2c555f3 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the the utilization data passed to cpufreq_update_util() by CFS.

The new governor is relatively simple.

The frequency selection formula used by it depends on whether or not
the utilization is frequency-invariant.  In the frequency-invariant
case the new CPU frequency is given by

	next_freq = 1.25 * max_freq * util / max

where util and max are the last two arguments of cpufreq_update_util().
In turn, if util is not frequency-invariant, the maximum frequency in
the above formula is replaced with the current frequency of the CPU:

	next_freq = 1.25 * curr_freq * util / max

The coefficient 1.25 corresponds to the frequency tipping point at
(util / max) = 0.8.

All of the computations are carried out in the utilization update
handlers provided by the new governor.  One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).

The governor supports fast frequency switching if that is supported
by the cpufreq driver in use and possible for the given policy.
In the fast switching case, all operations of the governor take
place in its utilization update handlers.  If fast switching cannot
be used, the frequency switch operations are carried out with the
help of a work item which only calls __cpufreq_driver_target()
(under a mutex) to trigger a frequency update (to a value already
computed beforehand in one of the utilization update handlers).

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes.  That
heavy-handed approach should be replaced with something more
subtle and specifically targeted at RT and DL tasks.

The governor shares some tunables management code with the
"ondemand" and "conservative" governors and uses some common
definitions from cpufreq_governor.h, but apart from that it
is stand-alone.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-04-02 01:09:12 +02:00
Rafael J. Wysocki b7898fda5b cpufreq: Support for fast frequency switching
Modify the ACPI cpufreq driver to provide a method for switching
CPU frequencies from interrupt context and update the cpufreq core
to support that method if available.

Introduce a new cpufreq driver callback, ->fast_switch, to be
invoked for frequency switching from interrupt context by (future)
governors supporting that feature via (new) helper function
cpufreq_driver_fast_switch().

Add two new policy flags, fast_switch_possible, to be set by the
cpufreq driver if fast frequency switching can be used for the
given policy and fast_switch_enabled, to be set by the governor
if it is going to use fast frequency switching for the given
policy.  Also add a helper for setting the latter.

Since fast frequency switching is inherently incompatible with
cpufreq transition notifiers, make it possible to set the
fast_switch_enabled only if there are no transition notifiers
already registered and make the registration of new transition
notifiers fail if fast_switch_enabled is set for at least one
policy.

Implement the ->fast_switch callback in the ACPI cpufreq driver
and make it set fast_switch_possible during policy initialization
as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-02 01:09:03 +02:00
Rafael J. Wysocki 379480d825 cpufreq: Move governor symbols to cpufreq.h
Move definitions of symbols related to transition latency and
sampling rate to include/linux/cpufreq.h so they can be used by
(future) goverernors located outside of drivers/cpufreq/.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-02 01:09:02 +02:00
Rafael J. Wysocki 66893b6ac9 cpufreq: Move governor attribute set headers to cpufreq.h
Move definitions and function headers related to struct gov_attr_set
to include/linux/cpufreq.h so they can be used by (future) goverernors
located outside of drivers/cpufreq/.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-02 01:09:02 +02:00
Rafael J. Wysocki 2d0c58ad60 cpufreq: governor: Move abstract gov_attr_set code to seperate file
Move abstract code related to struct gov_attr_set to a separate (new)
file so it can be shared with (future) goverernors that won't share
more code with "ondemand" and "conservative".

No intentional functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-02 01:09:01 +02:00
Rafael J. Wysocki 0dd3c1d678 cpufreq: governor: New data type for management part of dbs_data
In addition to fields representing governor tunables, struct dbs_data
contains some fields needed for the management of objects of that
type.  As it turns out, that part of struct dbs_data may be shared
with (future) governors that won't use the common code used by
"ondemand" and "conservative", so move it to a separate struct type
and modify the code using struct dbs_data to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-02 01:09:00 +02:00
Rafael J. Wysocki 0bed612be6 cpufreq: sched: Helpers to add and remove update_util hooks
Replace the single helper for adding and removing cpufreq utilization
update hooks, cpufreq_set_update_util_data(), with a pair of helpers,
cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(),
and modify the users of cpufreq_set_update_util_data() accordingly.

With the new helpers, the code using them doesn't need to worry
about the internals of struct update_util_data and in particular
it doesn't need to worry about populating the func field in it
properly upfront.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-04-02 01:08:43 +02:00
Rafael J. Wysocki 9fa64d6424 Merge back intel_pstate fixes for v4.6.
* pm-cpufreq:
  intel_pstate: Avoid extra invocation of intel_pstate_sample()
  intel_pstate: Do not set utilization update hook too early
2016-04-02 01:07:17 +02:00
Rafael J. Wysocki febce40feb intel_pstate: Avoid extra invocation of intel_pstate_sample()
The initialization of intel_pstate for a given CPU involves populating
the fields of its struct cpudata that represent the previous sample,
but currently that is done in a problematic way.

Namely, intel_pstate_init_cpu() makes an extra call to
intel_pstate_sample() so it reads the current register values that
will be used to populate the "previous sample" record during the
next invocation of intel_pstate_sample().  However, after commit
a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization
update callbacks) that doesn't work for last_sample_time, because
the time value is passed to intel_pstate_sample() as an argument now.
Passing 0 to it from intel_pstate_init_cpu() is problematic, because
that causes cpu->last_sample_time == 0 to be visible in
get_target_pstate_use_performance() (and hence the extra
cpu->last_sample_time > 0 check in there) and effectively allows
the first invocation of intel_pstate_sample() from
intel_pstate_update_util() to happen immediately after the
initialization which may lead to a significant "turn on"
effect in the governor algorithm.

To mitigate that issue, rework the initialization to avoid the
extra intel_pstate_sample() call from intel_pstate_init_cpu().
Instead, make intel_pstate_sample() return false if it has been
called with cpu->sample.time equal to zero, which will make
intel_pstate_update_util() skip the sample in that case, and
reset cpu->sample.time from intel_pstate_set_update_util_hook()
to make the algorithm start properly every time the hook is set.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-02 01:06:21 +02:00
Rafael J. Wysocki bb6ab52f2b intel_pstate: Do not set utilization update hook too early
The utilization update hook in the intel_pstate driver is set too
early, as it only should be set after the policy has been fully
initialized by the core.  That may cause intel_pstate_update_util()
to use incorrect data and put the CPUs into incorrect P-states as
a result.

To prevent that from happening, make intel_pstate_set_policy() set
the utilization update hook instead of intel_pstate_init_cpu() so
intel_pstate_update_util() only runs when all things have been
initialized as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-31 17:42:15 +02:00
Linus Torvalds 3d66c6ba3f Power management and ACPI material for v4.6-rc1, part 2
- Fix for an intel_pstate driver issue related to the handling of
    MSR updates uncovered by the recent cpufreq rework (Rafael Wysocki).
 
  - cpufreq core cleanups related to starting governors and frequency
    synchronization during resume from system suspend and a locking
    fix for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).
 
  - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
    Michael Neuling, Richard Cochran, Shilpasri Bhat).
 
  - intel_idle driver update preventing some Skylake-H systems
    from hanging during initialization by disabling deep C-states
    mishandled by the platform in the problematic configurations (Len
    Brown).
 
  - Intel Xeon Phi Processor x200 support for intel_idle (Dasaratharaman
    Chandramouli).
 
  - cpuidle menu governor updates to make it always honor PM QoS
    latency constraints (and prevent C1 from being used as the
    fallback C-state on x86 when they are set below its exit latency)
    and to restore the previous behavior to fall back to C1 if the next
    timer event is set far enough in the future that was changed in 4.4
    which led to an energy consumption regression (Rik van Riel, Rafael
    Wysocki).
 
  - New device ID for a future AMD UART controller in the ACPI driver
    for AMD SoCs (Wang Hongcheng).
 
  - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
    scaling (AVS) driver (David Wu).
 
  - ACPI PCI resources management fix for the handling of IO space
    resources on architectures where the IO space is memory mapped
    (IA64 and ARM64) broken by the introduction of common ACPI
    resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).
 
  - Fix for the ACPI backend of the generic device properties API
    to make it parse non-device (data node only) children of an
    ACPI device correctly (Irina Tirdea).
 
  - Fixes for the handling of global suspend flags (introduced in 4.4)
    during hibernation and resume from it (Lukas Wunner).
 
  - Support for obtaining configuration information from Device Trees
    in the PM clocks framework (Jon Hunter).
 
  - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
    King, Geert Uytterhoeven).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJW9JaRAAoJEILEb/54YlRx/GAQAJujANWilWHZYm24a9JDcIE9
 rsNZIC/FdeBVilPtRTZQnig/Pj32Z4Jm7IZ/DLOq0Deu1YK/9uv3y59M3BcX6WyL
 H5VR80L8geUJZ7RRk0WfM5D4X82ovzwpE/kWt2Z7HDuvJSCBmFBZOvNrXbaRncKD
 jIvat/p6uCuxt5c08+ebnBLQ6tOs8wLTWiCx3fO128GIrGRGN2xFV6hzRWVGnJ4g
 WXGAR+AdLxRMZz4PPmqdTfRj4TNSR071GjKyaeKfZUjQGAsf5O9A77JFjeNVomDx
 g1K37Byid2bTByzVavlEXPJZ7eKb5dAhlo7IJ9HAcOAXChLqH2Czjrpd+1XjR9MF
 SV/78rCnF8eet83QYLbGV/Mzf7gbJP2Xp6wiaM22VAPpGe+sYfphJoQka9XRTfId
 OgAjyYMYdWAKo5DhxVNI8WyN0W5dsoBFPxnaUFhHSGDCIJH7Ksy20m6y3plG2Bxf
 ahoiQhmd9ohjtB5JbRnf4MY0hjekp8Srdf+DoNKsk/+JscIyROpYY3msQ3smUKo+
 f628MC/wAosMpSV+l+KOYkbjCbtB49IabWtZ//NVD9hYB3E1f6aTN59yFbWB+1rp
 L7Y8iaxzSkyJy/yYVuBal3rSk356+BvvoXBlLXmBsyu1TMlcDjALIYztSiTVT5MB
 RZBhgNwdkxNCYJfU3ex+
 =hUVj
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management and ACPI updates from Rafael Wysocki:
 "The second batch of power management and ACPI updates for v4.6.

  Included are fixups on top of the previous PM/ACPI pull request and
  other material that didn't make into it but still should go into 4.6.

  Among other things, there's a fix for an intel_pstate driver issue
  uncovered by recent cpufreq changes, a workaround for a boot hang on
  Skylake-H related to the handling of deep C-states by the platform and
  a PCI/ACPI fix for the handling of IO port resources on non-x86
  architectures plus some new device IDs and similar.

  Specifics:

   - Fix for an intel_pstate driver issue related to the handling of MSR
     updates uncovered by the recent cpufreq rework (Rafael Wysocki).

   - cpufreq core cleanups related to starting governors and frequency
     synchronization during resume from system suspend and a locking fix
     for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).

   - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
     Michael Neuling, Richard Cochran, Shilpasri Bhat).

   - intel_idle driver update preventing some Skylake-H systems from
     hanging during initialization by disabling deep C-states mishandled
     by the platform in the problematic configurations (Len Brown).

   - Intel Xeon Phi Processor x200 support for intel_idle
     (Dasaratharaman Chandramouli).

   - cpuidle menu governor updates to make it always honor PM QoS
     latency constraints (and prevent C1 from being used as the fallback
     C-state on x86 when they are set below its exit latency) and to
     restore the previous behavior to fall back to C1 if the next timer
     event is set far enough in the future that was changed in 4.4 which
     led to an energy consumption regression (Rik van Riel, Rafael
     Wysocki).

   - New device ID for a future AMD UART controller in the ACPI driver
     for AMD SoCs (Wang Hongcheng).

   - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
     scaling (AVS) driver (David Wu).

   - ACPI PCI resources management fix for the handling of IO space
     resources on architectures where the IO space is memory mapped
     (IA64 and ARM64) broken by the introduction of common ACPI
     resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).

   - Fix for the ACPI backend of the generic device properties API to
     make it parse non-device (data node only) children of an ACPI
     device correctly (Irina Tirdea).

   - Fixes for the handling of global suspend flags (introduced in 4.4)
     during hibernation and resume from it (Lukas Wunner).

   - Support for obtaining configuration information from Device Trees
     in the PM clocks framework (Jon Hunter).

   - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
     King, Geert Uytterhoeven)"

* tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  PM / AVS: rockchip-io: add io selectors and supplies for rk3399
  intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
  intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
  ACPI / PM: Runtime resume devices when waking from hibernate
  PM / sleep: Clear pm_suspend_global_flags upon hibernate
  cpufreq: governor: Always schedule work on the CPU running update
  cpufreq: Always update current frequency before startig governor
  cpufreq: Introduce cpufreq_update_current_freq()
  cpufreq: Introduce cpufreq_start_governor()
  cpufreq: powernv: Add sysfs attributes to show throttle stats
  cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
  PCI: ACPI: IA64: fix IO port generic range check
  ACPI / util: cast data to u64 before shifting to fix sign extension
  cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
  cpuidle: menu: Fall back to polling if next timer event is near
  cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
  intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
  cpufreq: Make cpufreq_quick_get() safe to call
  ACPI / property: fix data node parsing in acpi_get_next_subnode()
  ACPI / APD: Add device HID for future AMD UART controller
  ...
2016-03-24 22:59:58 -07:00
Rafael J. Wysocki 33068b61f8 Merge branches 'pm-cpufreq' and 'pm-cpuidle'
* pm-cpufreq:
  cpufreq: governor: Always schedule work on the CPU running update
  cpufreq: Always update current frequency before startig governor
  cpufreq: Introduce cpufreq_update_current_freq()
  cpufreq: Introduce cpufreq_start_governor()
  cpufreq: powernv: Add sysfs attributes to show throttle stats
  cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
  cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
  cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
  intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
  cpufreq: Make cpufreq_quick_get() safe to call

* pm-cpuidle:
  intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
  intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
  cpuidle: menu: Fall back to polling if next timer event is near
  cpuidle: menu: use high confidence factors only when considering polling
2016-03-25 00:57:22 +01:00
Rafael J. Wysocki 539a4c4247 cpufreq: governor: Always schedule work on the CPU running update
Modify dbs_irq_work() to always schedule the process-context work
on the current CPU which also ran the dbs_update_util_handler()
that the irq_work being handled came from.

This causes the entire frequency update handling (involving the
"ondemand" or "conservative" governors) to be carried out by the
CPU whose frequency is to be updated and reduces the overall amount
of inter-CPU noise related to cpufreq.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-22 23:13:36 +01:00
Rafael J. Wysocki 3bbf8fe3ae cpufreq: Always update current frequency before startig governor
Make policy->cur match the current frequency returned by the driver's
->get() callback before starting the governor in case they went out of
sync in the meantime and drop the piece of code attempting to
resync policy->cur with the real frequency of the boot CPU from
cpufreq_resume() as it serves no purpose any more (and it's racy and
super-ugly anyway).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-22 23:13:36 +01:00
Rafael J. Wysocki 999f572983 cpufreq: Introduce cpufreq_update_current_freq()
Move the part of cpufreq_update_policy() that obtains the current
frequency from the driver and updates policy->cur if necessary to
a separate function, cpufreq_get_current_freq().

That should not introduce functional changes and subsequent change
set will need it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-22 23:13:36 +01:00
Rafael J. Wysocki 0a300767e5 cpufreq: Introduce cpufreq_start_governor()
Starting a governor in cpufreq always follows the same pattern
involving two calls to cpufreq_governor(), one with the event
argument set to CPUFREQ_GOV_START and one with that argument set to
CPUFREQ_GOV_LIMITS.

Introduce cpufreq_start_governor() that will carry out those two
operations and make all places where governors are started use it.

That slightly modifies the behavior of cpufreq_set_policy() which
now also will go back to the old governor if the second call to
cpufreq_governor() (the one with event equal to CPUFREQ_GOV_LIMITS)
fails, but that really is how it should work in the first place.

Also cpufreq_resume() will now pring an error message if the
CPUFREQ_GOV_LIMITS call to cpufreq_governor() fails, but that
makes it follow cpufreq_add_policy_cpu() and cpufreq_offline()
in that respect.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-22 23:13:36 +01:00
Shilpasri G Bhat 1b0289848d cpufreq: powernv: Add sysfs attributes to show throttle stats
Create sysfs attributes to export throttle information in
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory. The
newly added sysfs files are as follows:

 1)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
 2)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub-turbo_stat
 3)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle
 4)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap
 5)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp
 6)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault
 7)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent
 8)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset

Detailed explanation of each attribute is added to
Documentation/ABI/testing/sysfs-devices-system-cpu

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-22 23:11:18 +01:00
Jisheng Zhang ac13b9967d cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
These frequency register read/write operations' implementations for the
given processor (Intel/AMD MSR access or I/O port access) are only used
internally in acpi-cpufreq, so make them static.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-22 23:09:50 +01:00
Michael Neuling 3e5963bc34 cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
Commit 96c4726f01 "cpufreq: powernv: Remove cpu_to_chip_id() from
hot-path" introduced a 'core_to_chip_map' array to cache the chip-ids
of all cores.

Replace this with a per-CPU variable that stores the pointer to the
chip-array. This removes the linear lookup and provides a neater and
simpler solution.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-22 01:08:55 +01:00
Linus Torvalds 46e595a17d ARM: SoC driver updates for v4.6
Driver updates for ARM SoCs, these contain various things that touch
 the drivers/ directory but got merged through arm-soc for practical
 reasons:
 
 - Rockchip rk3368 gains power domain support
 - Small updates for the ARM spmi driver
 - The Atmel PMC driver saw a larger rework, touching both
   arch/arm/mach-at91 and drivers/clk/at91
 - All reset controller driver changes alway get merged through
   arm-soc, though this time the largest change is the addition
   of a MIPS pistachio reset driver
 - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVu67OmCrR//JCVInAQJ64hAAqNemdAMloJhh8mk4O74egd/XNE8GLK3v
 gGefpZNi0TC8u/GWMhU1aFCElaCmbNlL0IlqaRrU/vydOmQcZYht7Fg3bAm4r3ck
 TlKijGTJap4sdHhxSeui+7bhaBToxcklQTdcrKFgOwsype7CAWJCl5otIC/GHO5L
 fn4QSjQbqr5kqH1XfuVIphj/fJjDKRRze5D7zn0nExq46OyoYyjc2lm/QkLgeeS2
 vDpzOULYXcjf5GfsPknCJGGjenISD7cIAwZukGvJXFh8WrXkEPZZ7B7bBI/8ZeBU
 MkdWvOm9fHEWpIPnuTcLeQNlfdzQ0Z0zijgJqnXjwSYXK2Es1UKEoIFvZUyGA9zG
 uyLtddFcKbP4QBDUKVMbyYM6x4Cj7LO96dB2pe8iH5rvnoLS32EjJ/4glnbPQFB7
 75JKb7eU1pijoy9c3x/G10vINHzbPjyUN3sYTFKMomPFzEF4OVQ3GDclSuD7jjDr
 GnqmAqlj29+qGU6iQBBHp9TfLTxwrs/4MKPEZ+tTGvtINnzOpLGA3TUnji7nVFQc
 BYy3qaEvg9MfHI3uXhAl2L4CGCVvHfqFs5B7giZfAkbbcTNAHs9PkZ6gMYH+GG3p
 tEbTf/dMHmkkqttSz4f7LZS7D56cSfm3cD8kFCRJPLKifmGAk3w1HZ7JoCXdjr1K
 22HSKRMxlhU=
 =HS4G
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "Driver updates for ARM SoCs, these contain various things that touch
  the drivers/ directory but got merged through arm-soc for practical
  reasons:

   - Rockchip rk3368 gains power domain support
   - Small updates for the ARM spmi driver
   - The Atmel PMC driver saw a larger rework, touching both
     arch/arm/mach-at91 and drivers/clk/at91
   - All reset controller driver changes alway get merged through
     arm-soc, though this time the largest change is the addition of a
     MIPS pistachio reset driver
   - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits)
  bus: imx-weim: Take the 'status' property value into account
  clk: at91: remove useless includes
  clk: at91: pmc: remove useless capacities handling
  clk: at91: pmc: drop at91_pmc_base
  usb: gadget: atmel: access the PMC using regmap
  ARM: at91: remove useless includes and function prototypes
  ARM: at91: pm: move idle functions to pm.c
  ARM: at91: pm: find and remap the pmc
  ARM: at91: pm: simply call at91_pm_init
  clk: at91: pmc: move pmc structures to C file
  clk: at91: pmc: merge at91_pmc_init in atmel_pmc_probe
  clk: at91: remove IRQ handling and use polling
  clk: at91: make use of syscon/regmap internally
  clk: at91: make use of syscon to share PMC registers in several drivers
  hwmon: (scpi) add energy meter support
  firmware: arm_scpi: add support for 64-bit sensor values
  firmware: arm_scpi: decrease Tx timeout to 20ms
  firmware: arm_scpi: fix send_message and sensor_get_value for big-endian
  reset: sti: Make reset_control_ops const
  reset: zynq: Make reset_control_ops const
  ...
2016-03-20 15:40:32 -07:00
Linus Torvalds 33b3d2e88c ARM: SoC platform updates for v4.6
Newly added support for additional SoCs:
 
 - Axis Artpec-6 SoC family
 - Allwinner A83T SoC
 - Mediatek MT7623
 - NXP i.MX6QP SoC
 - ST Microelectronics stm32f469 microcontroller
 
 New features:
 - SMP support for Mediatek mt2701
 - Big-endian support for NXP i.MX
 - DaVinci now uses the new DMA engine dma_slave_map
 - OMAP now uses the new DMA engine dma_slave_map
 - earlyprintk support for palmchip uart on mach-tango
 - delay timer support for orion
 
 Other:
 - Exynos PMU driver moved out to drivers/soc/
 - Various smaller updates for Renesas, Xilinx, PXA, AT91, OMAP, uniphier
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVu68DGCrR//JCVInAQIHVQ//Wblms+NKj3aKh6m2Sscs/YkSbFaQ4sY2
 rNyfxLIYsLXkth1kbdHRFSMyL68Ym+xutErgw/3HQPB2D1YtYJE3VJ/y8AU92SU3
 oHyQIty+atB8d8zBbtlkWmat94NIfYf0I8PQETreGb1LMaJqAf0mDEDAyorTLZcZ
 UtQ817Ihn7urqwdTJpTO58V41RmY/vflbHI5T6bIjUJn6fF1e/7+VqtMIfq5sjJ6
 0EPEQdu8s5AJ7gcGlGi9I5gAtSnWSA/9phAxul9P8/HrMpUWIxreSEAy8FY7W14F
 4TON3sQrnw7nyA72U80KGIXhgLy7SbEmHcSqyy4YJK3ycdk6VYk0CBO7nWVYAiD1
 knLisOH6jwe0LIj9WXiRR+Y2Q53pXN8SF77pLDahSnvuShnYEjEH5uELHtxe7Vxh
 gn+NH1rDkRTgdYgt4RWlVyUoLkddQWzLb1m4QyQlvxtTR25cJJayXdVX2MRrNPF5
 c1zRa9HH+b8LJQIMdWfo/NoHhHtftkkGGsqHAAaypZqdpyk0j2HpJYk5ecPR4f5C
 /8o/h/5xOI9gEzp/DVYSZ1VAvRqBQGIDfKBXWq6GuoZaF0aN8ISe5IxFn5Yx2F46
 fNaxqiNpWmyywl8D+tSWPFK6aE21AXKGi5zIzexZZqy283aDjlUPI+tgF2GKIuKP
 3ayYTDeBpLI=
 =ynNj
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC platform updates from Arnd Bergmann:
 "Newly added support for additional SoCs:
   - Axis Artpec-6 SoC family
   - Allwinner A83T SoC
   - Mediatek MT7623
   - NXP i.MX6QP SoC
   - ST Microelectronics stm32f469 microcontroller

  New features:
   - SMP support for Mediatek mt2701
   - Big-endian support for NXP i.MX
   - DaVinci now uses the new DMA engine dma_slave_map
   - OMAP now uses the new DMA engine dma_slave_map
   - earlyprintk support for palmchip uart on mach-tango
   - delay timer support for orion

  Other:
   - Exynos PMU driver moved out to drivers/soc/
   - Various smaller updates for Renesas, Xilinx, PXA, AT91, OMAP,
     uniphier"

* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (83 commits)
  ARM: uniphier: rework SMP code to support new System Bus binding
  ARM: uniphier: add missing of_node_put()
  ARM: at91: avoid defining CONFIG_* symbols in source code
  ARM: DRA7: hwmod: Add data for eDMA tpcc, tptc0, tptc1
  ARM: imx: Make reset_control_ops const
  ARM: imx: Do L2 errata only if the L2 cache isn't enabled
  ARM: imx: select ARM_CPU_SUSPEND only for imx6
  dmaengine: pxa_dma: fix the maximum requestor line
  ARM: alpine: select the Alpine MSI controller driver
  ARM: pxa: add the number of DMA requestor lines
  dmaengine: mmp-pdma: add number of requestors
  dma: mmp_pdma: Add the #dma-requests DT property documentation
  ARM: OMAP2+: Add rtc hwmod configuration for ti81xx
  ARM: s3c24xx: Avoid warning for inb/outb
  ARM: zynq: Move early printk virtual address to vmalloc area
  ARM: DRA7: hwmod: Add custom reset handler for PCIeSS
  ARM: SAMSUNG: Remove unused register offset definition
  ARM: EXYNOS: Cleanup header files inclusion
  drivers: soc: samsung: Enable COMPILE_TEST
  MAINTAINERS: Add maintainers entry for drivers/soc/samsung
  ...
2016-03-20 14:57:08 -07:00
Richard Cochran ed72662a84 cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
This driver has two issues.  First, it tries to fiddle with the hot
plugged CPU's MSR on the UP_PREPARE event, at a time when the CPU is
not yet online.  Second, the driver sets the "boost-disable" bit for a
CPU when going down, but does not clear the bit again if the CPU comes
up again due to DOWN_FAILED.

This patch fixes the issues by changing the driver to react to the
ONLINE/DOWN_FAILED events instead of UP_PREPARE.  As an added benefit,
the driver also becomes symmetric with respect to the hot plug
mechanism.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-20 00:39:04 +01:00
Rafael J. Wysocki fdfdb2b130 intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
After commit a4675fbc4a (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
intel_pstate_adjust_busy_pstate() path as that is executed with
disabled interrupts.  However, atom_set_pstate() called from there
via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
smp_call_function_single().

The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
because intel_pstate_set_pstate() calling it is also invoked during
the initialization and cleanup of the driver and in those cases it is
not guaranteed to be run on the CPU that is being updated.  However,
in the case when intel_pstate_set_pstate() is called by
intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
the register safely.  Moreover, intel_pstate_set_pstate() already
contains code that only is executed if the function is called by
intel_pstate_adjust_busy_pstate() and there is a special argument
passed to it because of that.

To fix the problem at hand, rearrange the code taking the above
observations into account.

First, replace the ->set() callback in struct pstate_funcs with a
->get_val() one that will return the value to be written to the
IA32_PERF_CTL MSR without updating the register.

Second, split intel_pstate_set_pstate() into two functions,
intel_pstate_update_pstate() to be called by
intel_pstate_adjust_busy_pstate() that will contain all of the
intel_pstate_set_pstate() code which only needs to be executed in
that case and will use wrmsrl() to update the MSR (after obtaining
the value to write to it from the ->get_val() callback), and
intel_pstate_set_min_pstate() to be invoked during the
initialization and cleanup that will set the P-state to the
minimum one and will update the MSR using wrmsrl_on_cpu().

Finally, move the code shared between intel_pstate_update_pstate()
and intel_pstate_set_min_pstate() to a new static inline function
intel_pstate_record_pstate() and make them both call it.

Of course, that unifies the handling of the IA32_PERF_CTL MSR writes
between Atom and Core.

Fixes: a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-20 00:37:09 +01:00
Richard Cochran c75361c0b0 cpufreq: Make cpufreq_quick_get() safe to call
The function, cpufreq_quick_get, accesses the global 'cpufreq_driver' and
its fields without taking the associated lock, cpufreq_driver_lock.

Without the locking, nothing guarantees that 'cpufreq_driver' remains
consistent during the call.  This patch fixes the issue by taking the lock
before accessing the data structure.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-18 01:49:01 +01:00
Rafael J. Wysocki 4ed3900427 Merge branch 'pm-cpufreq'
* pm-cpufreq: (94 commits)
  intel_pstate: Do not skip samples partially
  intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
  intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
  intel_pstate: Optimize calculation for max/min_perf_adj
  intel_pstate: Remove extra conversions in pid calculation
  cpufreq: Move scheduler-related code to the sched directory
  Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
  cpufreq: Reduce cpufreq_update_util() overhead a bit
  cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
  cpufreq: Remove 'policy->governor_enabled'
  cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
  cpufreq: Relocate handle_update() to kill its declaration
  cpufreq: governor: Drop unnecessary checks from show() and store()
  cpufreq: governor: Fix race in dbs_update_util_handler()
  cpufreq: governor: Make gov_set_update_util() static
  cpufreq: governor: Narrow down the dbs_data_mutex coverage
  cpufreq: governor: Make dbs_data_mutex static
  cpufreq: governor: Relocate definitions of tuners structures
  cpufreq: governor: Move per-CPU data to the common code
  cpufreq: governor: Make governor private data per-policy
  ...
2016-03-14 14:22:03 +01:00
Rafael J. Wysocki 4fec7ad5f6 intel_pstate: Do not skip samples partially
If the current value of MPERF or the current value of TSC is the
same as the previous one, respectively, intel_pstate_sample() bails
out early and skips the sample.

However, intel_pstate_adjust_busy_pstate() is still called in that
case which is not correct, so modify intel_pstate_sample() to
return a bool value indicating whether or not the sample has been
taken and use it to decide whether or not to call
intel_pstate_adjust_busy_pstate().

While at it, remove redundant parentheses from the MPERF/TSC
check in intel_pstate_sample().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-03-11 00:07:51 +01:00
Philippe Longepe 8fa520af50 intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
Use a helper function to compute the average pstate and call it only
where it is needed (only when tracing or in intel_pstate_get).

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11 00:04:58 +01:00
Philippe Longepe 7349ec0470 intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(),
so move that call from intel_pstate_sample() to
get_target_pstate_use_performance().

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11 00:04:58 +01:00
Philippe Longepe a158bed5dc intel_pstate: Optimize calculation for max/min_perf_adj
mul_fp(int_tofp(A), B) expands to:
((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained
via simple multiplication A * B.  Apply this observation to
max_perf * limits->max_perf and max_perf * limits->min_perf in
intel_pstate_get_min_max()."

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11 00:04:58 +01:00
Philippe Longepe b54a0dfd56 intel_pstate: Remove extra conversions in pid calculation
pid->setpoint and pid->deadband can be initialized in fixed point, so we
can avoid the int_tofp in pid_calc.

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11 00:04:42 +01:00
Rafael J. Wysocki a5acbfbd70 Merge branch 'pm-cpufreq-governor' into pm-cpufreq 2016-03-10 20:46:03 +01:00
Rafael J. Wysocki adaf9fcd13 cpufreq: Move scheduler-related code to the sched directory
Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.

Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).

Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-03-10 20:44:47 +01:00
Viresh Kumar edd4a893e0 Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
Revert commit 3510fac454 (cpufreq: postfix policy directory with the
first CPU in related_cpus).

Earlier, the policy->kobj was added to the kobject core, before ->init()
callback was called for the cpufreq drivers. Which allowed those drivers
to add or remove, driver dependent, sysfs files/directories to the same
kobj from their ->init() and ->exit() callbacks.

That isn't possible anymore after commit 3510fac454.

Now, there is no other clean alternative that people can adopt.

Its better to revert the earlier commit to allow cpufreq drivers to
create/remove sysfs files from ->init() and ->exit() callbacks.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 21:42:45 +01:00
Rafael J. Wysocki 08f511fd41 cpufreq: Reduce cpufreq_update_util() overhead a bit
Use the observation that cpufreq_update_util() is only called
by the scheduler with rq->lock held, so the callers of
cpufreq_set_update_util_data() can use synchronize_sched()
instead of synchronize_rcu() to wait for cpufreq_update_util()
to complete.  Moreover, if they are updated to do that,
rcu_read_(un)lock() calls in cpufreq_update_util() might be
replaced with rcu_read_(un)lock_sched(), respectively, but
those aren't really necessary, because the scheduler calls
that function from RCU-sched read-side critical sections
already.

In addition to that, if cpufreq_set_update_util_data() checks
the func field in the struct update_util_data before setting
the per-CPU pointer to it, the data->func check may be dropped
from cpufreq_update_util() as well.

Make the above changes to reduce the overhead from
cpufreq_update_util() in the scheduler paths invoking it
and to make the cleanup after removing its callbacks less
heavy-weight somewhat.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-03-09 15:07:58 +01:00
Rafael J. Wysocki e6f036571e cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
Commit 0eb463be3436 (cpufreq: governor: Replace timers with utilization
update callbacks) made CPU_FREQ select IRQ_WORK, but that's not
necessary, as it is sufficient for IRQ_WORK to be selected by
CPU_FREQ_GOV_COMMON, so modify the cpufreq Kconfig to that effect.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 15:01:37 +01:00
Viresh Kumar 242aa883a6 cpufreq: Remove 'policy->governor_enabled'
The entire sequence of events (like INIT/START or STOP/EXIT) for which
cpufreq_governor() is called, is guaranteed to be protected by
policy->rwsem now.

The additional checks that were added earlier (as we were forced to drop
policy->rwsem before calling cpufreq_governor() for EXIT event), aren't
required anymore.

Over that, they weren't sufficient really. They just take care of
START/STOP events, but not INIT/EXIT and the state machine was never
maintained properly by them.

Kill the unnecessary checks and policy->governor_enabled field.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:12 +01:00
Viresh Kumar a1317e091a cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
The __ at the beginning of the routine aren't really necessary at all.
Rename it to cpufreq_governor() instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:11 +01:00
Viresh Kumar 11eb69b984 cpufreq: Relocate handle_update() to kill its declaration
handle_update() is declared at the top of the file as its user appear
before its definition. Relocate the routine to get rid of this.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:11 +01:00
Viresh Kumar f737236b12 cpufreq: governor: Drop unnecessary checks from show() and store()
The show() and store() routines in the cpufreq-governor core don't need
to check if the struct governor_attr they want to use really provides
the callbacks they need as expected (if that's not the case, it means a
bug in the code anyway), so change them to avoid doing that.

Also change the error value to -EBUSY, if the governor is getting
removed and we aren't allowed to store any more changes.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:11 +01:00
Rafael J. Wysocki 27de348239 cpufreq: governor: Fix race in dbs_update_util_handler()
There is a scenario that may lead to undesired results in
dbs_update_util_handler().  Namely, if two CPUs sharing a policy
enter the funtion at the same time, pass the sample delay check
and then one of them is stalled until dbs_work_handler() (queued
up by the other CPU) clears the work counter, it may update the
work counter and queue up another work item prematurely.

To prevent that from happening, use the observation that the CPU
queuing up a work item in dbs_update_util_handler() updates the
last sample time.  This means that if another CPU was stalling after
passing the sample delay check and now successfully updated the work
counter as a result of the race described above, it will see the new
value of the last sample time which is different from what it used in
the sample delay check before.  If that happens, the sample delay
check passed previously is not valid any more, so the CPU should not
continue.

Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:10 +01:00
Rafael J. Wysocki 94ab5e030f cpufreq: governor: Make gov_set_update_util() static
The gov_set_update_util() routine is only used internally by the
common governor code and it doesn't need to be exported, so make
it static.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:10 +01:00
Rafael J. Wysocki 1112e9d83e cpufreq: governor: Narrow down the dbs_data_mutex coverage
Since cpufreq_governor_dbs() is now always called with policy->rwsem
held, it cannot be executed twice in parallel for the same policy.
Thus it is not necessary to hold dbs_data_mutex around the invocations
of cpufreq_governor_start/stop/limits() from it as those functions
never modify any data that can be shared between different policies.

However, cpufreq_governor_dbs() may be executed twice in parallal
for different policies using the same gov->gdbs_data object and
dbs_data_mutex is still necessary to protect that object against
concurrent updates.

For this reason, narrow down the dbs_data_mutex locking to
cpufreq_governor_init/exit() where it is needed and rename the
mutex to gov_dbs_data_mutex to reflect its purpose.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:10 +01:00
Rafael J. Wysocki e3f5ed9393 cpufreq: governor: Make dbs_data_mutex static
That mutex is only used by cpufreq_governor_dbs() and it doesn't
need to be exported to modules, so make it static and drop the
export incantation.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:09 +01:00
Rafael J. Wysocki 47ebaac1f3 cpufreq: governor: Relocate definitions of tuners structures
Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners
from the common governor header to the ondemand and conservative
governor code, respectively, as they don't need to be in the common
header any more.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:09 +01:00
Rafael J. Wysocki 8c8f77fd07 cpufreq: governor: Move per-CPU data to the common code
After previous changes there is only one piece of code in the
ondemand governor making references to per-CPU data structures,
but it can be easily modified to avoid doing that, so modify it
accordingly and move the definition of per-CPU data used by the
ondemand and conservative governors to the common code.  Next,
change that code to access the per-CPU data structures directly
rather than via a governor callback.

This causes the ->get_cpu_cdbs governor callback to become
unnecessary, so drop it along with the macro and function
definitions related to it.

Finally, drop the definitions of struct od_cpu_dbs_info_s and
struct cs_cpu_dbs_info_s that aren't necessary any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:09 +01:00
Rafael J. Wysocki 7d5a9956af cpufreq: governor: Make governor private data per-policy
Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
are only used for a limited set of CPUs.  Namely, if a policy is
shared between multiple CPUs, those fields will only be used for one
of them (policy->cpu).  This means that they really are per-policy
rather than per-CPU and holding room for them in per-CPU data
structures is generally wasteful.  Also moving those fields into
per-policy data structures will allow some significant simplifications
to be made going forward.

For this reason, introduce struct cs_policy_dbs_info and
struct od_policy_dbs_info to hold those fields.  Define each of the
new structures as an extension of struct policy_dbs_info (such that
struct policy_dbs_info is embedded in each of them) and introduce
new ->alloc and ->free governor callbacks to allocate and free
those structures, respectively, such that ->alloc() will return
a pointer to the struct policy_dbs_info embedded in the allocated
data structure and ->free() will take that pointer as its argument.

With that, modify the code accessing the data fields in question
in per-CPU data objects to look for them in the new structures
via the struct policy_dbs_info pointer available to it and drop
them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:08 +01:00
Rafael J. Wysocki d1db75fffc cpufreq: ondemand: Rework the handling of powersave bias updates
The ondemand_powersave_bias_init() function used for resetting data
fields related to the powersave bias tunable of the ondemand governor
works by walking all of the online CPUs in the system and updating the
od_cpu_dbs_info_s structures for all of them.

However, if governor tunables are per policy, the update should not
touch the CPUs that are not associated with the given dbs_data.

Moreover, since the data fields in question are only ever used for
policy->cpu in each policy governed by ondemand, the update can be
limited to those specific CPUs.

Rework the code to take the above observations into account.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:08 +01:00
Rafael J. Wysocki a33cce1c6c cpufreq: governor: Fix CPU load information updates via ->store
The ->store() callbacks of some tunable sysfs attributes of the
ondemand and conservative governors trigger immediate updates of
the CPU load information for all CPUs "governed" by the given
dbs_data by walking the cpu_dbs_info structures for all online
CPUs in the system and updating them.

This is questionable for two reasons.  First, it may lead to a lot of
extra overhead on a system with many CPUs if the given dbs_data is
only associated with a few of them.  Second, if governor tunables are
per-policy, the CPUs associated with the other sets of governor
tunables should not be updated.

To address this issue, use the observation that in all of the places
in question the update operation may be carried out in the same way
(because all of the tunables involved are now located in struct
dbs_data and readily available to the common code) and make the
code in those places invoke the same (new) helper function that
will carry out the update correctly.

That new function always checks the ignore_nice_load tunable value
and updates the CPUs' prev_cpu_nice data fields if that's set, which
wasn't done by the original code in store_io_is_busy(), but it
should have been done in there too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:08 +01:00
Rafael J. Wysocki 76c5f66aa1 cpufreq: ondemand: Drop one more callback from struct od_ops
The ->powersave_bias_init_cpu callback in struct od_ops is only used
in one place and that invocation may be replaced with a direct call
to the function pointed to by that callback, so change the code
accordingly and drop the callback.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:07 +01:00
Rafael J. Wysocki 8434dadbb4 cpufreq: governor: Drop unused governor callback and data fields
After some previous changes, the ->get_cpu_dbs_info_s governor
callback and the "governor" field in struct dbs_governor (whose
value represents the governor type) are not used any more, so
drop them.

Also drop the unused gov_ops field from struct dbs_governor.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:07 +01:00
Rafael J. Wysocki 702c9e542a cpufreq: governor: Add a ->start callback for governors
To avoid having to check the governor type explicitly in the common
code in order to initialize data structures specific to the governor
type properly, add a ->start callback to struct dbs_governor and
use it to initialize those data structures for the ondemand and
conservative governors.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:07 +01:00
Rafael J. Wysocki 8847e038c1 cpufreq: governor: Move io_is_busy to struct dbs_data
The io_is_busy governor tunable is only used by the ondemand governor
and is located in the ondemand-specific data structure, but it is
looked at by the common governor code that has to do ugly things to
get to that value, so move it to struct dbs_data and modify ondemand
accordingly.

Since the conservative governor never touches that field, it will
be always 0 for that governor and it won't have any effect on the
results of computations in that case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:06 +01:00
Rafael J. Wysocki 574ef14d5d cpufreq: governor: Close dbs_data update race condition
It is possible for a dbs_data object to be updated after its
usage counter has become 0.  That may happen if governor_store()
runs (via a govenor tunable sysfs attribute write) in parallel
with cpufreq_governor_exit() called for the last cpufreq policy
associated with the dbs_data in question.  In that case, if
governor_store() acquires dbs_data->mutex right after
cpufreq_governor_exit() has released it, the ->store() callback
invoked by it may operate on dbs_data with no users.  Although
sysfs will cause the kobject_put() in cpufreq_governor_exit() to
block until governor_store() has returned, that situation may
lead to some unexpected results, depending on the implementation
of the ->store callback, and therefore it should be avoided.

To that end, modify governor_store() to check the dbs_data's
usage count before invoking the ->store() callback and return
an error if it is 0 at that point.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:06 +01:00
Rafael J. Wysocki 8eb055d3f5 cpufreq: ondemand: Drop unused callback from struct od_ops
The ->freq_increase callback in struct od_ops is never invoked,
so drop it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:06 +01:00
Rafael J. Wysocki a7f35cffb9 cpufreq: ondemand: Simplify od_update() slightly
Drop some lines of code from od_update() by arranging the statements
in there in a more logical way.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:05 +01:00
Rafael J. Wysocki 07aa4402a0 cpufreq: governor: Use microseconds in sample delay computations
Do not convert microseconds to jiffies and the other way around
in governor computations related to the sampling rate and sample
delay and drop delay_for_sampling_rate() which isn't of any use
then.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:05 +01:00
Rafael J. Wysocki 6e96c5b3ac cpufreq: ondemand: Simplify conditionals in od_dbs_timer()
Reduce the indentation level in the conditionals in od_dbs_timer()
and drop the delay variable from it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:05 +01:00
Rafael J. Wysocki 57dc3bcd45 cpufreq: governor: Move rate_mult to struct policy_dbs
The rate_mult field in struct od_cpu_dbs_info_s is used by the code
shared with the conservative governor and to access it that code
has to do an ugly governor type check.  However, first of all it
is ever only used for policy->cpu, so it is per-policy rather than
per-CPU and second, it is initialized to 1 by cpufreq_governor_start(),
so if the conservative governor never modifies it, it will have no
effect on the results of any computations.

For these reasons, move rate_mult to struct policy_dbs_info (as a
common field).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:04 +01:00
Rafael J. Wysocki 78347cdb89 cpufreq: governor: Reset sample delay in store_sampling_rate()
If store_sampling_rate() updates the sample delay when the ondemand
governor is in the middle of its high/low dance (OD_SUB_SAMPLE sample
type is set), the governor will still do the bottom half of the
previous sample which may take too much time.

To prevent that from happening, change store_sampling_rate() to always
reset the sample delay to 0 which also is consistent with the new
behavior of cpufreq_governor_limits().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:04 +01:00
Rafael J. Wysocki 4cccf75557 cpufreq: governor: Get rid of the ->gov_check_cpu callback
The way the ->gov_check_cpu governor callback is used by the ondemand
and conservative governors is not really straightforward.  Namely, the
governor calls dbs_check_cpu() that updates the load information for
the policy and the invokes ->gov_check_cpu() for the governor.

To get rid of that entanglement, notice that cpufreq_governor_limits()
doesn't need to call dbs_check_cpu() directly.  Instead, it can simply
reset the sample delay to 0 which will cause a sample to be taken
immediately.  The result of that is practically equivalent to calling
dbs_check_cpu() except that it will trigger a full update of governor
internal state and not just the ->gov_check_cpu() part.

Following that observation, make cpufreq_governor_limits() reset
the sample delay and turn dbs_check_cpu() into a function that will
simply evaluate the load and return the result called dbs_update().

That function can now be called by governors from the routines that
previously were pointed to by ->gov_check_cpu and those routines
can be called directly by each governor instead of dbs_check_cpu().
This way ->gov_check_cpu becomes unnecessary, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:04 +01:00
Rafael J. Wysocki 57eb832f90 cpufreq: governor: Clean up load-related computations
Clean up some load-related computations in dbs_check_cpu() and
cpufreq_governor_start() to get rid of unnecessary operations and
type casts and make the code easier to read.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:03 +01:00
Rafael J. Wysocki 679b8fe43a cpufreq: governor: Fix nice contribution computation in dbs_check_cpu()
The contribution of the CPU nice time to the idle time in dbs_check_cpu()
is computed in a bogus way, as the code may subtract current and previous
nice values for different CPUs.

That doesn't matter for cases when cpufreq policies are not shared,
but may lead to problems otherwise.

Fix the computation and simplify it to avoid taking unnecessary steps.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:03 +01:00
Rafael J. Wysocki e4db2813d2 cpufreq: governor: Avoid atomic operations in hot paths
Rework the handling of work items by dbs_update_util_handler() and
dbs_work_handler() so the former (which is executed in scheduler
paths) only uses atomic operations when absolutely necessary.  That
is, when the policy is shared and dbs_update_util_handler() has
already decided that this is the time to queue up a work item.

In particular, this avoids the atomic ops entirely on platforms where
policy objects are never shared.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:03 +01:00
Rafael J. Wysocki f62b93740c cpufreq: governor: Simplify gov_cancel_work() slightly
The atomic work counter incrementation in gov_cancel_work() is not
necessary any more, because work items won't be queued up after
gov_clear_update_util() anyway, so drop it along with the comment
about how it may be missed by the gov_clear_update_util().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:02 +01:00
Rafael J. Wysocki b9db42730a cpufreq: governor: Avoid irq_work_queue_on() crash on non-SMP ARM
As it turns out, irq_work_queue_on() will crash if invoked on
non-SMP ARM platforms, but in fact it is not necessary to use that
function in the cpufreq governor code (as it doesn't matter to that
code which CPU will handle the irq_work), so change it to always use
irq_work_queue().

Fixes: 8fb47ff100af (cpufreq: governor: Replace timers with utilization update callbacks)
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:02 +01:00
Viresh Kumar a23d6d1809 cpufreq: ondemand: Rearrange od_dbs_timer() to avoid updating delay
Avoid extra checks in od_dbs_timer() by rearranging updates to the
local delay variable in it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:02 +01:00
Viresh Kumar aded387b94 cpufreq: conservative: Update sample_delay_ns immediately
The ondemand governor already updates sample_delay_ns immediately on
updates to the sampling rate, but conservative doesn't do that.

It was left out earlier as the code was really too complex to get
that done easily.  Things are sorted out very well now, however, and
the conservative governor can be modified to follow ondemand in that
respect.

Moreover, since the code needed to implement that in the
conservative governor would be identical to the corresponding
ondemand governor's code, make that code common and change both
governors to use it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:01 +01:00
Viresh Kumar 581c214b21 cpufreq: governor: No need to manage state machine now
The cpufreq core now guarantees that policy->rwsem won't be dropped
while running the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
event and will be held acquired until the complete sequence of governor
state changes has finished.

This allows governor state machine checks to be dropped from multiple
functions in cpufreq_governor.c.

This also means that policy_dbs->policy can be initialized upfront, so
the entire initialization of struct policy_dbs can be carried out in
one place.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:01 +01:00
Viresh Kumar 99522fe678 cpufreq: Remove cpufreq_governor_lock
We used to drop policy->rwsem just before calling __cpufreq_governor()
in some cases earlier and so it was possible that __cpufreq_governor()
ran concurrently via separate threads for the same policy.

In order to guarantee valid state transitions for governors,
'governor_enabled' was required to be protected using some locking
and cpufreq_governor_lock was added for that.

But now __cpufreq_governor() is always called under policy->rwsem,
and 'governor_enabled' is protected against races even without
cpufreq_governor_lock.

Get rid of the extra lock now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:01 +01:00
Viresh Kumar 49f18560f8 cpufreq: Call __cpufreq_governor() with policy->rwsem held
The cpufreq core code is not consistent with respect to invoking
__cpufreq_governor() under policy->rwsem.

Changing all code to always hold policy->rwsem around
__cpufreq_governor() invocations will allow us to remove
cpufreq_governor_lock that is used today because we can't
guarantee that __cpufreq_governor() isn't executed twice in
parallel for the same policy.

We should also ensure that policy->rwsem is held across governor
state changes.

For example, while adding a CPU to the policy in the CPU online path,
we need to stop the governor, change policy->cpus, start the governor
and then refresh its limits. The complete sequence must be guaranteed
to complete without interruptions by concurrent governor state
updates.  That can be achieved by holding policy->rwsem around those
sequences of operations.

Also note that after this patch cpufreq_driver->stop_cpu() and
->exit() will get called under policy->rwsem which wasn't the case
earlier. That shouldn't have any side effects, though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:00 +01:00
Viresh Kumar 69cee7147b cpufreq: Merge cpufreq_offline_prepare/finish routines
Commit 1aee40ac9c (cpufreq: Invoke __cpufreq_remove_dev_finish()
after releasing cpu_hotplug.lock) split the cpufreq's CPU offline
routine in two pieces, one of them to be run with CPU offline/online
locked and the other to be called later.  The reason for that split
was a possible deadlock scenario involving cpufreq sysfs attributes
and CPU offline.

However, the handling of CPU offline in cpufreq has changed since
then.  Policy sysfs attributes are never removed during CPU offline,
so there's no need to worry about accessing them during CPU offline,
because that can't lead to any deadlocks now.  Governor sysfs
attributes are still removed in __cpufreq_governor(_EXIT), but
there is a new kobject type for them now and its show/store
callbacks don't lock CPU offline/online (they don't need to do
that).

This means that the CPU offline code in cpufreq doesn't need to
be split any more, so combine cpufreq_offline_prepare() with
cpufreq_offline_finish().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:00 +01:00
Viresh Kumar c54df07184 cpufreq: governor: Create and traverse list of policy_dbs to avoid deadlock
The dbs_data_mutex lock is currently used in two places.  First,
cpufreq_governor_dbs() uses it to guarantee mutual exclusion between
invocations of governor operations from the core.  Second, it is used by
ondemand governor's update_sampling_rate() to ensure the stability of
data structures walked by it.

The second usage is quite problematic, because update_sampling_rate() is
called from a governor sysfs attribute's ->store callback and that leads
to a deadlock scenario involving cpufreq_governor_exit() which runs
under dbs_data_mutex.  Thus it is better to rework the code so
update_sampling_rate() doesn't need to acquire dbs_data_mutex.

To that end, rework update_sampling_rate() to walk a list of policy_dbs
objects supported by the dbs_data one it has been called for (instead of
walking cpu_dbs_info object for all CPUs).  The list manipulation is
protected with dbs_data->mutex which also is held around the execution
of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in
that function any more.

Reported-by: Juri Lelli <juri.lelli@arm.com>
Reported-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:59 +01:00
Viresh Kumar 68e80dae09 Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
Earlier, when the struct freq-attr was used to represent governor
attributes, the standard cpufreq show/store sysfs attribute callbacks
were applied to the governor tunable attributes and they always acquire
the policy->rwsem lock before carrying out the operation.  That could
have resulted in an ABBA deadlock if governor tunable attributes are
removed under policy->rwsem while one of them is being accessed
concurrently (if sysfs attributes removal wins the race, it will wait
for the access to complete with policy->rwsem held while the attribute
callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.

The previous commit, "cpufreq: governor: New sysfs show/store callbacks
for governor tunables", fixed the original ABBA deadlock by adding new
governor specific show/store callbacks.

We don't have to drop rwsem around invocations of governor event
CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.

Fixes: 955ef48335 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:59 +01:00
Viresh Kumar fd8ddc482a cpufreq: governor: Drop unused macros for creating governor tunable attributes
The previous commit introduced a new set of macros for creating sysfs
attributes that represent governor tunables and the old macros used for
this purpose are not needed any more, so drop them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar c443563036 cpufreq: governor: New sysfs show/store callbacks for governor tunables
The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use the same governor with different tunables at the
same time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog + rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar ff4b17895e cpufreq: governor: Move common tunables to 'struct dbs_data'
There are a few common tunables shared between the ondemand and
conservative governors.  Move them to struct dbs_data to simplify
code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar d0684d3b89 cpufreq: governor: Create generic macro for common tunables
Some tunables are present in governor-specific structures, whereas one
(min_sampling_rate) is located directly in struct dbs_data.

There is a special macro for creating its sysfs attribute and the
show/store callbacks, but since more tunables are going to be moved
to struct dbs_data, a new generic macro for such cases will be useful,
so add it and use it for min_sampling_rate.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki fafd5e8ab2 cpufreq: governor: Drop pointless goto from cpufreq_governor_init()
It is silly to jump around "return 0", so don't do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki 686cc637c9 cpufreq: governor: Rename skip_work to work_count
The skip_work field in struct policy_dbs_info technically is a
counter, so give it a new name to reflect that.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki cea6a9e772 cpufreq: governor: Symmetrize cpu_dbs_info initialization and cleanup
Make the initialization of struct cpu_dbs_info objects in
alloc_policy_dbs_info() and the code that cleans them up in
free_policy_dbs_info() more symmetrical.  In particular,
set/clear the update_util.func field in those functions along
with the policy_dbs field.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki bc505475b8 cpufreq: governor: Rearrange governor data structures
The struct policy_dbs_info objects representing per-policy governor
data are not accessible directly from the corresponding policy
objects.  To access them, one has to get a pointer to the
struct cpu_dbs_info of policy->cpu and use the policy_dbs field of
that which isn't really straightforward.

To address that rearrange the governor data structures so the
governor_data pointer in struct cpufreq_policy will point to
struct policy_dbs_info (instead of struct dbs_data) and that will
contain a pointer to struct dbs_data.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki e975189400 cpufreq: governor: Simplify cpufreq_governor_limits()
Use the observation that cpufreq_governor_limits() doesn't have to
get to the policy object it wants to manipulate by walking the
reference chain cdbs->policy_dbs->policy, as the final pointer is
actually equal to its argument, and make it access the policy
object directy via its argument.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki d10b5eb5fc cpufreq: governor: Drop cpu argument from dbs_check_cpu()
Since policy->cpu is always passed as the second argument to
dbs_check_cpu(), it is not really necessary to pass it, because
the function can obtain that value via its first argument just fine.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki e40e7b255e cpufreq: governor: Rename cpu_common_dbs_info to policy_dbs_info
The struct cpu_common_dbs_info structure represents the per-policy
part of the governor data (for the ondemand and conservative
governors), but its name doesn't reflect its purpose.

Rename it to struct policy_dbs_info and rename variables related to
it accordingly.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki ea59ee0dc9 cpufreq: governor: Drop the gov pointer from struct dbs_data
Since it is possible to obtain a pointer to struct dbs_governor
from a pointer to the struct governor embedded in it with the help
of container_of(), the additional gov pointer in struct dbs_data
isn't really necessary.

Drop that pointer and make the code using it reach the dbs_governor
object via policy->governor.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki 906a6e5aae cpufreq: governor: Rework cpufreq_governor_dbs()
Since it is possible to obtain a pointer to struct dbs_governor
from a pointer to the struct governor embedded in it via
container_of(), the second argument of cpufreq_governor_init()
is not necessary.  Accordingly, cpufreq_governor_dbs() doesn't
need its second argument either and the ->governor callbacks
for both the ondemand and conservative governors may be set
to cpufreq_governor_dbs() directly.  Make that happen.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki 7bdad34d08 cpufreq: governor: Rename some data types and variables
The ondemand and conservative governors are represented by
struct common_dbs_data whose name doesn't reflect the purpose it
is used for, so rename it to struct dbs_governor and rename
variables of that type accordingly.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki af92618523 cpufreq: governor: Put governor structure into common_dbs_data
For the ondemand and conservative governors (generally, governors
that use the common code in cpufreq_governor.c), there are two static
data structures representing the governor, the struct governor
structure (the interface to the cpufreq core) and the struct
common_dbs_data one (the interface to the cpufreq_governor.c code).

There's no fundamental reason why those two structures have to be
separate.  Moreover, if the struct governor one is included into
struct common_dbs_data, it will be possible to reach the latter from
the policy via its policy->governor pointer, so it won't be necessary
to pass a separate pointer to it around.  For this reason, embed
struct governor in struct common_dbs_data.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki 5da3dd1e00 cpufreq: governor: Avoid passing dbs_data pointers around unnecessarily
Do not pass struct dbs_data pointers to the family of functions
implementing governor operations in cpufreq_governor.c as they can
take that pointer from policy->governor by themselves.

The cpufreq_governor_init() case is slightly more complicated, since
policy->governor may be NULL when it is invoked, but then it can reach
the pointer in question via its cdata argument just fine.

While at it, rework cpufreq_governor_dbs() to avoid a pointless
policy_governor check in the CPUFREQ_GOV_POLICY_INIT case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki 2bb8d94fb0 cpufreq: governor: Use common mutex for dbs_data protection
Every governor relying on the common code in cpufreq_governor.c
has to provide its own mutex in struct common_dbs_data.  However,
there actually is no need to have a separate mutex per governor
for this purpose, they may be using the same global mutex just
fine.  Accordingly, introduce a single common mutex for that and
drop the mutex field from struct common_dbs_data.

That at least will ensure that the mutex is always present and
initialized regardless of what the particular governors do.

Another benefit is that the common code does not need a pointer to
a governor-related structure to get to the mutex which sometimes
helps.

Finally, it makes the code generally easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki 9be4fd2c77 cpufreq: governor: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for queuing up governor
work items, register a utilization update callback that will be
invoked from the scheduler on utilization changes.

The sampling rate is still the same as what was used for the
deferrable timers and the added irq_work overhead should be offset by
the eliminated timers overhead, so in theory the functional impact of
this patch should not be significant.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki a4675fbc4a cpufreq: intel_pstate: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for utilization sampling
and P-states adjustments, register a utilization update callback that
will be invoked from the scheduler on utilization changes.

The sampling rate is still the same as what was used for the deferrable
timers, so the functional impact of this patch should not be significant.

Based on an earlier patch from Srinivas Pandruvada.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-03-09 14:40:52 +01:00
Rafael J. Wysocki 34e2c555f3 cpufreq: Add mechanism for registering utilization update callbacks
Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2016-03-09 14:39:19 +01:00
Rafael J. Wysocki ed757a2c7b cpufreq: acpi-cpufreq: Make read and write operations more efficient
Setting a new CPU frequency and reading the current request value
in the ACPI cpufreq driver involves each at least two switch
instructions (there's more if the policy is shared).  One of
them is present in drv_read/write() that prepares a command
structure and the other happens in subsequent do_drv_read/write()
when that structure is interpreted.  However, all of those switches
may be avoided by using function pointers.

To that end, add two function pointers to struct acpi_cpufreq_data
to represent read and write operations on the frequency register
and set them up during policy intitialization to point to the pair
of routines suitable for the given processor (Intel/AMD MSR access
or I/O port access).  Then, use those pointers in do_drv_read/write()
and modify drv_read/write() to prepare the command structure for
them without any checks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-03 03:57:50 +01:00
Arnd Bergmann 3c2002aec3 cpufreq: mediatek: allow building as a module
The MT8173 cpufreq driver can currently only be built-in, but
it has a Kconfig dependency on the thermal core. THERMAL
can be a loadable module, which in turn makes this driver
impossible to build.

It is nicer to make the cpufreq driver a module as well, so
this patch turns the option in to a 'tristate' and adapts
the dependency accordingly.

The driver has no module_exit() function, so it will continue
to not support unloading, but it can be built as a module
and loaded at runtime now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5269e7067c (cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-01 02:43:05 +01:00
Arnd Bergmann ddd30ef474 cpufreq: qoriq: allow building as module with THERMAL=m
My previous patch to avoid link errors with the qoriq cpufreq
driver disallowed all of the broken cases, but also prevented
the driver from being built when CONFIG_THERMAL is a module.

This changes the dependency to allow the cpufreq driver to
also be a module in this case, just not built-in.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8ae1702a0d (cpufreq: qoriq: Register cooling device based on device tree)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-01 02:43:05 +01:00
Arnd Bergmann 1c277cae14 This is the pxa changes for v4.6 cycle.
This is a minor cycle with :
  - cleanup fixes from Arnd, mainly build oriented and sparse type ones
  - dma fixes for requestors above 32 (impacting mainly camera driver)
  - some minor cleanup on pxa3xx device-tree side
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW0Y3SAAoJEAP2et0duMsSfesQAIGKccj/WRWtlCxje+8Aj3UW
 eu6vV9xGVBZu0RJxCchwCBP5ZO8WSvXBYvX5MIaBH2TFdJqJEjie6ES4mRlGVuup
 XMUZybcFOu9N5j0WMddj2OHhEeYGgMsSZwSOgK4nsQ/eyhrcI5fjykdafmPKywjW
 +/USV90ucVp+38C+yg4yXSuI8FOEABIn6VoX//+YpDZetvIoJCUQne1g+uPE2YoF
 dcPqVbY2OeAvuABkI3Wqdrc3Ico9i8Ns8erg8EuDe5xv2TvJXhn/mIeoVNZ2s4So
 5aCD87RQu3rwDfqAv6FzW06k9AYAE/p/VKh0smI12D8MCxhSD0EZP+jBfDvuwx/n
 IICMH7YuunRSRe7VDFgWyIz7wduHQu3xctF9scSYZD+kBpvAD274sZs9WYs4fGd6
 hXxxV4iXrP8af6A+sddDeo0Gq25jC7JoL43YlQTUzBMqbrzK4W3c0dXCib6hh/eN
 W/YVdPERGs9cTG/IZFbn3cn1QIYdA4exNoE38txWkKpeiVxu0tZKSsg6m9xCPu7+
 vMQj1m8E8mkgiMq0BJnei02QC8Xw5ekf4wNUsLOOou29c7CTt3zicM8YtnDmANV2
 fIIT4BK0izIZj4N0RZp9KT6h/IkF1VHRz3pcw3vYXJLbaKqHk/6doDUW30E/Ol1f
 PzJedKaWujhOV1DG1SLb
 =lBYC
 -----END PGP SIGNATURE-----

Merge tag 'pxa-for-4.6' of https://github.com/rjarzmik/linux into next/soc

Merge "pxa changes for v4.6 cycle" from Robert Jarzmik:

This is a minor cycle with :
 - cleanup fixes from Arnd, mainly build oriented and sparse type ones
 - dma fixes for requestors above 32 (impacting mainly camera driver)
 - some minor cleanup on pxa3xx device-tree side

* tag 'pxa-for-4.6' of https://github.com/rjarzmik/linux:
  dmaengine: pxa_dma: fix the maximum requestor line
  ARM: pxa: add the number of DMA requestor lines
  dmaengine: mmp-pdma: add number of requestors
  dma: mmp_pdma: Add the #dma-requests DT property documentation
  ARM: pxa: pxa3xx device-tree support cleanup
  ARM: pxa: don't select RFKILL if CONFIG_NET is disabled
  ARM: pxa: fix building without IWMMXT
  ARM: pxa: move extern declarations to pm.h
  ARM: pxa: always select one of the two CPU types
  ARM: pxa: don't select GPIO_SYSFS for MIOA701
  ARM: pxa: mark unused eseries code as __maybe_unused
  ARM: pxa: mark spitz_card_pwr_ctrl as __maybe_unused
  ARM: pxa: define clock registers as __iomem
2016-03-01 00:24:43 +01:00
Shilpasri G Bhat c5e29ea7ac cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}
Unregister the notifiers if cpufreq_driver_register() fails in
powernv_cpufreq_init(). Re-arrange the unregistration and cleanup routines
in powernv_cpufreq_exit() to free all the resources after the driver
has unregistered.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-26 22:18:19 +01:00
Srinivas Pandruvada f05c966585 cpufreq: intel_pstate: disable HWP notifications
Disable HWP Interrupt notification before enabling HWP. Since we don't
have HWP interrupt handling for possible performance interrupts, there
is not much use of enabling HWP interrupts.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-26 22:15:38 +01:00
Srinivas Pandruvada 7791e4aa59 cpufreq: intel_pstate: Enable HWP by default
If the processor supports HWP, enable it by default without checking
for the cpu model. This will allow to enable HWP in all supported
processors without driver change.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-26 22:15:38 +01:00
Rafael J. Wysocki 9a909a142f cpufreq: acpi-cpufreq: Drop pointless label from acpi_cpufreq_target()
The "out" label at the final return statement in acpi_cpufreq_target()
is totally pointless, so drop them and modify the code to return the
right values immediately instead of jumping to it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-26 22:11:55 +01:00
Rafael J. Wysocki 6019d23a73 cpufreq: Rearrange __cpufreq_driver_target()
Drop a pointless label at a return statement from
__cpufreq_driver_target() and rearrange that function
to reduce the indentation level.

No intentional functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-26 22:11:55 +01:00
Olof Johansson 6997e172dc Exynos-specific driver changes for v4.6:
1. Minor cleanup in s5pv210-cpufreq driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWwRh1AAoJEME3ZuaGi4PXyggP/RrBIvcSMAcGsmKZVYb0+osL
 IOy363iayW2D92QlyPOx5CSML9TenVwi5F+sNeKnvMevb6YxWR3HrixI6cDOmmnx
 IDEa6/cW8pmbDxea2an79sW4LRUnFuZA+UfdcLMFRFUF6HYKeU/G8zACncX2x+4H
 IcYnsNSuUAFn9twIWa3eLBLW46x+8L5YUAdrEd6HdfYhQ0+pK/KjBEcje1jY4nwM
 megjhwT6WevmRxzEp1vbGNVJIfFRy1CSAOm+40gmFp8YJLAEkybGnHgbFojNESF7
 3BV0ncuKJsk0HylR3A2TCwBeaUbgcNgJoI90B7pgQUOBvL7St+x3PuXqR5dbV3xg
 TtC461pNV5Yw2F5S1ZkqVxr4u6ThNpBtao+AQjBTQVL1RuO3v/gmBFL03jd2xc8I
 Mn4FVQDTOZmEJIhpZsvk27ayJMkOF5Gtgd2T7rhlEdy8s8EuXjYOAlRvA9C2BgXx
 DxE1BUwnZsjY3ygq3BLO5vVuPjFlcYytOo1tsswzrsayI3UmJkbUWqThliXj2pXW
 t4OWirqX0q1nt5U/e3SsEgTyHBqcltwwcrhEt5CFTccAstQfdukgiOxSeMEvO5ie
 o9oseH+FP//5/DpuLOnkaRXJlLCuNw9ZTK4S2y/Gnfw0Qh8kwWNzuqBH+h2cQmZe
 eUzaSkpq6YFh6dCNxABJ
 =nkpz
 -----END PGP SIGNATURE-----

Merge tag 'samsung-drivers-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into next/drivers

Exynos-specific driver changes for v4.6:
1. Minor cleanup in s5pv210-cpufreq driver.

* tag 'samsung-drivers-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  cpufreq: s5pv210: remove superfluous CONFIG_PM ifdefs

Signed-off-by: Olof Johansson <olof@lixom.net>
2016-02-24 14:30:05 -08:00
Viresh Kumar 41cfd64cf4 intel_pstate: Update frequencies of policy->cpus only from ->set_policy()
The intel-pstate driver is using intel_pstate_hwp_set() from two
separate paths, i.e. ->set_policy() callback and sysfs update path for
the files present in /sys/devices/system/cpu/intel_pstate/ directory.

While an update to the sysfs path applies to all the CPUs being managed
by the driver (which essentially means all the online CPUs), the update
via the ->set_policy() callback applies to a smaller group of CPUs
managed by the policy for which ->set_policy() is called.

And so, intel_pstate_hwp_set() should update frequencies of only the
CPUs that are part of policy->cpus mask, while it is called from
->set_policy() callback.

In order to do that, add a parameter (cpumask) to intel_pstate_hwp_set()
and apply the frequency changes only to the concerned CPUs.

For ->set_policy() path, we are only concerned about policy->cpus, and
so policy->rwsem lock taken by the core prior to calling ->set_policy()
is enough to take care of any races. The larger lock acquired by
get_online_cpus() is required only for the updates to sysfs files.

Add another routine, intel_pstate_hwp_set_online_cpus(), and call it
from the sysfs update paths.

This also fixes a lockdep reported recently, where policy->rwsem and
get_online_cpus() could have been acquired in any order causing an ABBA
deadlock. The sequence of events leading to that was:

intel_pstate_init(...)
	...cpufreq_online(...)
		down_write(&policy->rwsem); // Locks policy->rwsem
		...
		cpufreq_init_policy(policy);
			...intel_pstate_hwp_set();
				get_online_cpus(); // Temporarily locks cpu_hotplug.lock
		...
		up_write(&policy->rwsem);

pm_suspend(...)
	...disable_nonboot_cpus()
		_cpu_down()
			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
			__cpu_notify(CPU_DOWN_PREPARE, ...);
				...cpufreq_offline_prepare();
					down_write(&policy->rwsem); // Locks policy->rwsem

Reported-and-tested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-23 01:10:44 +01:00
Eric Biggers fd7dc7e6b6 cpufreq: simplify for_each_suitable_policy() macro
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-22 13:56:41 +01:00
Eric Biggers 63af405572 cpufreq: fix comment about return value of cpufreq_register_driver()
The comment has been incorrect since commit 4dea5806d3
("cpufreq: return EEXIST instead of EBUSY for second registering").

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-22 13:56:41 +01:00
Rafael J. Wysocki 6541aef01a cpufreq: Drop unnecessary checks from show() and store()
The show() and store() routines in the cpufreq core don't need to
check if the struct freq_attr they want to use really provides the
callbacks they need as expected (if that's not the case, it means
a bug in the code anyway), so change them to avoid doing that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-12 23:56:21 +01:00
Viresh Kumar dd02a3d920 cpufreq: dt: No need to allocate resources anymore
OPP layer manages it now and cpufreq-dt driver doesn't need it. But, we
still need to check for availability of resources for deferred probing.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar df2c8ec28e cpufreq: dt: No need to fetch voltage-tolerance
Its already done by core and we don't need to get it anymore.  And so,
we don't need to get of node in cpufreq_init() anymore, move that to
find_supply_name() instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar 78c3ba5df9 cpufreq: dt: Use dev_pm_opp_set_rate() to switch frequency
OPP core supports frequency/voltage changes based on the target
frequency now, use that instead of open coding the same in cpufreq-dt
driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar 755b888ff0 cpufreq: dt: Reuse dev_pm_opp_get_max_transition_latency()
OPP layer has all the information now to calculate transition latency
(clock_latency + voltage_latency). Lets reuse the OPP layer helper
dev_pm_opp_get_max_transition_latency() instead of open coding the same
in cpufreq-dt driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar 6def6ea75e cpufreq: dt: Unsupported OPPs are already disabled
The core already have a valid regulator set for the device opp and the
unsupported OPPs are already disabled by the core. There is no need to
repeat that in the user drivers, get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar 050794aaeb cpufreq: dt: Pass regulator name to the OPP core
OPP core can handle the regulators by itself, and but it needs to know
the name of the regulator to fetch. Add support for that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Viresh Kumar 391d9aef81 cpufreq: dt: OPP layers handles clock-latency for V1 bindings as well
"clock-latency" is handled by OPP layer for all bindings and so there is
no need to make special calls for V1 bindings. Use
dev_pm_opp_get_max_clock_latency() for both the cases.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Viresh Kumar 457e99e60a cpufreq: dt: Rename 'need_update' to 'opp_v1'
That's the real purpose of this field, i.e. to take special care of old
OPP V1 bindings. Lets name it accordingly, so that it can be used
elsewhere.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Viresh Kumar 896d6a4c0f cpufreq: dt: Convert few pr_debug/err() calls to dev_dbg/err()
We have the device structure available now, lets use it for better print
messages.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Shilpasri G Bhat c89f2682a3 cpufreq: powernv: Replace pr_info with trace print for throttle event
Currently we use printk message to notify the throttle event. But this
can flood the console if the cpu is throttled frequently. So replace the
printk with the tracepoint to notify the throttle event. And also events
like throttle below nominal frequency and OCC_RESET are reduced to
pr_warn/pr_warn_once as pointed by MFG to not mark them as critical
messages. This patch adds 'throttle_reason' to struct chip to store the
throttle reason.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-05 02:38:03 +01:00
Shilpasri G Bhat 96c4726f01 cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
cpu_to_chip_id() does a DT walk through to find out the chip id by
taking a contended device tree lock. This adds an unnecessary overhead
in a hot path. So instead of calling cpu_to_chip_id() everytime cache
the chip ids for all cores in the array 'core_to_chip_map' and use it
in the hotpath.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-05 02:38:02 +01:00
Shilpasri G Bhat 6d167a44e6 cpufreq: powernv: Hot-plug safe the kworker thread
In the kworker_thread powernv_cpufreq_work_fn(), we can end up
sending an IPI to a cpu going offline. This is a rare corner case
which is fixed using {get/put}_online_cpus(). Along with this fix,
this patch adds changes to do oneshot cpumask_{clear/and} operation.

Suggested-by: Shreyas B Prabhu <shreyas@linux.vnet.ibm.com>
Suggested-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-05 02:38:02 +01:00
Shilpasri G Bhat 86622cb8c5 cpufreq: powernv: Free 'chips' on module exit
This will free the dynamically allocated memory of 'chips' on
module exit.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-05 02:38:01 +01:00
Rafael J. Wysocki de1df26b7c cpufreq: Clean up default and fallback governor setup
The preprocessor magic used for setting the default cpufreq governor
(and for using the performance governor as a fallback one for that
matter) is really nasty, so replace it with __weak functions and
overrides.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-05 02:37:42 +01:00
Arnd Bergmann ea7743e271 ARM: pxa: define clock registers as __iomem
We should not dereference registers as pointers, so use readl/writel
instead for these registers.

The clock registers are accessed in multiple files, so we have to
change them all at once.

I stumbled over these registers while looking at something unrelated.
There are in fact other registers with the same problem, but I did
not try to address those at this point.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
2016-02-01 21:43:41 +01:00
Rafael J. Wysocki ad1ac94767 Merge branches 'pm-cpuidle', 'pm-cpufreq', 'pm-domains' and 'pm-sleep'
* pm-cpuidle:
  cpuidle: coupled: remove unused define cpuidle_coupled_lock
  cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freeze

* pm-cpufreq:
  cpufreq: cpufreq-dt: avoid uninitialized variable warnings:
  cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
  cpufreq: Use list_is_last() to check last entry of the policy list
  cpufreq: Fix NULL reference crash while accessing policy->governor_data

* pm-domains:
  PM / Domains: Fix typo in comment
  PM / Domains: Fix potential deadlock while adding/removing subdomains
  PM / domains: fix lockdep issue for all subdomains

* pm-sleep:
  PM: APM_EMULATION does not depend on PM
2016-01-29 21:45:17 +01:00
Arnd Bergmann b331bc20d9 cpufreq: cpufreq-dt: avoid uninitialized variable warnings:
gcc warns quite a bit about values returned from allocate_resources()
in cpufreq-dt.c:

cpufreq-dt.c: In function 'cpufreq_init':
cpufreq-dt.c:327:6: error: 'cpu_dev' may be used uninitialized in this function [-Werror=maybe-uninitialized]
cpufreq-dt.c:197:17: note: 'cpu_dev' was declared here
cpufreq-dt.c:376:2: error: 'cpu_clk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
cpufreq-dt.c:199:14: note: 'cpu_clk' was declared here
cpufreq-dt.c: In function 'dt_cpufreq_probe':
cpufreq-dt.c:461:2: error: 'cpu_clk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
cpufreq-dt.c:447:14: note: 'cpu_clk' was declared here

The problem is that it's slightly hard for gcc to follow return
codes across PTR_ERR() calls.

This patch uses explicit assignments to the "ret" variable to make
it easier for gcc to verify that the code is actually correct,
without the need to add a bogus initialization.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-27 23:23:54 +01:00
Arnd Bergmann fb2a24a1c6 cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
There are two definitions of pxa_cpufreq_change_voltage, with slightly
different prototypes after one of them had its argument marked 'const'.
Now the other one (for !CONFIG_REGULATOR) produces a harmless warning:

drivers/cpufreq/pxa2xx-cpufreq.c: In function 'pxa_set_target':
drivers/cpufreq/pxa2xx-cpufreq.c:291:36: warning: passing argument 1 of 'pxa_cpufreq_change_voltage' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
   ret = pxa_cpufreq_change_voltage(&pxa_freq_settings[idx]);
                                    ^
drivers/cpufreq/pxa2xx-cpufreq.c:205:12: note: expected 'struct pxa_freqs *' but argument is of type 'const struct pxa_freqs *'
 static int pxa_cpufreq_change_voltage(struct pxa_freqs *pxa_freq)
            ^

This changes the prototype in the same way as the other, which
avoids the warning.

Fixes: 03c2299063 (cpufreq: pxa: make pxa_freqs arrays const)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-27 23:22:50 +01:00
Gautham R Shenoy 2dadfd7564 cpufreq: Use list_is_last() to check last entry of the policy list
Currently next_policy() explicitly checks if a policy is the last
policy in the cpufreq_policy_list. Use the standard list_is_last
primitive instead.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-27 23:13:59 +01:00
Viresh Kumar e4b133cc4b cpufreq: Fix NULL reference crash while accessing policy->governor_data
There is a race discovered by Juri, where we are able to:
- create and read a sysfs file before policy->governor_data is being set
  to a non NULL value.
  OR
- set policy->governor_data to NULL, and reading a file before being
  destroyed.

And so such a crash is reported:

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
pgd = edfc8000
[0000000c] *pgd=bfc8c835
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 4 PID: 1730 Comm: cat Not tainted 4.5.0-rc1+ #463
Hardware name: ARM-Versatile Express
task: ee8e8480 ti: ee930000 task.ti: ee930000
PC is at show_ignore_nice_load_gov_pol+0x24/0x34
LR is at show+0x4c/0x60
pc : [<c058f1bc>]    lr : [<c058ae88>]    psr: a0070013
sp : ee931dd0  ip : ee931de0  fp : ee931ddc
r10: ee4bc290  r9 : 00001000  r8 : ef2cb000
r7 : ee4bc200  r6 : ef2cb000  r5 : c0af57b0  r4 : ee4bc2e0
r3 : 00000000  r2 : 00000000  r1 : c0928df4  r0 : ef2cb000
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: adfc806a  DAC: 00000051
Process cat (pid: 1730, stack limit = 0xee930210)
Stack: (0xee931dd0 to 0xee932000)
1dc0:                                     ee931dfc ee931de0 c058ae88 c058f1a4
1de0: edce3bc0 c07bfca4 edce3ac0 00001000 ee931e24 ee931e00 c01fcb90 c058ae48
1e00: 00000001 edce3bc0 00000000 00000001 ee931e50 ee8ff480 ee931e34 ee931e28
1e20: c01fb33c c01fcb0c ee931e8c ee931e38 c01a5210 c01fb314 ee931e9c ee931e48
1e40: 00000000 edce3bf0 befe4a00 ee931f78 00000000 00000000 000001e4 00000000
1e60: c00545a8 edce3ac0 00001000 00001000 befe4a00 ee931f78 00000000 00001000
1e80: ee931ed4 ee931e90 c01fbed8 c01a5038 ed085a58 00020000 00000000 00000000
1ea0: c0ad72e4 ee931f78 ee8ff488 ee8ff480 c077f3fc 00001000 befe4a00 ee931f78
1ec0: 00000000 00001000 ee931f44 ee931ed8 c017c328 c01fbdc4 00001000 00000000
1ee0: ee8ff480 00001000 ee931f44 ee931ef8 c017c65c c03deb10 ee931fac ee931f08
1f00: c0009270 c001f290 c0a8d968 ef2cb000 ef2cb000 ee8ff480 00000020 ee8ff480
1f20: ee8ff480 befe4a00 00001000 ee931f78 00000000 00000000 ee931f74 ee931f48
1f40: c017d1ec c017c2f8 c019c724 c019c684 ee8ff480 ee8ff480 00001000 befe4a00
1f60: 00000000 00000000 ee931fa4 ee931f78 c017d2a8 c017d160 00000000 00000000
1f80: 000a9f20 00001000 befe4a00 00000003 c000ffe4 ee930000 00000000 ee931fa8
1fa0: c000fe40 c017d264 000a9f20 00001000 00000003 befe4a00 00001000 00000000
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
1fc0: 000a9f20 00001000 befe4a00 00000003 00000000 00000000 00000003 00000001
pgd = edfc4000
[0000000c] *pgd=bfcac835
1fe0: 00000000 befe49dc 000197f8 b6e35dfc 60070010 00000003 3065b49d 134ac2c9

[<c058f1bc>] (show_ignore_nice_load_gov_pol) from [<c058ae88>] (show+0x4c/0x60)
[<c058ae88>] (show) from [<c01fcb90>] (sysfs_kf_seq_show+0x90/0xfc)
[<c01fcb90>] (sysfs_kf_seq_show) from [<c01fb33c>] (kernfs_seq_show+0x34/0x38)
[<c01fb33c>] (kernfs_seq_show) from [<c01a5210>] (seq_read+0x1e4/0x4e4)
[<c01a5210>] (seq_read) from [<c01fbed8>] (kernfs_fop_read+0x120/0x1a0)
[<c01fbed8>] (kernfs_fop_read) from [<c017c328>] (__vfs_read+0x3c/0xe0)
[<c017c328>] (__vfs_read) from [<c017d1ec>] (vfs_read+0x98/0x104)
[<c017d1ec>] (vfs_read) from [<c017d2a8>] (SyS_read+0x50/0x90)
[<c017d2a8>] (SyS_read) from [<c000fe40>] (ret_fast_syscall+0x0/0x1c)
Code: e5903044 e1a00001 e3081df4 e34c1092 (e593300c)
---[ end trace 5994b9a5111f35ee ]---

Fix that by making sure, policy->governor_data is updated at the right
places only.

Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Reported-and-tested-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-27 23:13:53 +01:00
Bartlomiej Zolnierkiewicz 9b3f105ef6 cpufreq: s5pv210: remove superfluous CONFIG_PM ifdefs
CONFIG_PM ifdefs are superfluous and can be removed.

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Kukjin Kim <kgene@kernel.org>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-01-25 15:13:53 +09:00
Linus Torvalds f689b742f2 powerpc updates for 4.5
- Ground work for the new Power9 MMU from Aneesh Kumar K.V
  - Optimise FP/VMX/VSX context switching from Anton Blanchard
 
  - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica Gupta,
    Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling, Andrew Donnellan
  - Allow wrapper to work on non-english system from Laurent Vivier
  - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
  - Fix module autoload for rackmeter & axonram drivers from Luis de Bethencourt
  - Include KVM guest test in all interrupt vectors from Paul Mackerras
  - Fix DSCR inheritance over fork() from Anton Blanchard
  - Make value-returning atomics & {cmp}xchg* & their atomic_ versions fully ordered from Boqun Feng
  - Print MSR TM bits in oops messages from Michael Neuling
  - Add TM signal return & invalid stack selftests from Michael Neuling
  - Limit EPOW reset event warnings from Vipin K Parashar
  - Remove the Cell QPACE code from Rashmica Gupta
  - Append linux_banner to exception information in xmon from Rashmica Gupta
  - Add selftest to check if VSRs are corrupted from Rashmica Gupta
  - Remove broken GregorianDay() from Daniel Axtens
  - Import Anton's context_switch2 benchmark into selftests from Michael Ellerman
  - Add selftest script to test HMI functionality from Daniel Axtens
  - Remove obsolete OPAL v2 support from Stewart Smith
  - Make enter_rtas() private from Michael Ellerman
  - PPR exception cleanups from Michael Ellerman
  - Add page soft dirty tracking from Laurent Dufour
  - Add support for Nvlink NPUs from Alistair Popple
  - Add support for kexec on 476fpe from Alistair Popple
  - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
  - Copy only required pieces of the mm_context_t to the paca from Michael Neuling
  - Add a kmsg_dumper that flushes OPAL console output on panic from Russell Currey
  - Implement save_stack_trace_regs() to enable kprobe stack tracing from Steven Rostedt
  - Add HWCAP bits for Power9 from Michael Ellerman
  - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
  - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
  - scripts/recordmcount.pl: support data in text section on powerpc from Ulrich Weigand
  - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand
 
  - cxl: Fix possible idr warning when contexts are released from Vaibhav Jain
  - cxl: use correct operator when writing pcie config space values from Andrew Donnellan
  - cxl: Fix DSI misses when the context owning task exits from Vaibhav Jain
  - cxl: fix build for GCC 4.6.x from Brian Norris
  - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
  - cxl: Enable PCI device ID for future IBM CXL adapter from Uma Krishnan
 
  - Freescale updates from Scott: Highlights include moving QE code out of
    arch/powerpc (to be shared with arm), device tree updates, and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWmIxeAAoJEFHr6jzI4aWAA+cQAIXAw4WfVWJ2V4ZK+1eKfB57
 fdXG71PuXG+WYIWy71ly8keLHdzzD1NQ2OUB64bUVRq202nRgVc15ZYKRJ/FE/sP
 SkxaQ2AG/2kI2EflWshOi0Lu9qaZ+LMHJnszIqE/9lnGSB2kUI/cwsSXgziiMKXR
 XNci9v14SdDd40YV/6BSZXoxApwyq9cUbZ7rnzFLmz4hrFuKmB/L3LABDF8QcpH7
 sGt/YaHGOtqP0UX7h5KQTFLGe1OPvK6NWixSXeZKQ71ED6cho1iKUEOtBA9EZeIN
 QM5JdHFWgX8MMRA0OHAgidkSiqO38BXjmjkVYWoIbYz7Zax3ThmrDHB4IpFwWnk3
 l7WBykEXY7KEqpZzbh0GFGehZWzVZvLnNgDdvpmpk/GkPzeYKomBj7ZZfm3H1yGD
 BTHPwuWCTX+/K75yEVNO8aJO12wBg7DRl4IEwBgqhwU8ga4FvUOCJkm+SCxA1Dnn
 qlpS7qPwTXNIEfKMJcxp5X0KiwDY1EoOotd4glTN0jbeY5GEYcxe+7RQ302GrYxP
 zcc8EGLn8h6BtQvV3ypNHF5l6QeTW/0ZlO9c236tIuUQ5gQU39SQci7jQKsYjSzv
 BB1XdLHkbtIvYDkmbnr1elbeJCDbrWL9rAXRUTRyfuCzaFWTfZmfVNe8c8qwDMLk
 TUxMR/38aI7bLcIQjwj9
 =R5bX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Core:
   - Ground work for the new Power9 MMU from Aneesh Kumar K.V
   - Optimise FP/VMX/VSX context switching from Anton Blanchard

  Misc:
   - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica
     Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling,
     Andrew Donnellan
   - Allow wrapper to work on non-english system from Laurent Vivier
   - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
   - Fix module autoload for rackmeter & axonram drivers from Luis de
     Bethencourt
   - Include KVM guest test in all interrupt vectors from Paul Mackerras
   - Fix DSCR inheritance over fork() from Anton Blanchard
   - Make value-returning atomics & {cmp}xchg* & their atomic_ versions
     fully ordered from Boqun Feng
   - Print MSR TM bits in oops messages from Michael Neuling
   - Add TM signal return & invalid stack selftests from Michael Neuling
   - Limit EPOW reset event warnings from Vipin K Parashar
   - Remove the Cell QPACE code from Rashmica Gupta
   - Append linux_banner to exception information in xmon from Rashmica
     Gupta
   - Add selftest to check if VSRs are corrupted from Rashmica Gupta
   - Remove broken GregorianDay() from Daniel Axtens
   - Import Anton's context_switch2 benchmark into selftests from
     Michael Ellerman
   - Add selftest script to test HMI functionality from Daniel Axtens
   - Remove obsolete OPAL v2 support from Stewart Smith
   - Make enter_rtas() private from Michael Ellerman
   - PPR exception cleanups from Michael Ellerman
   - Add page soft dirty tracking from Laurent Dufour
   - Add support for Nvlink NPUs from Alistair Popple
   - Add support for kexec on 476fpe from Alistair Popple
   - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
   - Copy only required pieces of the mm_context_t to the paca from
     Michael Neuling
   - Add a kmsg_dumper that flushes OPAL console output on panic from
     Russell Currey
   - Implement save_stack_trace_regs() to enable kprobe stack tracing
     from Steven Rostedt
   - Add HWCAP bits for Power9 from Michael Ellerman
   - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
   - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
   - scripts/recordmcount.pl: support data in text section on powerpc
     from Ulrich Weigand
   - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand

  cxl:
   - cxl: Fix possible idr warning when contexts are released from
     Vaibhav Jain
   - cxl: use correct operator when writing pcie config space values
     from Andrew Donnellan
   - cxl: Fix DSI misses when the context owning task exits from Vaibhav
     Jain
   - cxl: fix build for GCC 4.6.x from Brian Norris
   - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
   - cxl: Enable PCI device ID for future IBM CXL adapter from Uma
     Krishnan

  Freescale:
   - Freescale updates from Scott: Highlights include moving QE code out
     of arch/powerpc (to be shared with arm), device tree updates, and
     minor fixes"

* tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits)
  powerpc/module: Handle R_PPC64_ENTRY relocations
  scripts/recordmcount.pl: support data in text section on powerpc
  powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
  powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff
  powerpc/mm: Fix _PAGE_PTE breaking swapoff
  cxl: Enable PCI device ID for future IBM CXL adapter
  cxl: use -Werror only with CONFIG_PPC_WERROR
  cxl: fix build for GCC 4.6.x
  powerpc: Add HWCAP bits for Power9
  powerpc/powernv: Reserve PE#0 on NPU
  powerpc/powernv: Change NPU PE# assignment
  powerpc/powernv: Fix update of NVLink DMA mask
  powerpc/powernv: Remove misleading comment in pci.c
  powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
  powerpc: Fix build break due to paca mm_context_t changes
  cxl: Fix DSI misses when the context owning task exits
  MAINTAINERS: Update Scott Wood's e-mail address
  powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
  powerpc: Fix style of self-test config prompts
  powerpc/powernv: Only delay opal_rtc_read() retry when necessary
  ...
2016-01-15 13:18:47 -08:00
Andrzej Hajda 929ca89c30 cpufreq-dt: fix handling regulator_get_voltage() result
The function can return negative values so it should be assigned
to signed type.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci.

Link: http://permalink.gmane.org/gmane.linux.kernel/2038576
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-05 02:01:38 +01:00
Chen Yu 0df35026c6 cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC
It is reported that, with CONFIG_HZ_PERIODIC=y cpu stays at the
lowest frequency even if the usage goes to 100%, neither ondemand
nor conservative governor works, however performance and
userspace work as expected. If set with CONFIG_NO_HZ_FULL=y,
everything goes well.

This problem is caused by improper calculation of the idle_time
when the load is extremely high(near 100%). Firstly, cpufreq_governor
uses get_cpu_idle_time to get the total idle time for specific cpu, then:

1.If the system is configured with CONFIG_NO_HZ_FULL, the idle time is
  returned by ktime_get, which is always increasing, it's OK.
2.However, if the system is configured with CONFIG_HZ_PERIODIC,
  get_cpu_idle_time might not guarantee to be always increasing,
  because it will leverage get_cpu_idle_time_jiffy to calculate the
  idle_time, consider the following scenario:

At T1:
idle_tick_1 = total_tick_1 - user_tick_1

sample period(80ms)...

At T2: ( T2 = T1 + 80ms):
idle_tick_2 = total_tick_2 - user_tick_2

Currently the algorithm is using (idle_tick_2 - idle_tick_1) to
get the delta idle_time during the past sample period, however
it CAN NOT guarantee that idle_tick_2 >= idle_tick_1, especially
when cpu load is high.
(Yes, total_tick_2 >= total_tick_1, and user_tick_2 >= user_tick_1,
but how about idle_tick_2 and idle_tick_1? No guarantee.)
So governor might get a negative value of idle_time during the past
sample period, which might mislead the system that the idle time is
very big(converted to unsigned int), and the busy time is nearly zero,
which causes the governor to always choose the lowest cpufreq,
then cause this problem.

In theory there are two solutions:

1.The logic should not rely on the idle tick during every sample period,
  but be based on the busy tick directly, as this is how 'top' is
  implemented.

2.Or the logic must make sure that the idle_time is strictly increasing
  during each sample period, then there would be no negative idle_time
  anymore. This solution requires minimum modification to current code
  and this patch uses method 2.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
Reported-by: Jan Fikar <j.fikar@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-05 02:01:32 +01:00
Pi-Cheng Chen a889331d75 cpufreq: mt8173: migrate to use operating-points-v2 bindings
Modify mt8173-cpufreq driver to get OPP-sharing information and set up
OPP table provided by operating-points-v2 bindings.

Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-02 00:49:33 +01:00
Rafael J. Wysocki 7a6c79f2fe cpufreq: Simplify core code related to boost support
Notice that the boost_supported field in struct cpufreq_driver is
redundant, because the driver's ->set_boost callback may be left
unset if "boost" is not supported.  Moreover, the only driver
populating the ->set_boost callback is acpi_cpufreq, so make it
avoid populating that callback if "boost" is not supported, rework
the core to check ->set_boost instead of boost_supported to
verify "boost" support and drop boost_supported which isn't
used any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-01-01 03:49:51 +01:00
Rafael J. Wysocki 17135782b8 cpufreq: acpi-cpufreq: Simplify boost-related code
The store_boost() routine is only used by store_cpb(), so move
the code from it directly to that function and rename _store_boost()
to set_boost() to make its name reflect the name of the driver
callback pointing to it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-01-01 03:49:51 +01:00
Rafael J. Wysocki 41669da030 cpufreq: Make cpufreq_boost_supported() static
cpufreq_boost_supported() is not used outside of cpufreq.c, so make
it static.

While at it, refactor it as a one-liner (which it really is).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-01-01 03:49:51 +01:00
Markus Elfring d23f8cadf9 blackfin-cpufreq: Mark cpu_set_cclk() as static
The cpu_set_cclk() function was only used in a single source file so far.
Indicate this setting also by the corresponding linkage specifier.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-28 01:51:36 +01:00
Markus Elfring a5810b4f07 blackfin-cpufreq: Change return type of cpu_set_cclk() to that of clk_set_rate()
The return type "unsigned long" was used by the cpu_set_cclk() function
while the type "int" is provided by the clk_set_rate() function.
Let us make this usage consistent.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-28 01:51:35 +01:00
Rafael J. Wysocki 9ad55cd9e6 Merge back earlier cpufreq material for v4.5. 2015-12-28 01:34:35 +01:00
Dan Carpenter a7def561c2 cpufreq: scpi-cpufreq: signedness bug in scpi_get_dvfs_info()
The "domain" variable needs to be signed for the error handling to work.

Fixes: 8def31034d (cpufreq: arm_big_little: add SCPI interface driver)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-24 02:11:37 +01:00
Rafael J. Wysocki 4157c2fc84 Merge back earlier cpufreq material for v4.5. 2015-12-21 03:15:15 +01:00
Stewart Smith e4d54f71d2 powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is
just OPALv3, with nobody ever expecting anything on pre-OPALv3 to
be cared about or supported by mainline kernels.

So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
exclusively.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:54 +11:00
Rafael J. Wysocki f1b9fc591e Merge branches 'powercap', 'pm-cpufreq' and 'pm-domains'
* powercap:
  powercap / RAPL: fix BIOS lock check

* pm-cpufreq:
  cpufreq: intel_pstate: Minor cleanup for FRAC_BITS
  cpufreq: tegra: add regulator dependency for T124

* pm-domains:
  PM / Domains: Allow runtime PM callbacks to be re-used during system PM
2015-12-14 22:58:57 +01:00
Linus Torvalds 097b285d32 ARM: SoC fixes for 4.4-rc
Here are a bunch of small bug fixes for various ARM platforms, nothing
 really sticks out this week, most of either fixes bugs in code that was
 just added in 4.4, or that has been broken for many years without anyone
 noticing.
 
 at91/sama5d2
 - fix sama5de hardware setup of sd/mmc interface
 - proper selection of pinctrl drivers. PIO4 is necessary for sama5d2
 
 berlin
 - fix incorrect clock input for SDIO
 
 exynos
 - Fix potential NULL pointer dereference in Exynos PMU driver.
 
 imx
 - Fix vf610 SAI clock configuration bug which is discovered by
   the newly added master mode support in SAI audio driver.
 - Fix buggy L2 cache latency values in vf610 device trees, which may
   cause system hang when cpu runs at a higher frequency.
 
 ixp4xx
 - fix prototypes for readl/writel functions
 
 ls2080a
 - use little-endian register access for GPIO and SDHCI
 
 omap
 - Fix clock source for ARM TWD and global timers on am437x
 - Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of
   when MACH_OMAP3_PANDORA is selected
 - Fix SPI DMA handles for dm816x as only some were mapped
 - Fix up mbox cells for dm816x to make mailbox usable
 
 pxa
 - use PWM lookup table for all ezx machines
 
 s3c24xx
 - Remove incorrect __init annotation from s3c24xx cpufreq driver structures.
 
 versatile
 - fix PCI IRQ mapping on Versatile PB
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVmyQMWCrR//JCVInAQIIDA//VyJ2UoTJ2JC3thVP56P/ZXh7Pz8VDqnq
 cgoFUio27IeHPSgs+W9qWliOrb+LaXkuOl8CKgepm+Bv7j8Y+uryP4X2rKQ3ZRmy
 2f5+uUqAIZ0Co2aJdtG395lY9TKNHl6cPEskcbgL7cjdgj7QBqfIyj22QZbj6yRp
 kp8pj+cKXBFRLa5PvePon2w03MA/bLaP30VzKCSL1zchcs52rxekU694V3ISNa63
 eshyyKf354Sl9hP4Y8xCdl/mboymKzQxEGDQS/Fcb8h/OQ3djoh+7EKdVbdyZ2A7
 phgfazd2aE7wQ5GVIkMNV/MzGHj9xpiD4Z1Hi/2E8WdzuXJTRicS4bJihRAIualt
 H1FOEdgqT+xS4JUYxAvl46fwwqcFJfixtGgKka27sJTtk+Y1kHjASWvueZKlHMIK
 ln9CF7PoecF0InQaY2N8Vy05Qcp5MuoB/0v+XlftI0sAtIXNeo142H2NQZCsO+1U
 bJDyb5E4z06jzqk7IOK4/AKyEAV9KZPDws+ZxcNH/faPT10epK7MeZdetbD7b8q3
 pkY7s5iXV8uBox7FtHoamrlMFgAzN9Qh0E4bcw70aKaJZZ02ozTXCvJIKjoIPMne
 FsvidQToznqbA2RSXpxRQrcXrMxvURaPCRBe7CxrCoynmhIxd4UHND2HJ4OG645z
 4SAGOzOlZKM=
 =fgEd
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here are a bunch of small bug fixes for various ARM platforms, nothing
  really sticks out this week, most of either fixes bugs in code that
  was just added in 4.4, or that has been broken for many years without
  anyone noticing.

  at91/sama5d2:
   - fix sama5de hardware setup of sd/mmc interface
   - proper selection of pinctrl drivers.  PIO4 is necessary for sama5d2

  berlin:
   - fix incorrect clock input for SDIO

  exynos:
   - Fix potential NULL pointer dereference in Exynos PMU driver.

  imx:
   - Fix vf610 SAI clock configuration bug which is discovered by the
     newly added master mode support in SAI audio driver.
   - Fix buggy L2 cache latency values in vf610 device trees, which may
     cause system hang when cpu runs at a higher frequency.

  ixp4xx:
   - fix prototypes for readl/writel functions

  ls2080a:
   - use little-endian register access for GPIO and SDHCI

  omap:
   - Fix clock source for ARM TWD and global timers on am437x
   - Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when
     MACH_OMAP3_PANDORA is selected
   - Fix SPI DMA handles for dm816x as only some were mapped
   - Fix up mbox cells for dm816x to make mailbox usable

  pxa:
   - use PWM lookup table for all ezx machines

  s3c24xx:
   - Remove incorrect __init annotation from s3c24xx cpufreq driver
     structures.

  versatile:
   - fix PCI IRQ mapping on Versatile PB"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ls2080a/dts: Add little endian property for GPIO IP block
  dt-bindings: define little-endian property for QorIQ GPIO
  ARM64: dts: ls2080a: fix eSDHC endianness
  ARM: dts: vf610: use reset values for L2 cache latencies
  ARM: pxa: use PWM lookup table for all machines
  ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1
  ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock
  ARM: dts: am4372: fix clock source for arm twd and global timers
  ARM: at91: fix pinctrl driver selection
  ARM: at91/dt: add always-on to 1.8V regulator
  ARM: dts: vf610: fix clock definition for SAI2
  ARM: imx: clk-vf610: fix SAI clock tree
  ARM: ixp4xx: fix read{b,w,l} return types
  irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PB
  ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE
  ARM: dts: add dm816x missing spi DT dma handles
  ARM: dts: add dm816x missing #mbox-cells
  cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
  ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
2015-12-12 16:43:44 -08:00
Lee Jones ab0ea257fc cpufreq: st: Provide runtime initialised driver for ST's platforms
The bootloader is charged with the responsibility to provide platform
specific Dynamic Voltage and Frequency Scaling (DVFS) information via
Device Tree.  This driver takes the supplied configuration and
registers it with the new generic OPP framework, to then be used with
CPUFreq.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-12 02:55:21 +01:00
Prarit Bhargava 88b7b7c0c2 cpufreq: intel_pstate: Minor cleanup for FRAC_BITS
785ee27 ("cpufreq: intel_pstate: Fix limits->max_perf rounding error")
hardcodes the value of FRAC_BITS.  This patch fixes that minor issue.

Fixes: 785ee27881 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-12 02:28:19 +01:00
Pi-Cheng Chen 89b56047f6 cpufreq: mt8173: Move resources allocation into ->probe()
Since the return value of ->init() of cpufreq driver is not propagated
to the device driver model now, move resources allocation into
->probe() to handle -EPROBE_DEFER properly.

Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-12 02:27:16 +01:00
Arnd Bergmann b5832e4b62 cpufreq: tegra: add regulator dependency for T124
This driver is the only one that calls regulator_sync_voltage(), but it
can currently be built with CONFIG_REGULATOR disabled, producing
this build error:

drivers/cpufreq/tegra124-cpufreq.c: In function 'tegra124_cpu_switch_to_pllx':
drivers/cpufreq/tegra124-cpufreq.c:68:2: error: implicit declaration of function 'regulator_sync_voltage' [-Werror=implicit-function-declaration]
  regulator_sync_voltage(priv->vdd_cpu_reg);

My first attempt was to implement a helper for this function
for regulator_sync_voltage, but Mark Brown explained:

   We don't do this for *all* regulator API functions - there's some where
   using them strongly suggests that there is actually a dependency on
   the regulator API.  This does seem like it might be falling into the
   specialist category [...]
   Looking at the code I'm pretty unclear on what the authors think the
   use of _sync_voltage() is doing in the first place so it may be even
   better to just remove the call.  It seems to have been included in the
   first commit so there's not changelog explaining things and there's
   no comment either.  I'd *expect* it to be a noop as far as I can see.

This adds the dependency to make the driver always build successfully
or not be enabled at all. Alternatively, we could investigate if the
driver should stop calling regulator_sync_voltage instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-12 02:25:56 +01:00
Arnd Bergmann 789d73b384 Fixes for Exynos:
1. Fix potential NULL pointer dereference in Exynos PMU driver.
 2. Remove incorrect __init annotation from s3c24xx cpufreq driver
    structures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWV66yAAoJEME3ZuaGi4PXxdkP/jHTVeUGKt0NmDuAQ8oHLXsL
 AGAFF0KdkrWvAcjiLBfyNw4DTQwZu6pyF4E+PFffhXRjJFp3ku7pKcGcf2+vvrdt
 mc5Sg9dvICoqqQiJmnyMymG+dKXWGEK5NF8HPZSATAl+7DrTsACRWU4UPMLlOjmQ
 /U2KABSJFcwuAPBd0wKL6B5Wbx7ijWaUI1xnNxqpuZW6YsXEqz4GYxJ41UWqNrN/
 htoOgYCDTh9aATWpJeDRUALEYiIPS+pAjdCYaiZzrmJerCDiNO8RdMKqVX2LG+sk
 tqBem2cy9TlCuJFdZX50Y700nK6G+iniTVaGoJO93w4XLRwTKAJROr/+7uNGqWqq
 Q/Ofkh6SEJpOiNunREqf7zCLOwUs6d2bQSTp6FilAsSxQCeRVSvXkTfTv388UWOk
 vD3TtfCPPokPwyWubtPeRYn6lDTfXUZr8cVkh1+/lcTCXEMvlK9AyDp3BsKDQHyQ
 zG5yZuZ6VQ2bmhE4B43yCMdqxFTLe+vf+GetJl4aDSAQNMuBdXZ84tyLwQqAMpvK
 PXTjL9RqgA78YurlS5LamqChQEI4abVnWvwBk5rTxwaqL5B5BC9i7U5Q8DuOFPv2
 W52GcBCdYriFmPhpXSVLDn47s1eyE360mBknUh2V0p6+GQzaAjzyjiTxU9/suoHQ
 IPGSkgxOfIiBMiHYPG0O
 =jNcR
 -----END PGP SIGNATURE-----

Merge tag 'samsung-fixes-4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into fixes

Merge "Fixes for Exynos" from Krzysztof Kozlowski:

1. Fix potential NULL pointer dereference in Exynos PMU driver.
2. Remove incorrect __init annotation from s3c24xx cpufreq driver
   structures.

* tag 'samsung-fixes-4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
  ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
2015-12-11 00:19:37 +01:00
Philippe Longepe 63d1d656a5 cpufreq: intel_pstate: Account for IO wait time
In cases where we have many IOs, the global load becomes low and the
load algorithm will decrease the requested P-State. Because of that,
the IOs overheads will increase and impact the IO performances.

To improve IO bound work, we can count the io-wait time as busy time
in calculating CPU busy.

This change uses get_cpu_iowait_time_us() to obtain the IO wait time value
and converts time into number of cycles spent waiting on IO at the TSC
rate. At the moment, this trick is only used for Atom.

Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 01:17:40 +01:00
Philippe Longepe e70eed2b64 cpufreq: intel_pstate: Account for non C0 time
The current function to calculate cpu utilization uses the average P-state
ratio (APerf/Mperf) scaled by the ratio of the current P-state to the
max available non-turbo one. This leads to an overestimation of
utilization which causes higher-performance P-states to be selected more
often and that leads to increased energy consumption.

This is a problem for low-power systems, so it is better to use a
different utilization calculation algorithm for them.

Namely, the Percent Busy value (or load) can be estimated as the ratio of the
MPERF counter that runs at a constant rate only during active periods (C0) to
the time stamp counter (TSC) that also runs (at the same rate) during idle.
That is:

Percent Busy = 100 * (delta_mperf / delta_tsc)

Use this algorithm for platforms with SoCs based on the Airmont and Silvermont
Atom cores.

Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 01:17:40 +01:00
Philippe Longepe 157386b6fc cpufreq: intel_pstate: Configurable algorithm to get target pstate
Target systems using different cpus have different power and performance
requirements. They may use different algorithms to get the next P-state
based on their power or performance preference.

For example, power-constrained systems may not want to use
high-performance P-states as aggressively as a full-size desktop or a
server platform. A server platform may want to run close to the max to
achieve better performance, while laptop-like systems may prefer
sacrificing performance for longer battery lifes.

For the above reasons, modify intel_pstate to allow the target P-state
selection algorithm to be depend on the CPU ID.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 01:17:40 +01:00
Pi-Cheng Chen 40be4c3ccb cpufreq: mt8173: check return value of regulator_get_voltage() call
Sometimes regulator_get_voltage() call returns negative values for
reasons(e.g. underlying I2C bus timeout). Add check for the return
values and fail out early.

Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:27:33 +01:00
Pi-Cheng Chen 93625d52e7 cpufreq: mt8173: remove redundant regulator_get_voltage() call
Remove redundant regulator_get_voltage() call to get Vsram value
since it will be obtained later at the beginning of voltage tracking
loop.

Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:27:33 +01:00
Pi-Cheng Chen 9bb46b87d6 cpufreq: mt8173: add CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag
Add CPUFREQ_HAVE_GOVERNOR_PER_POLICY to have individual set of tunables
for each cluster of MT8173.

Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:27:32 +01:00
Hongtao Jia 8ae1702a0d cpufreq: qoriq: Register cooling device based on device tree
Register the qoriq cpufreq driver as a cooling device, based on the
thermal device tree framework. When temperature crosses the passive trip
point cpufreq is used to throttle CPUs.

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:19:35 +01:00
Jacob Tanenbaum 790d849bf8 cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency
The cpufreq documentation specifies

policy->cpuinfo.transition_latency   the time it takes on this CPU to
                                switch between two frequencies in
                                nanoseconds (if appropriate, else
                                specify CPUFREQ_ETERNAL)

currently pcc-cpufreq does not expose the value and sets it to zero. I
changed the pcc-cpufreq driver and it's documentation to conform to the
default value specified in Documentation/cpu-freq/cpu-drivers.txt

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:17:03 +01:00
Punit Agrawal 2f7e8a175d cpufreq: arm_big_little: Add support to register a cpufreq cooling device
Register passive cooling devices when initialising cpufreq on
big.LITTLE systems. If the device tree provides a dynamic power
coefficient for the CPUs then the bound cooling device will support
the extensions that allow it to be used with all the existing thermal
governors including the power allocator governor.

A cooling device will be created per individual frequency domain and
can be bound to thermal zones via the thermal DT bindings.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:14:58 +01:00
Punit Agrawal f8fa8ae06b cpufreq-dt: Supply power coefficient when registering cooling devices
Support registering cooling devices with dynamic power coefficient
where provided by the device tree. This allows OF registered cooling
devices driver to be used with the power_allocator thermal governor.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-10 00:14:58 +01:00
Rafael J. Wysocki 2dd3e724b4 cpufreq: governor: Use lockless timer function
It is possible to get rid of the timer_lock spinlock used by the
governor timer function for synchronization, but a couple of races
need to be avoided.

The first race is between multiple dbs_timer_handler() instances
that may be running in parallel with each other on different
CPUs.  Namely, one of them has to queue up the work item, but it
cannot be queued up more than once.  To achieve that,
atomic_inc_return() can be used on the skip_work field of
struct cpu_common_dbs_info.

The second race is between an already running dbs_timer_handler()
and gov_cancel_work().  In that case the dbs_timer_handler() might
not notice the skip_work incrementation in gov_cancel_work() and
it might queue up its work item after gov_cancel_work() had
returned (and that work item would corrupt skip_work going
forward).  To prevent that from happening, gov_cancel_work()
can be made wait for the timer function to complete (on all CPUs)
right after skip_work has been incremented.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2015-12-09 22:26:13 +01:00
Viresh Kumar f08f638b9c cpufreq: ondemand: update update_sampling_rate() to make it more efficient
Currently update_sampling_rate() runs over each online CPU and
cancels/queues timers on all policy->cpus every time. This should be
done just once for any cpu belonging to a policy.

Create a cpumask and keep on clearing it as and when we process
policies, so that we don't have to traverse through all CPUs of the same
policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-09 22:26:12 +01:00
Viresh Kumar 70f43e5e79 cpufreq: governor: replace per-CPU delayed work with timers
cpufreq governors evaluate load at sampling rate and based on that they
update frequency for a group of CPUs belonging to the same cpufreq
policy.

This is required to be done in a single thread for all policy->cpus, but
because we don't want to wakeup idle CPUs to do just that, we use
deferrable work for this. If we would have used a single delayed
deferrable work for the entire policy, there were chances that the CPU
required to run the handler can be in idle and we might end up not
changing the frequency for the entire group with load variations.

And so we were forced to keep per-cpu works, and only the one that
expires first need to do the real work and others are rescheduled for
next sampling time.

We have been using the more complex solution until now, where we used a
delayed deferrable work for this, which is a combination of a timer and
a work.

This could be made lightweight by keeping per-cpu deferred timers with a
single work item, which is scheduled by the first timer that expires.

This patch does just that and here are important changes:
- The timer handler will run in irq context and so we need to use a
  spin_lock instead of the timer_mutex. And so a separate timer_lock is
  created. This also makes the use of the mutex and lock quite clear, as
  we know what exactly they are protecting.
- A new field 'skip_work' is added to track when the timer handlers can
  queue a work. More comments present in code.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-09 22:26:00 +01:00
Viresh Kumar 5e4500d8db cpufreq: governor: initialize/destroy timer_mutex with 'shared'
timer_mutex is required to be initialized only while memory for 'shared'
is allocated and in a similar way it is required to be destroyed only
when memory for 'shared' is freed.

There is no need to do the same every time we start/stop the governor.
Move code to initialize/destroy timer_mutex to the relevant places.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:23 +01:00
Viresh Kumar affde5d06a cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()
Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and
dbs_data.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:22 +01:00
Viresh Kumar e68fe18c5b cpufreq: ondemand: Work is guaranteed to be pending
We are guaranteed to have works scheduled for policy->cpus, as the
policy isn't stopped yet. And so there is no need to check that again.
Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:21 +01:00
Viresh Kumar e128c86407 cpufreq: ondemand: Update sampling rate only for concerned policies
We are comparing policy->governor against cpufreq_gov_ondemand to make
sure that we update sampling rate only for the concerned CPUs. But that
isn't enough.

In case of governor_per_policy, there can be multiple instances of
ondemand governor and we will always end up updating all of them with
current code. What we rather need to do, is to compare dbs_data with
poilcy->governor_data, which will match only for the policies governed
by dbs_data.

This code is also racy as the governor might be getting stopped at that
time and we may end up scheduling work for a policy, which we have just
disabled.

Fix that by protecting the entire function with &od_dbs_cdata.mutex,
which will prevent against races with policy START/STOP/etc.

After these locks are in place, we can safely get the policy via per-cpu
dbs_info.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:10 +01:00
Rafael J. Wysocki d441fe25e7 Merge branches 'pm-domains' and 'pm-cpufreq'
* pm-domains:
  PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
  PM / Domains: Validate cases of a non-bound driver in genpd governor

* pm-cpufreq:
  cpufreq: use last policy after online for drivers with ->setpolicy
2015-12-04 14:01:42 +01:00
Srinivas Pandruvada 69030dd1c3 cpufreq: use last policy after online for drivers with ->setpolicy
For cpufreq drivers which use setpolicy interface, after offline->online
the policy is set to default. This can be reproduced by setting the
default policy of intel_pstate or longrun to ondemand and then change to
"performance". After offline and online, the setpolicy will be called with
the policy=ondemand.

For drivers using governors this condition is handled by storing
last_governor, during offline and restoring during online. The same should
be done for drivers using setpolicy interface. Storing last_policy during
offline and restoring during online.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-02 23:50:33 +01:00
Rafael J. Wysocki f28a1b0df7 Merge branches 'pm-cpufreq' and 'acpi-cppc'
* pm-cpufreq:
  intel_pstate: Fix "performance" mode behavior with HWP enabled
  cpufreq: SCPI: Depend on SCPI clk driver
  cpufreq: intel_pstate: Fix limits->max_perf rounding error
  cpufreq: intel_pstate: Fix limits->max_policy_pct rounding error
  cpufreq: Always remove sysfs cpuX/cpufreq link on ->remove_dev()

* acpi-cppc:
  cpufreq: CPPC: Initialize and check CPUFreq CPU co-ord type correctly
2015-11-27 16:23:59 +01:00
Arnd Bergmann 62f49ee26f cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
s3c2410_plls_add is a device notifier that may be called at runtime and
is correctly not marked __init. However it calls s3c_plltab_register()
which is marked __init, and that triggers a build error when we are
checking for section mismatches:

WARNING: vmlinux.o(.text+0x195e0): Section mismatch in reference from the function s3c2410_plls_add() to the function .init.text:s3c_plltab_register()
The function s3c2410_plls_add() references
the function __init s3c_plltab_register().
This is often because s3c2410_plls_add lacks a __init
annotation or the annotation of s3c_plltab_register is wrong.

This removes the __init annotation from s3c2410_plls_add as well as the
__initdata section annotations from s3c2440_plls_12 and s3c2440_plls_169344,
which in turn are referenced from s3c2410_plls_add.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2015-11-27 10:10:32 +09:00
Alexandra Yates 584ee3dcb1 intel_pstate: Fix "performance" mode behavior with HWP enabled
If hardware-driven P-state selection (HWP) is enabled, the
"performance" mode of intel_pstate should only allow the processor
to use the highest-performance P-state available.  That is not
the case currently, so make it actually happen.

Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com>
[ rjw: Subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-25 23:37:44 +01:00
Punit Agrawal 73124ced9c cpufreq: SCPI: Depend on SCPI clk driver
The SCPI clk driver registers the virtual cpufreq device that kicks off
initialisation of the SCPI cpufreq driver. Also, clk_get() will fail for
the cpufreq driver if the SCPI clk driver is missing.

Fix this by making the SCPI cpufreq driver explicitly depend on the SCPI
clk driver.

Fixes: 8def31034d (cpufreq: arm_big_little: add SCPI interface driver)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-23 23:50:27 +01:00
Rafael J. Wysocki b0ceed0685 Merge back earlier cpufreq fixes for v4.4. 2015-11-23 23:49:57 +01:00
Prarit Bhargava 785ee27881 cpufreq: intel_pstate: Fix limits->max_perf rounding error
A rounding error was found in the calculation of limits->max_perf
in intel_pstate_set_policy(), which is used to calculate the max and min
pstate values in intel_pstate_get_min_max().  In that code,
limits->max_perf is truncated to 2 hex digits such that, for example,
0x169 was incorrectly calculated to 0x16 instead of 0x17.  This resulted in
the pstate being set one level too low.  This patch rounds the value of
limits->max_perf up instead of down so that the correct max pstate can
be reached.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-23 23:15:34 +01:00
Prarit Bhargava 8478f53946 cpufreq: intel_pstate: Fix limits->max_policy_pct rounding error
I have a Intel (6,63) processor with a "marketing" frequency (from
/proc/cpuinfo) of 2100MHz, and a max turbo frequency of 2600MHz.  I
can execute

cpupower frequency-set -g powersave --min 1200MHz --max 2100MHz

and the max_freq_pct is set to 80.  When adding load to the system I noticed
that the cpu frequency only reached 2000MHZ and not 2100MHz as expected.

This is because limits->max_policy_pct is calculated as 2100 * 100 /2600 = 80.7
and is rounded down to 80 when it should be rounded up to 81.  This patch
adds a DIV_ROUND_UP() which will return the correct value.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-23 23:14:10 +01:00
Viresh Kumar f344dae0fe cpufreq: Always remove sysfs cpuX/cpufreq link on ->remove_dev()
Subsys interface's ->remove_dev() is called when the cpufreq driver is
unregistering or the CPU is getting physically removed. We keep removing
the cpuX/cpufreq link for all CPUs except the last one, which is a
mistake as all CPUs contain a link now.

Because of this, one CPU from each policy will still contain a link (to
an already removed policyX directory), after the cpufreq driver is
unregistered.

Fix that by removing the link first and then only see if the policy is
required to be freed. That will make sure that no links are left out.

Fixes: 96bdda61f5 ("cpufreq: create cpu/cpufreq/policyX directories")
Reported-and-tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-23 22:49:42 +01:00
Ashwin Chaugule 9dc1791773 cpufreq: CPPC: Initialize and check CPUFreq CPU co-ord type correctly
The CPU policy struct indicates the co-ordination type
for all CPUs of a common freq domain. Initialize it
correctly using the CPU specific data gathered from
CPPC ACPI lib via acpi_get_psd_map().

The PSD object is optional, so the cpu->shared_type
can also be 0. So instead of assuming any value other
than SW_ANY(0xFD) is unsupported, explictly check
if shared_type is SW_ALL and then bail.

Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-23 22:21:18 +01:00
Rafael J. Wysocki 9832bf3a35 Merge branches 'pm-cpufreq' and 'acpi-cppc'
* pm-cpufreq:
  Revert "Documentation: kernel_parameters for Intel P state driver"
  cpufreq: mediatek: fix build error
  cpufreq: intel_pstate: Add separate support for Airmont cores
  cpufreq: intel_pstate: Replace BYT with ATOM
  Revert "cpufreq: intel_pstate: Use ACPI perf configuration"
  Revert "cpufreq: intel_pstate: Avoid calculation for max/min"

* acpi-cppc:
  ACPI / CPPC: Use h/w reduced version of the PCCT structure
2015-11-20 01:22:10 +01:00
Arnd Bergmann 2d4ee30367 cpufreq: mediatek: fix build error
The recently added mt8173 cpufreq driver relies on the cpu topology
that is always present on ARM64 but optional on ARM32:

drivers/cpufreq/mt8173-cpufreq.c: In function 'mtk_cpufreq_init':
drivers/cpufreq/mt8173-cpufreq.c:441:30: error: 'cpu_topology' undeclared (first use in this function)
  cpumask_copy(policy->cpus, &cpu_topology[policy->cpu].core_sibling);

This refines the Kconfig dependencies so that we can still build on
ARM32, but only if COMPILE_TEST is selected and the CPU topology
code is present.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-19 00:21:47 +01:00
Philippe Longepe 1421df63c3 cpufreq: intel_pstate: Add separate support for Airmont cores
There are two flavors of Atom cores to be supported by intel_pstate,
Silvermont and Airmont, so make the driver distinguish between them by
adding separate frequency tables.

Separate the CPU defaults params for each of them and match the CPU IDs
against them as appropriate.

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-19 00:21:46 +01:00
Philippe Longepe 938d21a2a6 cpufreq: intel_pstate: Replace BYT with ATOM
Rename symbol and function names starting with "BYT" or "byt" to
start with "ATOM" or "atom", respectively, so as to make it clear
that they may apply to Atom in general and not just to Baytrail
(the goal is to support several Atoms architectures eventually).

This should not lead to any functional changes.

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-19 00:21:46 +01:00
Rafael J. Wysocki 6ee11e413c Revert "cpufreq: intel_pstate: Use ACPI perf configuration"
Revert commit 37afb00032 (cpufreq: intel_pstate: Use ACPI perf
configuration) that is reported to cause a regression to happen
on a system where invalid data are returned by the ACPI _PSS object.

Since that commit makes assumptions regarding the _PSS output
correctness that may turn out to be overly optimistic in general,
there is a concern that it may introduce regression on more
systems, so it's better to revert it now and we'll revisit the
underlying issue in the next cycle with a more robust solution.

Conflicts:
        drivers/cpufreq/intel_pstate.c

Fixes: 37afb00032 (cpufreq: intel_pstate: Use ACPI perf configuration)
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-19 00:20:42 +01:00
Rafael J. Wysocki 799281a3c4 Revert "cpufreq: intel_pstate: Avoid calculation for max/min"
Revert commit 4ef4514870 (cpufreq: intel_pstate: Avoid calculation for
max/min) as it depends on commit 37afb00032 (cpufreq: intel_pstate: Use
ACPI perf configuration) that causes problems to happen and needs to be
reverted.

Conflicts:
	drivers/cpufreq/intel_pstate.c

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-18 23:29:56 +01:00
Linus Torvalds be23c9d20b More power management and ACPI updates for v4.4-rc1
- Support for the ACPI _CCA configuration object intended to tell
    the OS whether or not a bus master device supports hardware
    managed cache coherency and a new set of functions to allow
    drivers to check the cache coherency support for devices in a
    platform firmware interface agnostic way (Suravee Suthikulpanit,
    Jeremy Linton).
 
  - ACPI backlight quirks for ESPRIMO Mobile M9410 and Dell XPS L421X
    (Aaron Lu, Hans de Goede).
 
  - Fixes for the arm_big_little and s5pv210-cpufreq cpufreq drivers
    (Jon Medhurst, Nicolas Pitre).
 
  - kfree()-related fixup for the recently introduced CPPC cpufreq
    frontend (Markus Elfring).
 
  - intel_pstate fix reducing kernel log noise on systems where
    P-states are managed by hardware (Prarit Bhargava).
 
  - intel_pstate maintainers information update (Srinivas Pandruvada).
 
  - cpufreq core optimization related to the handling of delayed work
    items used by governors (Viresh Kumar).
 
  - Locking fixes and cleanups of the Operating Performance Points
    (OPP) framework (Viresh Kumar).
 
  - Generic power domains framework cleanups (Lina Iyer).
 
  - cpupower tool updates (Jacob Tanenbaum, Sriram Raghunathan,
    Thomas Renninger).
 
  - turbostat tool updates (Len Brown).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJWQ96OAAoJEILEb/54YlRxyYYQALJ1HXu76SvYX1re2aawOw6Y
 WgzF3Ly7JX034E1VvA2xP6wgkWpBRBDcpnRDeltNA4dYXPBDei/eTcRZTLX12N3g
 AfFRGjGWTtLJfpNPecNMmUyF5xHjgDgMIQRabY+Is5NfP5STkPHJeqULnEpvTtx8
 bd0lnC5jc4vuZiPEh1xVb+ClYDqWS8YQPyFJVjV/BaIf8Qwe5+oRX36byMBaKc9D
 ZgmvmCk5n/HLQQ1uQsqe4xnhFLHN2rypt2BLvFrOtlnSz9VNNpQyB+OIW1mgCD4f
 LhpKIwjP8NhZNQUq8HFu7nDlm8ciQtWmeMPB5NdGQ+OESu7yfKAOzQ+3U6Gl2Gaf
 66zVGyV6SOJJwfDVJ3qKTtroWps9QV7ZClOJ+zJGgiujwU+tJ3pDQyZM9pa7CL3C
 s7ZAUsI6IigSBjD3nJVOyG4DO0a8KQFCIE1mDmyqId45Qz8xJoOrYP33/ZnDuOdo
 2OtL/emyfWsz9ixbHVfwIhb7EC6aoaUxQrhSWmNraaQS43YfioZR7h4we8gwenph
 X4E1KY4SdML+uFf2VKIcd45NM3IBprCxx5UgFAJdrqe8+otqPNF2AVosG4iqhg/b
 k4nxwuIvw2a8Fm77U9ytyXDYMItU/wIlAHMbnmgx+oTwRv6AbZ07MHkyfuQLYuhD
 tq5Y14qSiTS7prNacx98
 =XZiP
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management and ACPI updates from Rafael Wysocki:
 "The only new feature in this batch is support for the ACPI _CCA device
  configuration object, which it a pre-requisite for future ACPI PCI
  support on ARM64, but should not affect the other architectures.

  The rest is fixes and cleanups, mostly in cpufreq (including
  intel_pstate), the Operating Performace Points (OPP) framework and
  tools (cpupower and turbostat).

  Specifics:

   - Support for the ACPI _CCA configuration object intended to tell the
     OS whether or not a bus master device supports hardware managed
     cache coherency and a new set of functions to allow drivers to
     check the cache coherency support for devices in a platform
     firmware interface agnostic way (Suravee Suthikulpanit, Jeremy
     Linton).

   - ACPI backlight quirks for ESPRIMO Mobile M9410 and Dell XPS L421X
     (Aaron Lu, Hans de Goede).

   - Fixes for the arm_big_little and s5pv210-cpufreq cpufreq drivers
     (Jon Medhurst, Nicolas Pitre).

   - kfree()-related fixup for the recently introduced CPPC cpufreq
     frontend (Markus Elfring).

   - intel_pstate fix reducing kernel log noise on systems where
     P-states are managed by hardware (Prarit Bhargava).

   - intel_pstate maintainers information update (Srinivas Pandruvada).

   - cpufreq core optimization related to the handling of delayed work
     items used by governors (Viresh Kumar).

   - Locking fixes and cleanups of the Operating Performance Points
     (OPP) framework (Viresh Kumar).

   - Generic power domains framework cleanups (Lina Iyer).

   - cpupower tool updates (Jacob Tanenbaum, Sriram Raghunathan, Thomas
     Renninger).

   - turbostat tool updates (Len Brown)"

* tag 'pm+acpi-4.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
  PCI: ACPI: Add support for PCI device DMA coherency
  PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()
  of/pci: Fix pci_get_host_bridge_device leak
  device property: ACPI: Remove unused DMA APIs
  device property: ACPI: Make use of the new DMA Attribute APIs
  device property: Adding DMA Attribute APIs for Generic Devices
  ACPI: Adding DMA Attribute APIs for ACPI Device
  device property: Introducing enum dev_dma_attr
  ACPI: Honor ACPI _CCA attribute setting
  cpufreq: CPPC: Delete an unnecessary check before the function call kfree()
  PM / OPP: Add opp_rcu_lockdep_assert() to _find_device_opp()
  PM / OPP: Hold dev_opp_list_lock for writers
  PM / OPP: Protect updates to list_dev with mutex
  PM / OPP: Propagate error properly from dev_pm_opp_set_sharing_cpus()
  cpufreq: s5pv210-cpufreq: fix wrong do_div() usage
  MAINTAINERS: update for intel P-state driver
  Creating a common structure initialization pattern for struct option
  cpupower: Enable disabled Cstates if they are below max latency
  cpupower: Remove debug message when using cpupower idle-set -D switch
  cpupower: cpupower monitor reports uninitialized values for offline cpus
  ...
2015-11-12 11:50:33 -08:00
Linus Torvalds b44a3d2a85 ARM: SoC driver updates for v4.4
As we've enabled multiplatform kernels on ARM, and greatly done away with
 the contents under arch/arm/mach-*, there's still need for SoC-related
 drivers to go somewhere.
 
 Many of them go in through other driver trees, but we still have
 drivers/soc to hold some of the "doesn't fit anywhere" lowlevel code
 that might be shared between ARM and ARM64 (or just in general makes
 sense to not have under the architecture directory).
 
 This branch contains mostly such code:
 
 - Drivers for qualcomm SoCs for SMEM, SMD and SMD-RPM, used to communicate
   with power management blocks on these SoCs for use by clock, regulator and
   bus frequency drivers.
 - Allwinner Reduced Serial Bus driver, again used to communicate with PMICs.
 - Drivers for ARM's SCPI (System Control Processor). Not to be confused with
   PSCI (Power State Coordination Interface). SCPI is used to communicate with
   the assistant embedded cores doing power management, and we have yet to see
   how many of them will implement this for their hardware vs abstracting in
   other ways (or not at all like in the past).
 - To make confusion between SCPI and PSCI more likely, this release also
   includes an update of PSCI to interface version 1.0.
 - Rockchip support for power domains.
 - A driver to talk to the firmware on Raspberry Pi.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQC+cAAoJEIwa5zzehBx3jEUP/0GpxfDVanEUkudVLLe7J0RH
 CNlRan107Cw6hXRUJo7elEsuCALjccXjc1CAH4+RnNpOAeBKW97n+WU7trTv+wUZ
 sQX4SkBPKFBlgwGF2qhsi5q74gms/BrgtCa4kNb9joOYso039tlfIOPzK80DMkOm
 TkyIJdUCgFJMjCQLhX6kGT0PDcrbIjb6aA2cF3FAVeaJA7uz8lNe/eHJr3oHxIEY
 CvC651yJ2mIHQUU4BJx/AJo+wXg3dRUXNCAtBjwLRPEAzduYZXYm1ZTVIby/1q9r
 dR2KDFEuibODXmXrDBzKNJwCu/TLJEwo/1oPaEIVfY91XLKfiWUhgVqa1o1I+d9U
 XoGPibCW461qFahjQW87MfInALpCOA7/RbTNjFp+MVyipCYvkaYq7KFiYEldgFDx
 z4Qx/J4hYc2TlDWrpNiUCZMfmhwi7y+Ib+tnenYTO1eyMuw0e9mfnVdjk5iU3Pvk
 Ye4qPqpYclJruyHbYi164878+1lLaW2NCUgC3rkBO/GWPAzp7d9iLWoZ3PuyD5i5
 PEjs668UcRdZYbI4rdrhGHL8Eq9Gnuc4Rthu7HxPOK+DG0XgP8r97PhM8aYGYVDO
 +yikBtjWRsA9fPj3rMKA3UsQ61DAeR9LmZ0XPGjWFMCjCG0JlUoIMaA+Uu0i8fr8
 95qxBVxbO7rhL39r1rhV
 =dm+I
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver updates from Olof Johansson:
 "As we've enabled multiplatform kernels on ARM, and greatly done away
  with the contents under arch/arm/mach-*, there's still need for
  SoC-related drivers to go somewhere.

  Many of them go in through other driver trees, but we still have
  drivers/soc to hold some of the "doesn't fit anywhere" lowlevel code
  that might be shared between ARM and ARM64 (or just in general makes
  sense to not have under the architecture directory).

  This branch contains mostly such code:

   - Drivers for qualcomm SoCs for SMEM, SMD and SMD-RPM, used to
     communicate with power management blocks on these SoCs for use by
     clock, regulator and bus frequency drivers.

   - Allwinner Reduced Serial Bus driver, again used to communicate with
     PMICs.

   - Drivers for ARM's SCPI (System Control Processor).  Not to be
     confused with PSCI (Power State Coordination Interface).  SCPI is
     used to communicate with the assistant embedded cores doing power
     management, and we have yet to see how many of them will implement
     this for their hardware vs abstracting in other ways (or not at all
     like in the past).

   - To make confusion between SCPI and PSCI more likely, this release
     also includes an update of PSCI to interface version 1.0.

   - Rockchip support for power domains.

   - A driver to talk to the firmware on Raspberry Pi"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (57 commits)
  soc: qcom: smd-rpm: Correct size of outgoing message
  bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus
  bus: sunxi-rsb: Add Allwinner Reduced Serial Bus (RSB) controller bindings
  ARM: bcm2835: add mutual inclusion protection
  drivers: psci: make PSCI 1.0 functions initialization version dependent
  dt-bindings: Correct paths in Rockchip power domains binding document
  soc: rockchip: power-domain: don't try to print the clock name in error case
  soc: qcom/smem: add HWSPINLOCK dependency
  clk: berlin: add cpuclk
  ARM: berlin: dts: add CLKID_CPU for BG2Q
  ARM: bcm2835: Add the Raspberry Pi firmware driver
  soc: qcom: smem: Move RPM message ram out of smem DT node
  soc: qcom: smd-rpm: Correct the active vs sleep state flagging
  soc: qcom: smd: delete unneeded of_node_put
  firmware: qcom-scm: build for correct architecture level
  soc: qcom: smd: Correct SMEM items for upper channels
  qcom-scm: add missing prototype for qcom_scm_is_available()
  qcom-scm: fix endianess issue in __qcom_scm_is_call_available
  soc: qcom: smd: Reject send of too big packets
  soc: qcom: smd: Handle big endian CPUs
  ...
2015-11-10 15:00:03 -08:00
Rafael J. Wysocki 1f47b0ddf3 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: s5pv210-cpufreq: fix wrong do_div() usage
  MAINTAINERS: update for intel P-state driver
  cpufreq: governor: Quit work-handlers early if governor is stopped
  intel_pstate: decrease number of "HWP enabled" messages
  cpufreq: arm_big_little: fix frequency check when bL switcher is active
2015-11-07 01:30:49 +01:00
Rafael J. Wysocki 3930f660b4 Merge branches 'acpi-video' and 'acpi-cppc'
* acpi-video:
  ACPI / video: only register backlight for LCD device
  ACPI / video: Add a quirk to force acpi-video backlight on Dell XPS L421X

* acpi-cppc:
  cpufreq: CPPC: Delete an unnecessary check before the function call kfree()
2015-11-07 01:30:22 +01:00
Markus Elfring efb2d3be53 cpufreq: CPPC: Delete an unnecessary check before the function call kfree()
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-07 00:03:18 +01:00
Nicolas Pitre d7e53e35f9 cpufreq: s5pv210-cpufreq: fix wrong do_div() usage
It is wrong to use do_div() with 32-bit dividends (unsigned long is
32 bits on 32-bit architectures).

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-05 22:50:48 +01:00
Viresh Kumar 3a91b069ea cpufreq: governor: Quit work-handlers early if governor is stopped
gov_queue_work() acquires cpufreq_governor_lock to allow
cpufreq_governor_stop() to drain delayed work items possibly scheduled
on CPUs that share the policy with a CPU being taken offline.

However, the same goal may be achieved in a more straightforward way if
the policy pointer in the struct cpu_dbs_info matching the policy CPU is
reset upfront by cpufreq_governor_stop() under the timer_mutex belonging
to it and checked against NULL, under the same lock, at the beginning of
dbs_timer().

In that case every instance of dbs_timer() run for a struct cpu_dbs_info
sharing the policy pointer in question after cpufreq_governor_stop() has
started will notice that that pointer is NULL and bail out immediately
without queuing up any new work items.  In turn, gov_cancel_work()
called by cpufreq_governor_stop() before destroying timer_mutex will
wait for all of the delayed work items currently running on the CPUs
sharing the policy to drop the mutex, so it may be destroyed safely.

Make cpufreq_governor_stop() and dbs_timer() work as described and
modify gov_queue_work() so it does not acquire cpufreq_governor_lock any
more.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-02 02:07:11 +01:00
Prarit Bhargava 539342f60b intel_pstate: decrease number of "HWP enabled" messages
When booting an HWP enabled system the kernel displays one "HWP enabled"
message for each cpu.  The messages are superfluous since HWP is globally
enabled across all CPUs. This patch also adds an informational message
when HWP is disabled via intel_pstate=no_hwp.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-02 02:01:18 +01:00
Jon Medhurst \(Tixy\) 14f1ba3af6 cpufreq: arm_big_little: fix frequency check when bL switcher is active
The check for correct frequency being set in bL_cpufreq_set_rate is
broken when the big.LITTLE switcher is active, for two reasons.

 1. The 'new_rate' variable gets overwritten before the test by the
 code calculating the frequency of the old cluster.

 2. The frequency returned by bL_cpufreq_get_rate will be the virtual
 frequency, not the actual one the intended version of new_rate contains.

This means the function always returns an error causing an endless
stream of: "cpufreq: __target_index: Failed to change cpu frequency: -5"

As the intent is to check for errors that clk_set_rate doesn't report
lets move the check to immediately after that and directly use
clk_get_rate, rather than the arm_big_little helpers which only confuse
matters. Also, update the comment to be hopefully clearer about the
purpose of the code.

Fixes: 0a95e630b4 (cpufreq: arm_big_little: check if the frequency is set correctly)
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-02 01:58:27 +01:00
Rafael J. Wysocki 394f7164e6 Merge branch 'pm-opp'
* pm-opp:
  PM / OPP: passing NULL to PTR_ERR()
  PM / OPP: Move cpu specific code to opp/cpu.c
  PM / OPP: Move opp core to its own directory
  PM / OPP: Prefix exported opp routines with dev_pm_opp_
  PM / OPP: Rename opp init/free table routines
  PM / OPP: reuse of_parse_phandle()
2015-11-02 00:54:37 +01:00
Rafael J. Wysocki 69f8947b8c Merge branches 'pm-cpufreq' and 'pm-cpuidle'
* pm-cpufreq:
  cpufreq: postfix policy directory with the first CPU in related_cpus
  cpufreq: create cpu/cpufreq/policyX directories
  cpufreq: remove cpufreq_sysfs_{create|remove}_file()
  cpufreq: create cpu/cpufreq at boot time
  cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value
  cpufreq: intel_pstate: Avoid calculation for max/min
  Documentation: kernel_parameters for Intel P state driver
  cpufreq: intel_pstate: Use ACPI perf configuration
  cpufreq: intel-pstate: Use separate max pstate for scaling
  cpufreq: intel_pstate: get P1 from TAR when available
  cpufreq: Drop redundant check for inactive policies
  cpufreq : powernv: Report Pmax throttling if capped below nominal frequency
  cpufreq: imx: update the clock switch flow to support imx6ul
  cpufreq: tegra20: remove superfluous CONFIG_PM ifdefs
  cpufreq: conservative: remove 'enable' field
  cpufreq: integrator: Fix module autoload for OF platform driver

* pm-cpuidle:
  cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
  cpuidle: mvebu: clean up multiple platform drivers
2015-11-02 00:54:10 +01:00
Rafael J. Wysocki 62839e2d01 Merge branch 'acpi-processor'
* acpi-processor:
  ACPI / CPPC: Fix potential memory leak
  ACPI / CPPC: signedness bug in register_pcc_channel()
  ACPI: Allow selection of the ACPI processor driver for ARM64
  CPPC: Probe for CPPC tables for each ACPI Processor object
  ACPI: Add weak routines for ACPI CPU Hotplug
  ACPI / CPPC: Add a CPUFreq driver for use with CPPC
  ACPI: Introduce CPU performance controls using CPPC
2015-11-02 00:50:37 +01:00
Viresh Kumar 3510fac454 cpufreq: postfix policy directory with the first CPU in related_cpus
The sysfs policy directory is postfixed currently with the CPU number
for which the policy was created, which isn't necessarily the first CPU
in related_cpus mask.

To make it more consistent and predictable, lets postfix the policy with
the first cpu in related-cpus mask.

Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:21:12 +01:00
Viresh Kumar 96bdda61f5 cpufreq: create cpu/cpufreq/policyX directories
The cpufreq sysfs interface had been a bit inconsistent as one of the
CPUs for a policy had a real directory within its sysfs 'cpuX' directory
and all other CPUs had links to it. That also made the code a bit
complex as we need to take care of moving the sysfs directory if the CPU
containing the real directory is getting physically hot-unplugged.

Solve this by creating 'policyX' directories (per-policy) in
/sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which
the policy was first created.

This also removes the need of keeping kobj_cpu and we can remove it now.

Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: is more of a general agreement from the person that he is
Reviewed-by: is a more strict tag and implies that the reviewer has
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:21:12 +01:00
Viresh Kumar c82bd44437 cpufreq: remove cpufreq_sysfs_{create|remove}_file()
They don't do anything special now, remove the unnecessary wrapper.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:21:12 +01:00
Viresh Kumar 8eec1020f0 cpufreq: create cpu/cpufreq at boot time
Later patches will need to create policy specific directories in
/sys/devices/system/cpu/cpufreq/ directory and so the cpufreq directory
wouldn't be ever empty.

And so no fun creating/destroying it on need basis anymore. Create it
once on system boot.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:21:12 +01:00
Viresh Kumar 0998a03a3a cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
->related_cpus is empty at this point of time and copying ->cpus to it
or orring ->related_cpus with ->cpus would result in the same value. But
cpumask_copy makes it rather clear.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:21:11 +01:00
Viresh Kumar 083701b13c cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
'timer_mutex' is required to sync work-handlers of policy->cpus.
update_sampling_rate() is just canceling the works and queuing them
again. This isn't protecting anything at all in update_sampling_rate()
and is not gonna be of any use.

Even if a work-handler is already running for a CPU,
cancel_delayed_work_sync() will wait for it to finish.

Drop these unnecessary locks.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:20:04 +01:00
Prarit Bhargava 51443fbf3d cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value
On systems that initialize the intel_pstate driver with the performance
governor, and then switch to the powersave governor will not transition to
lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
is set to a low value.

The behavior of governor switching changed after commit a04759924e
("[cpufreq] intel_pstate: honor user space min_perf_pct override on
 resume").  The commit introduced tracking of performance percentage
changes via sysfs in order to restore userspace changes during
suspend/resume.  The problem occurs because the global values of the newly
introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
change and this causes the powersave governor to inherit the performance
governor's settings.

A simple change would have been to reset max_sysfs_pct to 100 and
min_sysfs_pct to 0 on a governor change, which fixes the problem with
governor switching.  However, since we cannot break userspace[1] the fix
is now to give each governor its own limits storage area so that governor
specific changes are tracked.

I successfully tested this by booting with both the performance governor
and the powersave governor by default, and switching between the two
governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
and looking at the output of cpupower frequency-info).  Suspend/Resume
testing was performed by Doug Smythies.

[1] Systems which suspend/resume using the unmaintained pm-utils package
will always transition to the performance governor before the suspend and
after the resume.  This means a system using the powersave governor will
go from powersave to performance, then suspend/resume, performance to
powersave.  The simple change during governor changes would have been
overwritten when the governor changed before and after the suspend/resume.
I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
against Fedora to remove the 94cpufreq file that causes the problem.  It
should be noted that pm-utils is obsoleted with newer versions of systemd.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-16 22:35:11 +02:00
Rafael J. Wysocki 7855e10294 Merge back earlier cpufreq material for v4.4. 2015-10-16 22:12:02 +02:00
Srinivas Pandruvada 8e601a9f97 cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
This is a workaround for KNL platform, where in some cases MPERF counter
will not have updated value before next read of MSR_IA32_MPERF. In this
case divide by zero will occur. This change ignores current sample for
busy calculation in this case.

Fixes: b34ef932d7 (intel_pstate: Knights Landing support)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: 4.1+ <stable@vger.kernel.org> # 4.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 22:46:33 +02:00
Srinivas Pandruvada 4ef4514870 cpufreq: intel_pstate: Avoid calculation for max/min
When requested from cpufreq to set policy, look into _pss and get
control values, instead of using max/min perf calculations. These
calculation misses next control state in boundary conditions.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:53:19 +02:00
Srinivas Pandruvada 37afb00032 cpufreq: intel_pstate: Use ACPI perf configuration
Use ACPI _PSS to limit the Intel P State turbo, max and min ratios.
This driver uses acpi processor perf lib calls to register performance.
The following logic is used to adjust Intel P state driver limits:
- If there is no turbo entry in _PSS, then disable Intel P state turbo
and limit to non turbo max
- If the non turbo max ratio is more than _PSS max non turbo value, then
set the max non turbo ratio to _PSS non turbo max
- If the min ratio is less than _PSS min then change the min ratio
matching _PSS min
- Scale the _PSS turbo frequency to max turbo frequency based on control
value.
This feature can be disabled by using kernel parameters:
intel_pstate=no_acpi

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:53:18 +02:00
Srinivas Pandruvada 3bcc6fa971 cpufreq: intel-pstate: Use separate max pstate for scaling
Systems with configurable TDP have multiple max non turbo p state. Intel
P state uses max non turbo P state for scaling. But using the real max
non turbo p state causes underestimation of next P state. So using
the physical max non turbo P state as before for scaling.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:53:18 +02:00
Srinivas Pandruvada 6a35fc2d6c cpufreq: intel_pstate: get P1 from TAR when available
After Ivybridge, the max non turbo ratio obtained from platform info msr
is not always guaranteed P1 on client platforms. The max non turbo
activation ratio (TAR), determines the max for the current level of TDP.
The ratio in platform info is physical max. The TAR MSR can be locked,
so updating this value is not possible on all platforms.
This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if
available,
but also do some sanity checking to make sure that this value is
correct.
The sanity check involves reading the TDP ratio for the current tdp
control value when platform has configurable TDP present and matching
TAC
with this.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:53:18 +02:00
Arnd Bergmann c049adc9fd ARM System Control and Power Interface(SCPI) support
It adds support for the following features provided by SCP firmware
 using different subsystems in Linux:
   1. SCPI mailbox protocol driver which using mailbox framework
   2. Clocks provided by SCP using clock framework
   3. CPU DVFS(cpufreq) using existing arm-big-little driver
   4. SCPI based sensors including temperature sensors
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJWF7ifAAoJEABBurwxfuKYAfMP/34ka/n4+U/aPQXzStNIwr3v
 Nme9WSf3mUPv26MstRDrWRYi1G2WLOTlc196MpdIt6m6QLOjxzEl3tSq5ILrj7yN
 KoLojtISmu/pbhVcJN5fllxgpcJzufLoEWBa5T/Y/4GoIhh1NCYa82QpNgzPmsMd
 rPCkYHqwT6I3sIS+/mbDkGA/QnwJ2qtJ8sp3+fL+dyJbI7Aa1zJZP6ectPsxK22+
 HFoFTY45rdFv/ojZZFZL8E/gcblYwRWKzIgwdASHuDXxIhd/IPwjrex2Iyv75AQK
 zusRQ5Xv82GaYWHVa9GXmZqXkTsvBg4AJwc4Uq2JdB0qOi2a4tc8PkK7Ts5YdHgS
 YVGxbY1POtMBi2bJUjsviMY7dGR3I+iEXJTYnbPnkVa+GTv8/FViVmOOLQnnBF4R
 fN5FN0vfuL6zaQzOPYLGx3SuEHix3ko2DCAcMg6idIxuBHArlJuS7XKECWdHuc0+
 +qn6Iqf8YSKIZ1zrWMggqY/sXuxjtABUBXe3jP3iTKQh8h+9SLfN3wgQM4GFJJcB
 gNfvk3Hl5aPFy/7gsgSDlaYbhGKPwTup+R8Fqd6nSBQO+rpRXvQQftwigYQiIEcE
 IiOS3BntVQWjoVr9WIifguf6rHG1ZoSMTHdtVVEaqsspT/OGJyq/ynEFJYSFqcqX
 NRPdQJNuoXGolGhyoWxD
 =9u+p
 -----END PGP SIGNATURE-----

Merge tag 'arm-scpi-for-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into next/drivers

Merge "ARM System Control and Power Interface(SCPI) support" from Sudeep Holla

It adds support for the following features provided by SCP firmware
using different subsystems in Linux:
  1. SCPI mailbox protocol driver which using mailbox framework
  2. Clocks provided by SCP using clock framework
  3. CPU DVFS(cpufreq) using existing arm-big-little driver
  4. SCPI based sensors including temperature sensors

* tag 'arm-scpi-for-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  hwmon: Support thermal zones registration for SCP temperature sensors
  hwmon: Support sensors exported via ARM SCP interface
  firmware: arm_scpi: Extend to support sensors
  Documentation: add DT bindings for ARM SCPI sensors
  cpufreq: arm_big_little: add SCPI interface driver
  clk: scpi: add support for cpufreq virtual device
  clk: add support for clocks provided by SCP(System Control Processor)
  firmware: add support for ARM System Control and Power Interface(SCPI) protocol
  Documentation: add DT binding for ARM System Control and Power Interface(SCPI) protocol
2015-10-14 17:07:32 +02:00
Viresh Kumar e625742f9c cpufreq: Drop redundant check for inactive policies
We just made sure policy->cpu is online and this check will always fail
as the policy is active. Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-14 02:30:28 +02:00
Ashwin Chaugule 5477fb3bd1 ACPI / CPPC: Add a CPUFreq driver for use with CPPC
This driver utilizes the methods introduced in a previous
patch titled - "ACPI: Introduce CPU performance controls using CPPC"
and enables usage with existing CPUFreq governors.

Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Reviewed-by: Al Stone <al.stone@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-12 23:04:31 +02:00
Rafael J. Wysocki 51ec005a94 Merge back earlier cpufreq material for v4.4. 2015-10-12 20:24:24 +02:00
Srinivas Pandruvada 55582bccdc cpufreq: prevent lockup on reading scaling_available_frequencies
When scaling_available_frequencies is read on an offlined cpu, then
either lockup or junk values are displayed. This is caused by
freed freq_table, which policy is using.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-08 21:47:56 +02:00
Srinivas Pandruvada e25303676e cpufreq: acpi_cpufreq: prevent crash on reading freqdomain_cpus
When freqdomain_cpus attribute is read from an offlined cpu, it will
cause crash. This change prevents calling cpufreq_show_cpus when
policy driver_data is NULL.

Crash info:

[  170.814949] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  170.814990] IP: [<ffffffff813b2490>] _find_next_bit.part.0+0x10/0x70
[  170.815021] PGD 227d30067 PUD 229e56067 PMD 0
[  170.815043] Oops: 0000 [#2] SMP
[  170.816022] CPU: 3 PID: 3121 Comm: cat Tainted: G      D    OE   4.3.0-rc3+ #33
...
...
[  170.816657] Call Trace:
[  170.816672]  [<ffffffff813b2505>] ? find_next_bit+0x15/0x20
[  170.816696]  [<ffffffff8160e47c>] cpufreq_show_cpus+0x5c/0xd0
[  170.816722]  [<ffffffffa031a409>] show_freqdomain_cpus+0x19/0x20 [acpi_cpufreq]
[  170.816749]  [<ffffffff8160e65b>] show+0x3b/0x60
[  170.816769]  [<ffffffff8129b31c>] sysfs_kf_seq_show+0xbc/0x130
[  170.816793]  [<ffffffff81299be3>] kernfs_seq_show+0x23/0x30
[  170.816816]  [<ffffffff81240f2c>] seq_read+0xec/0x390
[  170.816837]  [<ffffffff8129a64a>] kernfs_fop_read+0x10a/0x160
[  170.816861]  [<ffffffff8121d9b7>] __vfs_read+0x37/0x100
[  170.816883]  [<ffffffff813217c0>] ? security_file_permission+0xa0/0xc0
[  170.816909]  [<ffffffff8121e2e3>] vfs_read+0x83/0x130
[  170.816930]  [<ffffffff8121f035>] SyS_read+0x55/0xc0
...
...
[  170.817185] ---[ end trace bc6eadf82b2b965a ]---

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-08 21:47:13 +02:00
Sudeep Holla 8def31034d cpufreq: arm_big_little: add SCPI interface driver
On some ARM based systems, a separate Cortex-M based System Control
Processor(SCP) provides the overall power, clock, reset and system
control including CPU DVFS. SCPI Message Protocol is used to
communicate with the SCPI.

This patch adds a interface driver for adding OPPs and registering
the arm_big_little cpufreq driver for such systems.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
2015-09-28 11:53:38 +01:00
Shilpasri G Bhat d43b1b6f8e cpufreq : powernv: Report Pmax throttling if capped below nominal frequency
Log a 'critical' message if the max frequency is reduced below nominal
frequency. We already log 'info' message if the max frequency is
capped below turbo frequency. CPU should guarantee atleast nominal
frequency, but not turbo frequency in all system configurations and
environments. So report the pmax throttling with severity when Pmax is
dipped below nominal frequency.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26 03:03:06 +02:00
Bai Ping a35fc5a33b cpufreq: imx: update the clock switch flow to support imx6ul
For i.MX6UL, the clock switch flow is slightly different from
other i.MX6 SOCs. It has a 'secondary_sel' clk that will be used
when the CPU freq is higher than 396MHz. So the clock switch flow in
'set_target' callback need to update to support i.MX6UL in the common
i.MX6 SOC cpufreq driver.

Signed-off-by: Bai Ping <b51503@freescale.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26 03:02:11 +02:00
Bartlomiej Zolnierkiewicz febf63cf61 cpufreq: tegra20: remove superfluous CONFIG_PM ifdefs
CONFIG_PM ifdefs are superfluous and can be removed.

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26 03:00:57 +02:00
Viresh Kumar 03d5eec000 cpufreq: conservative: remove 'enable' field
Conservative governor has its own 'enable' field to check if
conservative governor is used for a CPU or not

This can be checked by policy->governor with 'cpufreq_gov_conservative'
and so this field can be dropped.

Because its not guaranteed that dbs_info->cdbs.shared will a valid
pointer for all CPUs (will be NULL for CPUs that don't use
ondemand/conservative governors), we can't use it anymore. Lets get
policy with cpufreq_cpu_get_raw() instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26 02:59:38 +02:00
Luis de Bethencourt 78ce6dbfa3 cpufreq: integrator: Fix module autoload for OF platform driver
This platform driver has a OF device ID table but the OF module
alias information is not created so module autoloading won't work.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-25 23:29:35 +02:00
Rafael J. Wysocki 1f0bd44e93 cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()
cpufreq_cpu_get() called by get_cur_freq_on_cpu() is overkill,
because the ->get() callback is always invoked in a context in
which all of the conditions checked by cpufreq_cpu_get() are
guaranteed to be satisfied.

Use cpufreq_cpu_get_raw() instead of it and drop the
corresponding cpufreq_cpu_put() from get_cur_freq_on_cpu().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2015-09-16 02:17:49 +02:00
Viresh Kumar 33692dc381 PM / OPP: Move opp core to its own directory
OPP code is expanding and is already present in multiple directories
(cpufreq and power). Lets move it to its own directory, to manage it
better.

This also moves/renames the cpufreq_opp file to cpu.c, as it will
contain helpers for cpu device. Its not just about cpufreq, other
frameworks can use OPPs as well.

Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-15 02:03:16 +02:00