Commit Graph

565 Commits

Author SHA1 Message Date
Rafael J. Wysocki 3000ce3c52 cpufreq: Use per-policy frequency QoS
Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.

Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).

In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU.  Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.

The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.

After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.

Fixes: 67d874c3b2 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-10-21 02:05:21 +02:00
Rafael J. Wysocki 65650b3513 cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.

The latter issue has been present since commit 45975c7d21 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit 90de2a4aa9 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.

Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.

Fixes: 45975c7d21 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
2019-10-10 11:11:17 +02:00
Rafael J. Wysocki beb4e08e21 Merge branch 'pm-cpufreq-qos'
* pm-cpufreq-qos:
  Documentation: cpufreq: Update policy notifier documentation
  cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
  ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier
  video: pxafb: Remove cpufreq policy notifier
  video: sa1100fb: Remove cpufreq policy notifier
  arch_topology: Use CPUFREQ_CREATE_POLICY instead of CPUFREQ_NOTIFY
  cpufreq: powerpc_cbe: Switch to QoS requests for freq limits
  cpufreq: powerpc: macintosh: Switch to QoS requests for freq limits
  thermal: cpu_cooling: Switch to QoS requests for freq limits
  cpufreq: Add policy create/remove notifiers back
2019-09-05 09:01:26 +02:00
Viresh Kumar df0eea4488 cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
No driver makes reference to these events now, remove them and the code
related to them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-09-02 22:44:04 +02:00
Florian Fainelli e9a7cc1d97 cpufreq: Print driver name if cpufreq_suspend() fails
Instead of printing the policy, which is incidentally a kernel pointer,
so with limited interest, print the cpufreq driver name that failed to
be suspend, which is more useful for debugging.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-22 09:27:25 +02:00
Colin Ian King 62c23a89fd cpufreq: remove redundant assignment to ret
Variable ret is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-21 00:28:30 +02:00
Viresh Kumar 6a1490367c cpufreq: Add policy create/remove notifiers back
This effectively reverts some changes made by commit f9f41e3ef9
("cpufreq: Remove policy create/remove notifiers").

We have a new use case for policy create/remove notifiers (for
allocating/freeing QoS requests per policy), so add them back.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-10 14:05:48 +02:00
Viresh Kumar e61a41256e cpufreq: dev_pm_qos_update_request() can return 1 on success
dev_pm_qos_update_request() can return 1 on success, so don't treat
it as an error.

Fixes: 18c49926c4 ("cpufreq: Add QoS requests for userspace constraints")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-10 13:39:47 +02:00
Rafael J. Wysocki 918e162e6a Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: Make cpufreq_generic_init() return void
  cpufreq: imx-cpufreq-dt: Add i.MX8MN support
  cpufreq: Add QoS requests for userspace constraints
  cpufreq: intel_pstate: Reuse refresh_frequency_limits()
  cpufreq: Register notifiers with the PM QoS framework
  PM / QoS: Add support for MIN/MAX frequency constraints
  PM / QOS: Pass request type to dev_pm_qos_read_value()
  PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
  PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()
2019-07-18 09:49:30 +02:00
Viresh Kumar c4dcc8a162 cpufreq: Make cpufreq_generic_init() return void
It always returns 0 (success) and its return type should really be void.

Over that, many drivers have added error handling code based on its
return value, which is not required at all.

Change its return type to void and update all the callers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-16 10:20:11 +02:00
Viresh Kumar 18c49926c4 cpufreq: Add QoS requests for userspace constraints
This implements QoS requests to manage userspace configuration of min
and max frequency.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: syzbot <syzbot+de771ae9390dffed7266@syzkaller.appspotmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-08 23:56:39 +02:00
Viresh Kumar c57b25bdf7 cpufreq: intel_pstate: Reuse refresh_frequency_limits()
The implementation of intel_pstate_update_max_freq() is quite similar to
refresh_frequency_limits(), lets reuse it.

Finding minimum of policy->user_policy.max and policy->cpuinfo.max_freq
in intel_pstate_update_max_freq() is redundant as cpufreq_set_policy()
will call the ->verify() callback of intel-pstate driver, which will do
this comparison anyway and so dropping it from
intel_pstate_update_max_freq() doesn't harm.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-08 23:56:39 +02:00
Viresh Kumar 67d874c3b2 cpufreq: Register notifiers with the PM QoS framework
Register notifiers for min/max frequency constraints with the PM QoS
framework. The constraints are also taken into consideration in
cpufreq_set_policy().

This also relocates cpufreq_policy_put_kobj() as it is required to be
called from cpufreq_policy_alloc() now.

refresh_frequency_limits() is updated to avoid calling
cpufreq_set_policy() for inactive policies and handle_update() is
updated to have proper locking in place.

No constraints are added until now though.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-08 23:56:13 +02:00
Rafael J. Wysocki 586a07dca8 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()
  cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()
  cpufreq: Don't skip frequency validation for has_target() drivers
  cpufreq: Use has_target() instead of !setpolicy
  cpufreq: Remove redundant !setpolicy check
  cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stub
  cpufreq: s5pv210: Don't flood kernel log after cpufreq change
  cpufreq: pcc-cpufreq: Fail initialization if driver cannot be registered
  cpufreq: add driver for Raspberry Pi
  cpufreq: Switch imx7d to imx-cpufreq-dt for speed grading
  cpufreq: imx-cpufreq-dt: Remove global platform match list
  cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency
  cpufreq: brcmstb-avs-cpufreq: Fix initial command check
  cpufreq: armada-37xx: Remove set but not used variable 'freq'
  cpufreq: imx-cpufreq-dt: Fix no OPPs available on unfused parts
  dt-bindings: imx-cpufreq-dt: Document opp-supported-hw usage
  cpufreq: Add imx-cpufreq-dt driver
2019-07-08 11:00:02 +02:00
Viresh Kumar 70a59fde6e cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()
On some occasions cpufreq_verify_current_freq() schedules a work whose
callback is handle_update(), which further calls cpufreq_update_policy()
which may end up calling cpufreq_verify_current_freq() again.

On the other hand, when cpufreq_update_policy() is called from
handle_update(), the pointer to the cpufreq policy is already
available, but cpufreq_cpu_acquire() is still called to get it in
cpufreq_update_policy(), which should be avoided as well.

To fix these issues, create a new helper, refresh_frequency_limits(),
and make both handle_update() call it cpufreq_update_policy().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rename reeval_frequency_limits() as refresh_frequency_limits() ]
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-28 11:24:56 +02:00
Viresh Kumar 5980752e6e cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()
Their implementations are quite similar, so modify
cpufreq_update_current_freq() somewhat and call it from
__cpufreq_get().

Also rename cpufreq_update_current_freq() to
cpufreq_verify_current_freq(), as that's what it is doing.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-28 11:17:12 +02:00
Viresh Kumar 9801522840 cpufreq: Don't skip frequency validation for has_target() drivers
CPUFREQ_CONST_LOOPS was introduced in a very old commit from pre-2.6
kernel release by commit 6a4a93f9c0d5 ("[CPUFREQ] Fix 'out of sync'
issue").

Basically, that commit does two things:

 - It adds the frequency verification code (which is quite similar to
   what we have today as well).

 - And it sets the CPUFREQ_CONST_LOOPS flag only for setpolicy drivers,
   rightly so based on the code we had then. The idea was to avoid
   frequency validation for setpolicy drivers as the cpufreq core doesn't
   know what frequency the hardware is running at and so no point in
   doing frequency verification.

The problem happened when we started to use the same CPUFREQ_CONST_LOOPS
flag for constant loops-per-jiffy thing as well and many has_target()
drivers started using the same flag and unknowingly skipped the
verification of frequency. There is no logical reason behind skipping
frequency validation because of the presence of CPUFREQ_CONST_LOOPS
flag otherwise.

Fix this issue by skipping frequency validation only for setpolicy
drivers and always doing it for has_target() drivers irrespective of
the presence or absence of CPUFREQ_CONST_LOOPS flag.

cpufreq_notify_transition() is only called for has_target() type driver
and not for set_policy type, and the check is simply redundant. Remove
it as well.

Also remove () around freq comparison statement as they aren't required
and checkpatch also warns for them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-28 10:16:14 +02:00
Viresh Kumar 5ddc6d4e30 cpufreq: Use has_target() instead of !setpolicy
For code consistency, use has_target() instead of !setpolicy everywhere,
as it is already done at several places. Maybe we should also use
"!has_target()" instead of "cpufreq_driver->setpolicy" where we need to
check if the driver supports setpolicy, so to use only one expression
for this kind of differentiation.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-26 11:41:04 +02:00
Viresh Kumar 407d0fff22 cpufreq: Remove redundant !setpolicy check
cpufreq_start_governor() is only called for !setpolicy case, checking it
again is not required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-26 11:36:55 +02:00
Daniel Lezcano bcc6156999 cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stub
cpufreq_online() and cpufreq_offline() [un]register the driver as
a cooling device. This is done if the driver is flagged as a cooling
device in addition with an IS_ENABLED() check to compile out the branching
code.

Group this test in a stub function added in the cpufreq header instead
of having the IS_ENABLED() in the code.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-26 10:59:57 +02:00
Thomas Gleixner d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Linus Torvalds bfbfbf7368 More power management updates for 5.2-rc1
- Fix recent regression causing kernels built with CONFIG_PM
    unset to crash on systems that support the Performance and
    Energy Bias Hint (EPB) by avoiding to compile the EPB-related
    code depending on CONFIG_PM when it is unset (Rafael Wysocki).
 
  - Clean up the transition notifier invocation code in the cpufreq
    core and change some users of cpufreq transition notifiers
    accordingly (Viresh Kumar).
 
  - Change MAINTAINERS to cover the schedutil governor as part of
    cpufreq (Viresh Kumar).
 
  - Simplify cpufreq_init_policy() to avoid redundant computations
    (Yue Hu).
 
  - Add explanatory comment to the cpufreq core (Rafael Wysocki).
 
  - Introduce a new flag, GENPD_FLAG_RPM_ALWAYS_ON, to the generic
    power domains (genpd) framework along with the first user of it
    (Leonard Crestez).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAlzb4TASHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxiEAP/37uQOx+I8J3IU7HQcPIkdI1hgksLEzo
 g2eoREekjszIjFK9xa70X3V/QnGK4YSPQ/cHCjgXfVhwkO5TJzte5T5M2z9gUCDT
 7OMYWCI6hP6Mo5UWlP4dQ9Cqce4SB3TdibadevxcVOhFAW/xz42y5Gr6s4WkexJf
 Swb2uoLS4gGANyhUhx6XEZ5NpWZkWcK2ygZ8VJZETnoIwxMSUW7FTJkF+4s2tXLZ
 GH+F5jWAbwPlg6g2c54lPL1HtiAvK+/018aF8CZMqUBec94RHDFybVOlb5sacfQW
 +Y0W/mc/6SMqT3OUcQ0H3Z/qkgwR8mL01hH6gCP1jA5OBljmTjzk0Bbc4c3n9BEN
 aRy4M8Qc/GXzEBPO3Z9AlYik6ALH9iUgL2hewGZAFN8kn9ZGPAqYsctdCVkfKL1u
 4Esz5+wOsyYmBx910PozL+p2jbTH0x89sSo1qXUQr2JEiNm2iL4I4+ndqhuiq4LO
 sQPHCpe4HhYWzIQzJLDurv6hAxxU5PUsGg8XDEGlsyowIPDoIkMgC93RRLGZ/taY
 Ivc2FSlwLTSkzBHwVfckakXPvfyFdw8DFL2n66dQbXS9FFNshOF/TFx40iV42i5H
 wusyIZIT1y1H74De0EVntUho3xBo3nrrsu1o2NaXsTBoEsYwJiCji4yOZlI1Zh+m
 A9coiXKm4hY5
 =LqTN
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These fix a recent regression causing kernels built with CONFIG_PM
  unset to crash on systems that support the Performance and Energy Bias
  Hint (EPB), clean up the cpufreq core and some users of transition
  notifiers and introduce a new power domain flag into the generic power
  domains framework (genpd).

  Specifics:

   - Fix recent regression causing kernels built with CONFIG_PM unset to
     crash on systems that support the Performance and Energy Bias Hint
     (EPB) by avoiding to compile the EPB-related code depending on
     CONFIG_PM when it is unset (Rafael Wysocki).

   - Clean up the transition notifier invocation code in the cpufreq
     core and change some users of cpufreq transition notifiers
     accordingly (Viresh Kumar).

   - Change MAINTAINERS to cover the schedutil governor as part of
     cpufreq (Viresh Kumar).

   - Simplify cpufreq_init_policy() to avoid redundant computations (Yue
     Hu).

   - Add explanatory comment to the cpufreq core (Rafael Wysocki).

   - Introduce a new flag, GENPD_FLAG_RPM_ALWAYS_ON, to the generic
     power domains (genpd) framework along with the first user of it
     (Leonard Crestez)"

* tag 'pm-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  soc: imx: gpc: Use GENPD_FLAG_RPM_ALWAYS_ON for ERR009619
  PM / Domains: Add GENPD_FLAG_RPM_ALWAYS_ON flag
  cpufreq: Update MAINTAINERS to include schedutil governor
  cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy()
  cpufreq: Explain the kobject_put() in cpufreq_policy_alloc()
  cpufreq: Call transition notifier only once for each policy
  x86: intel_epb: Take CONFIG_PM into account
2019-05-15 08:46:44 -07:00
Yue Hu ab05d97a37 cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy()
In cpufreq_init_policy() we will check if there's last_governor for target
and setpolicy type. However last_governor is set only if has_target() is
true in cpufreq_offline(). That means find last_governor for setpolicy
type is pointless. Also new_policy.governor will not be used if ->setpolicy
callback is set in cpufreq_set_policy().

Moreover, there's duplicate ->setpolicy check in using default policy path.
Let's add a new helper function to avoid it. Also update comments.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-05-13 10:46:24 +02:00
Rafael J. Wysocki 2acb9bdae9 cpufreq: Explain the kobject_put() in cpufreq_policy_alloc()
It may not be particularly clear why the kobject_put() after
failing kobject_init_and_add() in cpufreq_policy_alloc() is not
redundant, so add a comment to explain that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-05-13 10:43:53 +02:00
Viresh Kumar df24014abe cpufreq: Call transition notifier only once for each policy
Currently, the notifiers are called once for each CPU of the policy->cpus
cpumask. It would be more optimal if the notifier can be called only
once and all the relevant information be provided to it. Out of the 23
drivers that register for the transition notifiers today, only 4 of them
do per-cpu updates and the callback for the rest can be called only once
for the policy without any impact.

This would also avoid multiple function calls to the notifier callbacks
and reduce multiple iterations of notifier core's code (which does
locking as well).

This patch adds pointer to the cpufreq policy to the struct
cpufreq_freqs, so the notifier callback has all the information
available to it with a single call. The five drivers which perform
per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
field is redundant now and is removed.

Acked-by: David S. Miller <davem@davemloft.net> (sparc)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-05-10 12:20:36 +02:00
Linus Torvalds 0968621917 Printk changes for 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA
 lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0
 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO
 EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK
 r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k
 FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq
 uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj
 lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR
 waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w
 wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55
 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10
 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY=
 =WZfC
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Viresh Kumar 4ebe36c94a cpufreq: Fix kobject memleak
Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking the
kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-30 10:54:23 +02:00
Yue Hu 4db7c34cb4 cpufreq: Move ->get callback check outside of __cpufreq_get()
Currenly, __cpufreq_get() called by show_cpuinfo_cur_freq() will check
->get callback. That is needless since cpuinfo_cur_freq attribute will
not be created if ->get is not set. So let's drop it in __cpufreq_get().
Also keep this check in cpufreq_get().

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-23 10:58:43 +02:00
Yue Hu b23aa311fa cpufreq: Remove needless bios_limit check in show_bios_limit()
Initially, bios_limit attribute will be created if driver->bios_limit
is set in cpufreq_add_dev_interface(). So remove the redundant check
for latter show operation.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-16 23:10:42 +02:00
Sakari Ailus d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Yue Hu 89f98d7e5f cpufreq: Remove cpufreq_driver check in cpufreq_boost_supported()
Currently there are three calling paths for cpufreq_boost_supported() in
all as below, we can see the cpufreq_driver null check is needless since
it is already checked before.

<path1>
  cpufreq_enable_boost_support()
    |-> if (!cpufreq_driver)
    |-> cpufreq_boost_supported()

<path2>
  cpufreq_register_driver()
    |-> if (!driver_data ...
    |-> cpufreq_driver = driver_data
    |-> cpufreq_boost_supported()
    |-> remove_boost_sysfs_file()
          |-> cpufreq_boost_supported()

<path3>
  cpufreq_unregister_driver()
    |-> if (!cpufreq_driver ...
    |-> remove_boost_sysfs_file()
          |-> cpufreq_boost_supported()

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-09 09:57:16 +02:00
Rafael J. Wysocki 9083e49861 cpufreq: intel_pstate: Update max frequency on global turbo changes
While the cpuinfo.max_freq value doesn't really matter for
intel_pstate in the active mode, in the passive mode it is used by
governors as the maximum physical frequency of the CPU and the
results of governor computations generally depend on it.  Also it
is made available to user space via sysfs and it should match the
current HW configuration.

For this reason, make intel_pstate update cpuinfo.max_freq for all
CPUs if it detects a global change of turbo frequency settings from
"disable" to "enable" or the other way associated with a _PPC change
notification from the platform firmware.

Note that policy_is_inactive(),  cpufreq_cpu_acquire(),
cpufreq_cpu_release(), and cpufreq_set_policy() need to be made
available to it for this purpose.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-08 11:26:09 +02:00
Rafael J. Wysocki 540a375822 cpufreq: Add cpufreq_cpu_acquire() and cpufreq_cpu_release()
It sometimes is necessary to find a cpufreq policy for a given CPU
and acquire its rwsem (for writing) immediately after that, so
introduce cpufreq_cpu_acquire() as a helper for that and the
complementary cpufreq_cpu_release().

Make cpufreq_update_policy() use the new functions.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-01 23:43:05 +02:00
Rafael J. Wysocki 5a25e3f7cc cpufreq: intel_pstate: Driver-specific handling of _PPC updates
In some cases, the platform firmware disables or enables turbo
frequencies for all CPUs globally before triggering a _PPC change
notification for one of them.  Obviously, that global change affects
all CPUs, not just the notified one, and it needs to be acted upon by
cpufreq.

The intel_pstate driver is able to detect such global changes of
the settings, but it also needs to update policy limits for all
CPUs if that happens, in particular if turbo frequencies are
enabled globally - to allow them to be used.

For this reason, introduce a new cpufreq driver callback to be
invoked on _PPC notifications, if present, instead of simply
calling cpufreq_update_policy() for the notified CPU and make
intel_pstate use it to trigger policy updates for all CPUs
in the system if global settings change.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-01 23:43:05 +02:00
Rafael J. Wysocki 5d094fea14 cpufreq: Improve kerneldoc comments for cpufreq_cpu_get/put()
Fix the formatting of the cpufreq_cpu_get() and cpufreq_cpu_put()
kerneldoc comments and rework them to be somewhat easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-03-07 10:55:29 +01:00
Rafael J. Wysocki 167a38dcd5 cpufreq: Pass updated policy to driver ->setpolicy() callback
The invocation of the ->setpolicy() cpufreq driver callback should
be equivalent to calling cpufreq_governor_limits(policy) for drivers
with internal governors, but in fact it isn't so, because the
temporary new_policy object is passed to it instead of the updated
policy.

That is a bit confusing, so make cpufreq_set_policy() pass the
updated policy to the driver ->setpolicy() callback.

No intentional changes of behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-02-20 10:18:37 +01:00
Rafael J. Wysocki 2bb4059e07 cpufreq: Fix two debug messages in cpufreq_set_policy()
Remove the redundant "cpufreq:" prefix from two debug messages in
cpufreq_set_policy().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-02-20 10:18:37 +01:00
Rafael J. Wysocki 348a2ec5f5 cpufreq: Reorder and simplify cpufreq_update_policy()
In cpufreq_update_policy(), instead of updating new_policy.cur
separately, which is kind of confusing, because cpufreq_set_policy()
doesn't take that value into account directly anyway, make the copy
of the existing policy after calling cpufreq_update_current_freq().

No intentional changes of behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-02-20 10:18:37 +01:00
Rafael J. Wysocki a0dbb819b8 cpufreq: Add kerneldoc comments for two core functions
Add kerneldoc comments describing cpufreq_set_policy() and
cpufreq_update_policy() as they have not been properly documented
so far and they really need to be documented.

While at it, fix white space around the cpufreq_set_policy() header.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-02-20 10:18:37 +01:00
Viresh Kumar a9a22b570b cpufreq: Replace double NOT (!!) with single NOT (!)
Double NOT (!!) operation is normally done to convert a non-zero value
to 1 and keep zero as is, but that isn't the requirement in this case.
All we wanted was to make sure that only one of the two routines isn't
set, i.e. either both function pointers are set or both are unset.

This can be done with a single NOT (!) operation as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-14 12:06:38 +01:00
Viresh Kumar 91a12e91dc cpufreq: Allow light-weight tear down and bring up of CPUs
The cpufreq core doesn't remove the cpufreq policy anymore on CPU
offline operation, rather that happens when the CPU device gets
unregistered from the kernel. This allows faster recovery when the CPU
comes back online. This is also very useful during system wide
suspend/resume where we offline all non-boot CPUs during suspend and
then bring them back on resume.

This commit takes the same idea a step ahead to allow drivers to do
light weight tear-down and bring-up during CPU offline and online
operations.

A new set of callbacks is introduced, online/offline(). online() gets
called when the first CPU of an inactive policy is brought up and
offline() gets called when all the CPUs of a policy are offlined.

The existing init/exit() callback get called on policy
creation/destruction. They also get called instead of online/offline()
callbacks if the online/offline() callbacks aren't provided.

This also moves around some code to get executed only for the new-policy
case going forward.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-12 23:47:42 +01:00
Amit Kucheria 5c238a8b59 cpufreq: Auto-register the driver as a thermal cooling device if asked
All cpufreq drivers do similar things to register as a cooling device.
Provide a cpufreq driver flag so drivers can just ask the cpufreq core
to register the cooling device on their behalf. This allows us to get
rid of duplicated code in the drivers.

In order to allow this, we add a struct thermal_cooling_device pointer
to struct cpufreq_policy so that drivers don't need to store it in a
private data structure.

Suggested-by: Stephen Boyd <swboyd@chromium.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-30 23:02:26 +01:00
Viresh Kumar 625c85a62c cpufreq: Use struct kobj_attribute instead of struct global_attr
The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().

These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element and so they
will both get same address.

But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.

This bug is caught using CFI CLANG builds in android kernel which
catches mismatch in function prototypes for such callbacks.

Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-29 11:44:30 +01:00
Viresh Kumar 21469df467 cpufreq: Don't update new_policy on failures
The local variable "new_policy" hasn't been used in the error path of
cpufreq_online() since commit f9f41e3ef9 (cpufreq: Remove policy
create/remove notifiers).  Don't update it in that error path.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-15 22:57:04 +01:00
Rafael J. Wysocki 343e60e52a Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep'
* pm-cpuidle:
  doc: trace: fix reference to cpuidle documentation file
  cpuidle / Documentation: Update cpuidle MAINTAINERS entry

* pm-cpufreq:
  cpufreq: scmi: Fix frequency invariance in slow path
  cpufreq: check if policy is inactive early in __cpufreq_get()
  cpufreq: scpi/scmi: Fix freeing of dynamic OPPs
  cpufreq / Documentation: Update cpufreq MAINTAINERS entry

* pm-sleep:
  PM: sleep: call devfreq suspend/resume
2019-01-11 10:09:51 +01:00
Sudeep Holla 2f66196208 cpufreq: check if policy is inactive early in __cpufreq_get()
cpuinfo_cur_freq gets current CPU frequency as detected by hardware
while scaling_cur_freq last known CPU frequency. Some platforms may not
allow checking the CPU frequency of an offline CPU or the associated
resources may have been released via cpufreq_exit when the CPU gets
offlined, in which case the policy would have been invalidated already.
If we attempt to get current frequency from the hardware, it may result
in hang or crash.

For example on Juno, I see:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
[0000000000000188] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 5 PID: 4202 Comm: cat Not tainted 4.20.0-08251-ga0f2c0318a15-dirty #87
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : scmi_cpufreq_get_rate+0x34/0xb0
lr : scmi_cpufreq_get_rate+0x34/0xb0
Call trace:
 scmi_cpufreq_get_rate+0x34/0xb0
 __cpufreq_get+0x34/0xc0
 show_cpuinfo_cur_freq+0x24/0x78
 show+0x40/0x60
 sysfs_kf_seq_show+0xc0/0x148
 kernfs_seq_show+0x44/0x50
 seq_read+0xd4/0x480
 kernfs_fop_read+0x15c/0x208
 __vfs_read+0x60/0x188
 vfs_read+0x94/0x150
 ksys_read+0x6c/0xd8
 __arm64_sys_read+0x24/0x30
 el0_svc_common+0x78/0x100
 el0_svc_handler+0x38/0x78
 el0_svc+0x8/0xc
---[ end trace 3d1024e58f77f6b2 ]---

So fix the issue by checking if the policy is invalid early in
__cpufreq_get before attempting to get the current frequency.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-08 10:49:50 +01:00
Quentin Perret 531b5c9f5c sched/topology: Make Energy Aware Scheduling depend on schedutil
Energy Aware Scheduling (EAS) is designed with the assumption that
frequencies of CPUs follow their utilization value. When using a CPUFreq
governor other than schedutil, the chances of this assumption being true
are small, if any. When schedutil is being used, EAS' predictions are at
least consistent with the frequency requests. Although those requests
have no guarantees to be honored by the hardware, they should at least
guide DVFS in the right direction and provide some hope in regards to the
EAS model being accurate.

To make sure EAS is only used in a sane configuration, create a strong
dependency on schedutil being used. Since having sugov compiled-in does
not provide that guarantee, make CPUFreq call a scheduler function on
governor changes hence letting it rebuild the scheduling domains, check
the governors of the online CPUs, and enable/disable EAS accordingly.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:17:00 +01:00
Igor Stoppa 0e7ea2f3b0 cpufreq: remove unnecessary unlikely()
WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
into another.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-09-10 12:04:42 +02:00
Waiman Long 9b3d9bb3e4 cpufreq: Fix a circular lock dependency problem
With lockdep turned on, the following circular lock dependency problem
was reported:

[   57.470040] ======================================================
[   57.502900] WARNING: possible circular locking dependency detected
[   57.535208] 4.18.0-0.rc3.1.el8+7.x86_64+debug #1 Tainted: G
[   57.577761] ------------------------------------------------------
[   57.609714] tuned/1505 is trying to acquire lock:
[   57.633808] 00000000559deec5 (cpu_hotplug_lock.rw_sem){++++}, at: store+0x27/0x120
[   57.672880]
[   57.672880] but task is already holding lock:
[   57.702184] 000000002136ca64 (kn->count#118){++++}, at: kernfs_fop_write+0x1d0/0x410
[   57.742176]
[   57.742176] which lock already depends on the new lock.
[   57.742176]
[   57.785220]
[   57.785220] the existing dependency chain (in reverse order) is:
    :
[   58.932512] other info that might help us debug this:
[   58.932512]
[   58.973344] Chain exists of:
[   58.973344]   cpu_hotplug_lock.rw_sem --> subsys mutex#5 --> kn->count#118
[   58.973344]
[   59.030795]  Possible unsafe locking scenario:
[   59.030795]
[   59.061248]        CPU0                    CPU1
[   59.085377]        ----                    ----
[   59.108160]   lock(kn->count#118);
[   59.124935]                                lock(subsys mutex#5);
[   59.156330]                                lock(kn->count#118);
[   59.186088]   lock(cpu_hotplug_lock.rw_sem);
[   59.208541]
[   59.208541]  *** DEADLOCK ***

In the cpufreq_register_driver() function, the lock sequence is:

  cpus_read_lock --> kn->count

For the cpufreq sysfs store method, the lock sequence is:

  kn->count --> cpus_read_lock

These sequences are actually safe as they are taking a share lock on
cpu_hotplug_lock. However, the current lockdep code doesn't check for
share locking when detecting circular lock dependency.  Fixing that
could be a substantial effort.

Instead, we can work around this problem by using cpus_read_trylock()
in the store method which is much simpler. The chance of not getting
the read lock is very small. If that happens, the userspace application
that writes the sysfs file will get an error.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-07-26 10:37:36 +02:00
Ruchi Kandoi 601b218568 cpufreq: trace frequency limits change
systrace used for tracing for Android systems has carried a patch for
many years in the Android tree that traces when the cpufreq limits
change.  With the help of this information, systrace can know when the
policy limits change and can visually display the data. Lets add
upstream support for the same.

Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-07-26 10:17:47 +02:00