Merge branches 'pm-cpuidle' and 'pm-em'

* pm-cpuidle:
  cpuidle: Select polling interval based on a c-state with a longer target residency
  cpuidle: psci: Enable suspend-to-idle for PSCI OSI mode
  PM: domains: Enable dev_pm_genpd_suspend|resume() for suspend-to-idle
  PM: domains: Rename pm_genpd_syscore_poweroff|poweron()

* pm-em:
  PM / EM: Micro optimization in em_cpu_energy
  PM: EM: Update Energy Model with new flag indicating power scale
  PM: EM: update the comments related to power scale
  PM: EM: Clarify abstract scale usage for power values in Energy Model

commit 4c5744a0c4
@@ -71,7 +71,9 @@ to the speed-grade of the silicon. `sustainable_power` is therefore
 simply an estimate, and may be tuned to affect the aggressiveness of
 the thermal ramp. For reference, the sustainable power of a 4" phone
 is typically 2000mW, while on a 10" tablet is around 4500mW (may vary
-depending on screen size).
+depending on screen size). It is possible to have the power value
+expressed in an abstract scale. The sustained power should be aligned
+to the scale used by the related cooling devices.
 
 If you are using device tree, do add it as a property of the
 thermal-zone. For example::

@@ -269,3 +271,11 @@ won't be very good. Note that this is not particular to this
 governor, step-wise will also misbehave if you call its throttle()
 faster than the normal thermal framework tick (due to interrupts for
 example) as it will overreact.
+
+Energy Model requirements
+=========================
+
+Another important thing is the consistent scale of the power values
+provided by the cooling devices. All of the cooling devices in a single
+thermal zone should have power values reported either in milli-Watts
+or scaled to the same 'abstract scale'.

@@ -20,6 +20,21 @@ possible source of information on its own, the EM framework intervenes as an
 abstraction layer which standardizes the format of power cost tables in the
 kernel, hence enabling to avoid redundant work.
 
+The power values might be expressed in milli-Watts or in an 'abstract scale'.
+Multiple subsystems might use the EM and it is up to the system integrator to
+check that the requirements for the power value scale types are met. An example
+can be found in the Energy-Aware Scheduler documentation
+Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
+powercap power values expressed in an 'abstract scale' might cause issues.
+These subsystems are more interested in estimation of power used in the past,
+thus the real milli-Watts might be needed. An example of these requirements can
+be found in the Intelligent Power Allocation in
+Documentation/driver-api/thermal/power_allocator.rst.
+Kernel subsystems might implement automatic detection to check whether EM
+registered devices have inconsistent scale (based on EM internal flag).
+Important thing to keep in mind is that when the power values are expressed in
+an 'abstract scale' deriving real energy in milli-Joules would not be possible.
+
 The figure below depicts an example of drivers (Arm-specific here, but the
 approach is applicable to any architecture) providing power costs to the EM
 framework, and interested clients reading the data from it::

@@ -73,7 +88,7 @@ Drivers are expected to register performance domains into the EM framework by
 calling the following API::
 
   int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-		struct em_data_callback *cb, cpumask_t *cpus);
+		struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts);
 
 Drivers must provide a callback function returning <frequency, power> tuples
 for each performance state. The callback function provided by the driver is free

@@ -81,6 +96,10 @@ to fetch data from any relevant location (DT, firmware, ...), and by any mean
 deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
 performance domains using cpumask. For other devices than CPUs the last
 argument must be set to NULL.
+The last argument 'milliwatts' is important to set with correct value. Kernel
+subsystems which use EM might rely on this flag to check if all EM devices use
+the same scale. If there are different scales, these subsystems might decide
+to: return warning/error, stop working or panic.
 See Section 3. for an example of driver implementing this
 callback, and kernel/power/energy_model.c for further documentation on this
 API.

@@ -156,7 +175,8 @@ EM framework::
   37		nr_opp = foo_get_nr_opp(policy);
   38
   39		/* And register the new performance domain */
-  40		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
-  41
-  42		return 0;
-  43	}
+  40		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
+  41					    true);
+  42
+  43		return 0;
+  44	}

@@ -350,6 +350,11 @@ independent EM framework in Documentation/power/energy-model.rst.
 Please also note that the scheduling domains need to be re-built after the
 EM has been registered in order to start EAS.
 
+EAS uses the EM to make a forecasting decision on energy usage and thus it is
+more focused on the difference when checking possible options for task
+placement. For EAS it doesn't matter whether the EM power values are expressed
+in milli-Watts or in an 'abstract scale'.
+
 6.3 - Energy Model complexity
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 

@@ -1363,41 +1363,60 @@ static void genpd_complete(struct device *dev)
 	genpd_unlock(genpd);
 }
 
-/**
- * genpd_syscore_switch - Switch power during system core suspend or resume.
- * @dev: Device that normally is marked as "always on" to switch power for.
- *
- * This routine may only be called during the system core (syscore) suspend or
- * resume phase for devices whose "always on" flags are set.
- */
-static void genpd_syscore_switch(struct device *dev, bool suspend)
+static void genpd_switch_state(struct device *dev, bool suspend)
 {
 	struct generic_pm_domain *genpd;
+	bool use_lock;
 
 	genpd = dev_to_genpd_safe(dev);
 	if (!genpd)
 		return;
 
+	use_lock = genpd_is_irq_safe(genpd);
+
+	if (use_lock)
+		genpd_lock(genpd);
+
 	if (suspend) {
 		genpd->suspended_count++;
-		genpd_sync_power_off(genpd, false, 0);
+		genpd_sync_power_off(genpd, use_lock, 0);
 	} else {
-		genpd_sync_power_on(genpd, false, 0);
+		genpd_sync_power_on(genpd, use_lock, 0);
 		genpd->suspended_count--;
 	}
+
+	if (use_lock)
+		genpd_unlock(genpd);
 }
 
-void pm_genpd_syscore_poweroff(struct device *dev)
+/**
+ * dev_pm_genpd_suspend - Synchronously try to suspend the genpd for @dev
+ * @dev: The device that is attached to the genpd, that can be suspended.
+ *
+ * This routine should typically be called for a device that needs to be
+ * suspended during the syscore suspend phase. It may also be called during
+ * suspend-to-idle to suspend a corresponding CPU device that is attached to a
+ * genpd.
+ */
+void dev_pm_genpd_suspend(struct device *dev)
 {
-	genpd_syscore_switch(dev, true);
+	genpd_switch_state(dev, true);
 }
-EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweroff);
+EXPORT_SYMBOL_GPL(dev_pm_genpd_suspend);
 
-void pm_genpd_syscore_poweron(struct device *dev)
+/**
+ * dev_pm_genpd_resume - Synchronously try to resume the genpd for @dev
+ * @dev: The device that is attached to the genpd, which needs to be resumed.
+ *
+ * This routine should typically be called for a device that needs to be resumed
+ * during the syscore resume phase. It may also be called during suspend-to-idle
+ * to resume a corresponding CPU device that is attached to a genpd.
+ */
+void dev_pm_genpd_resume(struct device *dev)
 {
-	genpd_syscore_switch(dev, false);
+	genpd_switch_state(dev, false);
 }
-EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweron);
+EXPORT_SYMBOL_GPL(dev_pm_genpd_resume);
 
 #else /* !CONFIG_PM_SLEEP */
 

@@ -658,7 +658,7 @@ static void sh_cmt_clocksource_suspend(struct clocksource *cs)
 		return;
 
 	sh_cmt_stop(ch, FLAG_CLOCKSOURCE);
-	pm_genpd_syscore_poweroff(&ch->cmt->pdev->dev);
+	dev_pm_genpd_suspend(&ch->cmt->pdev->dev);
 }
 
 static void sh_cmt_clocksource_resume(struct clocksource *cs)

@@ -668,7 +668,7 @@ static void sh_cmt_clocksource_resume(struct clocksource *cs)
 	if (!ch->cs_enabled)
 		return;
 
-	pm_genpd_syscore_poweron(&ch->cmt->pdev->dev);
+	dev_pm_genpd_resume(&ch->cmt->pdev->dev);
 	sh_cmt_start(ch, FLAG_CLOCKSOURCE);
 }
 

@@ -760,7 +760,7 @@ static void sh_cmt_clock_event_suspend(struct clock_event_device *ced)
 {
 	struct sh_cmt_channel *ch = ced_to_sh_cmt(ced);
 
-	pm_genpd_syscore_poweroff(&ch->cmt->pdev->dev);
+	dev_pm_genpd_suspend(&ch->cmt->pdev->dev);
 	clk_unprepare(ch->cmt->clk);
 }
 

@@ -769,7 +769,7 @@ static void sh_cmt_clock_event_resume(struct clock_event_device *ced)
 	struct sh_cmt_channel *ch = ced_to_sh_cmt(ced);
 
 	clk_prepare(ch->cmt->clk);
-	pm_genpd_syscore_poweron(&ch->cmt->pdev->dev);
+	dev_pm_genpd_resume(&ch->cmt->pdev->dev);
 }
 
 static int sh_cmt_register_clockevent(struct sh_cmt_channel *ch,

@@ -297,12 +297,12 @@ static int sh_mtu2_clock_event_set_periodic(struct clock_event_device *ced)
 
 static void sh_mtu2_clock_event_suspend(struct clock_event_device *ced)
 {
-	pm_genpd_syscore_poweroff(&ced_to_sh_mtu2(ced)->mtu->pdev->dev);
+	dev_pm_genpd_suspend(&ced_to_sh_mtu2(ced)->mtu->pdev->dev);
 }
 
 static void sh_mtu2_clock_event_resume(struct clock_event_device *ced)
 {
-	pm_genpd_syscore_poweron(&ced_to_sh_mtu2(ced)->mtu->pdev->dev);
+	dev_pm_genpd_resume(&ced_to_sh_mtu2(ced)->mtu->pdev->dev);
 }
 
 static void sh_mtu2_register_clockevent(struct sh_mtu2_channel *ch,

@@ -292,7 +292,7 @@ static void sh_tmu_clocksource_suspend(struct clocksource *cs)
 
 	if (--ch->enable_count == 0) {
 		__sh_tmu_disable(ch);
-		pm_genpd_syscore_poweroff(&ch->tmu->pdev->dev);
+		dev_pm_genpd_suspend(&ch->tmu->pdev->dev);
 	}
 }
 

@@ -304,7 +304,7 @@ static void sh_tmu_clocksource_resume(struct clocksource *cs)
 		return;
 
 	if (ch->enable_count++ == 0) {
-		pm_genpd_syscore_poweron(&ch->tmu->pdev->dev);
+		dev_pm_genpd_resume(&ch->tmu->pdev->dev);
 		__sh_tmu_enable(ch);
 	}
 }

@@ -394,12 +394,12 @@ static int sh_tmu_clock_event_next(unsigned long delta,
 
 static void sh_tmu_clock_event_suspend(struct clock_event_device *ced)
 {
-	pm_genpd_syscore_poweroff(&ced_to_sh_tmu(ced)->tmu->pdev->dev);
+	dev_pm_genpd_suspend(&ced_to_sh_tmu(ced)->tmu->pdev->dev);
 }
 
 static void sh_tmu_clock_event_resume(struct clock_event_device *ced)
 {
-	pm_genpd_syscore_poweron(&ced_to_sh_tmu(ced)->tmu->pdev->dev);
+	dev_pm_genpd_resume(&ced_to_sh_tmu(ced)->tmu->pdev->dev);
 }
 
 static void sh_tmu_register_clockevent(struct sh_tmu_channel *ch,

@@ -327,6 +327,8 @@ struct device *psci_dt_attach_cpu(int cpu)
 	if (cpu_online(cpu))
 		pm_runtime_get_sync(dev);
 
+	dev_pm_syscore_device(dev, true);
+
 	return dev;
 }
 

@@ -19,6 +19,7 @@
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
 #include <linux/psci.h>
+#include <linux/pm_domain.h>
 #include <linux/pm_runtime.h>
 #include <linux/slab.h>
 #include <linux/string.h>

@@ -52,8 +53,9 @@ static inline int psci_enter_state(int idx, u32 state)
 	return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
 }
 
-static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
-					struct cpuidle_driver *drv, int idx)
+static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
+					  struct cpuidle_driver *drv, int idx,
+					  bool s2idle)
 {
 	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
 	u32 *states = data->psci_states;

@@ -66,7 +68,12 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 		return -1;
 
 	/* Do runtime PM to manage a hierarchical CPU toplogy. */
-	RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
+	rcu_irq_enter_irqson();
+	if (s2idle)
+		dev_pm_genpd_suspend(pd_dev);
+	else
+		pm_runtime_put_sync_suspend(pd_dev);
+	rcu_irq_exit_irqson();
 
 	state = psci_get_domain_state();
 	if (!state)

@@ -74,7 +81,12 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 
 	ret = psci_cpu_suspend_enter(state) ? -1 : idx;
 
-	RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
+	rcu_irq_enter_irqson();
+	if (s2idle)
+		dev_pm_genpd_resume(pd_dev);
+	else
+		pm_runtime_get_sync(pd_dev);
+	rcu_irq_exit_irqson();
 
 	cpu_pm_exit();
 

@@ -83,6 +95,19 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	return ret;
 }
 
+static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
+					struct cpuidle_driver *drv, int idx)
+{
+	return __psci_enter_domain_idle_state(dev, drv, idx, false);
+}
+
+static int psci_enter_s2idle_domain_idle_state(struct cpuidle_device *dev,
+					       struct cpuidle_driver *drv,
+					       int idx)
+{
+	return __psci_enter_domain_idle_state(dev, drv, idx, true);
+}
+
 static int psci_idle_cpuhp_up(unsigned int cpu)
 {
 	struct device *pd_dev = __this_cpu_read(psci_cpuidle_data.dev);

@@ -170,6 +195,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
 	 * deeper states.
 	 */
 	drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
+	drv->states[state_count - 1].enter_s2idle = psci_enter_s2idle_domain_idle_state;
 	psci_cpuidle_use_cpuhp = true;
 
 	return 0;

@@ -368,6 +368,19 @@ void cpuidle_reflect(struct cpuidle_device *dev, int index)
 		cpuidle_curr_governor->reflect(dev, index);
 }
 
+/*
+ * Min polling interval of 10usec is a guess. It is assuming that
+ * for most users, the time for a single ping-pong workload like
+ * perf bench pipe would generally complete within 10usec but
+ * this is hardware dependant. Actual time can be estimated with
+ *
+ * perf bench sched pipe -l 10000
+ *
+ * Run multiple times to avoid cpufreq effects.
+ */
+#define CPUIDLE_POLL_MIN 10000
+#define CPUIDLE_POLL_MAX (TICK_NSEC / 16)
+
 /**
  * cpuidle_poll_time - return amount of time to poll for,
  * governors can override dev->poll_limit_ns if necessary

@@ -382,15 +395,23 @@ u64 cpuidle_poll_time(struct cpuidle_driver *drv,
 	int i;
 	u64 limit_ns;
 
+	BUILD_BUG_ON(CPUIDLE_POLL_MIN > CPUIDLE_POLL_MAX);
+
 	if (dev->poll_limit_ns)
 		return dev->poll_limit_ns;
 
-	limit_ns = TICK_NSEC;
+	limit_ns = CPUIDLE_POLL_MAX;
 	for (i = 1; i < drv->state_count; i++) {
+		u64 state_limit;
+
 		if (dev->states_usage[i].disable)
 			continue;
 
-		limit_ns = drv->states[i].target_residency_ns;
+		state_limit = drv->states[i].target_residency_ns;
+		if (state_limit < CPUIDLE_POLL_MIN)
+			continue;
+
+		limit_ns = min_t(u64, state_limit, CPUIDLE_POLL_MAX);
 		break;
 	}
 

@@ -13,9 +13,8 @@
 /**
  * em_perf_state - Performance state of a performance domain
  * @frequency:	The frequency in KHz, for consistency with CPUFreq
- * @power:	The power consumed at this level, in milli-watts (by 1 CPU or
-		by a registered device). It can be a total power: static and
-		dynamic.
+ * @power:	The power consumed at this level (by 1 CPU or by a registered
+ *		device). It can be a total power: static and dynamic.
 * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
  */

@@ -58,7 +57,7 @@ struct em_data_callback {
 	/**
 	 * active_power() - Provide power at the next performance state of
 	 *		a device
-	 * @power	: Active power at the performance state in mW
+	 * @power	: Active power at the performance state
 	 *		(modified)
 	 * @freq	: Frequency at the performance state in kHz
 	 *		(modified)

@@ -69,8 +68,8 @@ struct em_data_callback {
 	 * and frequency.
 	 *
 	 * In case of CPUs, the power is the one of a single CPU in the domain,
-	 * expressed in milli-watts. It is expected to fit in the
-	 * [0, EM_MAX_POWER] range.
+	 * expressed in milli-Watts or an abstract scale. It is expected to
+	 * fit in the [0, EM_MAX_POWER] range.
 	 *
 	 * Return 0 on success.
 	 */

@@ -107,6 +106,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	struct em_perf_state *ps;
 	int i, cpu;
 
+	if (!sum_util)
+		return 0;
+
 	/*
 	 * In order to predict the performance state, map the utilization of
 	 * the most utilized CPU of the performance domain to a requested

@@ -280,11 +280,11 @@ static inline int dev_pm_genpd_remove_notifier(struct device *dev)
 #endif
 
 #ifdef CONFIG_PM_GENERIC_DOMAINS_SLEEP
-void pm_genpd_syscore_poweroff(struct device *dev);
-void pm_genpd_syscore_poweron(struct device *dev);
+void dev_pm_genpd_suspend(struct device *dev);
+void dev_pm_genpd_resume(struct device *dev);
 #else
-static inline void pm_genpd_syscore_poweroff(struct device *dev) {}
-static inline void pm_genpd_syscore_poweron(struct device *dev) {}
+static inline void dev_pm_genpd_suspend(struct device *dev) {}
+static inline void dev_pm_genpd_resume(struct device *dev) {}
 #endif
 
 /* OF PM domain providers */

@@ -143,7 +143,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 
 		/*
 		 * The power returned by active_state() is expected to be
-		 * positive, in milli-watts and to fit into 16 bits.
+		 * positive and to fit into 16 bits.
 		 */
 		if (!power || power > EM_MAX_POWER) {
 			dev_err(dev, "EM: invalid power: %lu\n",