Merge branches 'powercap' and 'pm-misc'
* powercap:
  powercap: intel_rapl: Use topology interface in rapl_init_domains()
  powercap: intel_rapl: Use topology interface in rapl_add_package()
  powercap/intel_rapl: add support for AlderLake Mobile
  powercap/drivers/dtpm: Fix size of object being allocated
  powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check
  powercap/drivers/dtpm: Fix some missing unlock bugs
  powercap/drivers/dtpm: Fix a double shift bug
  powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols
  powercap/drivers/dtpm: Add CPU energy model based support
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  Documentation/powercap/dtpm: Add documentation for dtpm
  units: Add Watt units

* pm-misc:
  PM: Kconfig: remove unneeded "default n" options
  PM: EM: update Kconfig description and drop "default n" option
commit a9a939cb34
@@ -30,6 +30,7 @@ Power Management
     userland-swsusp
 
     powercap/powercap
+    powercap/dtpm
 
     regulator/consumer
     regulator/design
@@ -0,0 +1,212 @@
.. SPDX-License-Identifier: GPL-2.0

==========================================
Dynamic Thermal Power Management framework
==========================================

In the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature from going above the
normative and legally stated 'skin temperature'.

Another aspect is to sustain the performance for a given power budget,
for example in virtual reality, where the user can feel dizziness if the
performance is capped while a big CPU is processing something else, or
to reduce the battery charging because the dissipated power is too high
compared with the power consumed by other devices.

User space is the most adequate place to dynamically act on the
different devices by limiting their power given an application
profile: it has the knowledge of the platform.

Dynamic Thermal Power Management (DTPM) is a technique acting on
the device power by limiting and/or balancing a power budget among
different devices.

The DTPM framework provides a unified interface to act on the
device power.
|
||||
|
||||
Overview
========

The DTPM framework relies on the powercap framework to create the
powercap entries in the sysfs directory and to implement the backend
driver that connects to the power manageable device.

The DTPM is a tree representation describing the power constraints
shared between devices, not their physical positions.

The nodes of the tree are a virtual description aggregating the power
characteristics of the children nodes and their power limitations.

The leaves of the tree are the real power manageable devices.

For instance::

  SoC
   |
   `-- pkg
        |
        |-- pd0 (cpu0-3)
        |
        `-- pd1 (cpu4-5)

The pkg power will be the sum of pd0 and pd1 power numbers::

  SoC (400mW - 3100mW)
   |
   `-- pkg (400mW - 3100mW)
        |
        |-- pd0 (100mW - 700mW)
        |
        `-- pd1 (300mW - 2400mW)

When the nodes are inserted in the tree, their power characteristics are propagated to the parents::

  SoC (600mW - 5900mW)
   |
   |-- pkg (400mW - 3100mW)
   |    |
   |    |-- pd0 (100mW - 700mW)
   |    |
   |    `-- pd1 (300mW - 2400mW)
   |
   `-- pd2 (200mW - 2800mW)
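The propagation described above can be sketched in user-space C. This is a minimal illustration, not the kernel implementation: `struct node` and `node_insert()` are hypothetical names introduced here, and the values are kept in mW to match the trees in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a DTPM tree node. */
struct node {
	struct node *parent;
	uint64_t power_min;	/* in mW for this example */
	uint64_t power_max;	/* in mW for this example */
};

/* Attach @n under @parent and add its power range to every ancestor. */
void node_insert(struct node *n, struct node *parent)
{
	struct node *p;

	assert(n != NULL);
	n->parent = parent;

	/* Walk up to the root, accumulating the new node's range. */
	for (p = parent; p != NULL; p = p->parent) {
		p->power_min += n->power_min;
		p->power_max += n->power_max;
	}
}
```

Inserting pd0 and pd1 under an empty pkg node, and then pd2 under the root, should reproduce the ranges shown in the tree above.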
|
||||
|
||||
Each node has a weight on a 2^10 basis reflecting the percentage of power consumption among the siblings::

  SoC (w=1024)
   |
   |-- pkg (w=538)
   |    |
   |    |-- pd0 (w=231)
   |    |
   |    `-- pd1 (w=793)
   |
   `-- pd2 (w=486)

Note that the sum of the weights at the same level is equal to 1024.

When a power limitation is applied to a node, it is distributed among the children according to their weights. For example, if we set a power limitation of 3200mW at the 'SoC' root node, the resulting tree will be::

  SoC (w=1024) <--- power_limit = 3200mW
   |
   |-- pkg (w=538) --> power_limit = 1681mW
   |    |
   |    |-- pd0 (w=231) --> power_limit = 379mW
   |    |
   |    `-- pd1 (w=793) --> power_limit = 1302mW
   |
   `-- pd2 (w=486) --> power_limit = 1519mW
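The weight and the resulting share of a power limit can be computed as in the following user-space sketch. It assumes round-to-closest integer division, which is what the driver code in this commit appears to use; the function names are hypothetical and introduced only for illustration. Results can differ by one unit from a hand-computed figure depending on the rounding applied.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_SCALE 1024ULL	/* the 2^10 basis used for the weights */

/* Round-to-closest unsigned division. */
uint64_t div_round_closest(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d / 2) / d;
}

/* Weight of a child: its share of the parent's maximum power. */
uint64_t child_weight(uint64_t child_max, uint64_t parent_max)
{
	return div_round_closest(child_max * WEIGHT_SCALE, parent_max);
}

/* Part of the parent's power limit handed to a child of weight @w. */
uint64_t child_limit(uint64_t parent_limit, uint64_t w)
{
	return div_round_closest(parent_limit * w, WEIGHT_SCALE);
}
```

For example, `child_weight(3100, 5900)` gives the weight of pkg under SoC, and `child_limit(3200, child_weight(3100, 5900))` gives the part of a 3200mW limit that pkg receives.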
|
||||
|
||||
|
||||
Flat description
----------------

A root node is created and it is the parent of all the nodes. This
description is the simplest one and it is supposed to give user
space a flat representation of all the devices supporting the power
limitation, without any power limitation distribution.

Hierarchical description
------------------------

The different devices supporting the power limitation are represented
hierarchically. There is one root node; all intermediate nodes
group child nodes, which can themselves be intermediate nodes or real
devices.

The intermediate nodes aggregate the power information and allow
the power limit to be set given the weight of the nodes.
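The aggregation done by intermediate nodes can be pictured with the following user-space sketch. It is only an illustration under stated assumptions: `struct pnode` and `node_power_uw()` are hypothetical names, and a leaf is simply a node without children that carries its own reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tree node: only leaves carry a measured power. */
struct pnode {
	const struct pnode *first_child;	/* NULL for a leaf */
	const struct pnode *next_sibling;	/* NULL for the last sibling */
	uint64_t leaf_power_uw;			/* only meaningful for a leaf */
};

/* Power of a node: its own reading for a leaf, else the sum of its children. */
uint64_t node_power_uw(const struct pnode *n)
{
	const struct pnode *c;
	uint64_t sum = 0;

	assert(n != NULL);

	if (n->first_child == NULL)
		return n->leaf_power_uw;

	for (c = n->first_child; c != NULL; c = c->next_sibling)
		sum += node_power_uw(c);

	return sum;
}
```

Calling `node_power_uw()` on the root therefore returns the total over all real devices, whatever the depth of the hierarchy.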
|
||||
|
||||
User space API
==============

As stated in the overview, the DTPM framework is built on top of the
powercap framework. Thus the sysfs interface is the same; please refer
to the powercap documentation for further details.

 * power_uw: Instantaneous power consumption. If the node is an
   intermediate node, then the power consumption will be the sum of all
   children power consumption.

 * max_power_range_uw: The power range resulting from the maximum power
   minus the minimum power.

 * name: The name of the node. This is implementation dependent. Even
   if it is not recommended for the user space, several nodes can have
   the same name.

 * constraint_X_name: The name of the constraint.

 * constraint_X_max_power_uw: The maximum power limit to be applicable
   to the node.

 * constraint_X_power_limit_uw: The power limit to be applied to the
   node. If the value written is the one contained in
   constraint_X_max_power_uw, the constraint will be removed.

 * constraint_X_time_window_us: The meaning of this file will depend
   on the constraint number.

Constraints
-----------

 * Constraint 0: The power limitation is immediately applied, without
   limitation in time.
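From user space, setting constraint 0 amounts to writing a value in microwatts to the corresponding sysfs file. The sketch below assumes the caller already knows the directory of the node; that directory is passed as a parameter because its exact location depends on the platform, and `dtpm_write_limit_uw()` is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write @limit_uw to constraint_0_power_limit_uw under @zone_dir.
 * Returns 0 on success, -1 on failure.
 */
int dtpm_write_limit_uw(const char *zone_dir, unsigned long long limit_uw)
{
	char path[512];
	FILE *f;
	int n;

	assert(zone_dir != NULL);

	n = snprintf(path, sizeof(path), "%s/constraint_0_power_limit_uw",
		     zone_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fprintf(f, "%llu\n", limit_uw) < 0) {
		fclose(f);
		return -1;
	}

	/* fclose() flushes the buffer, so a write error can surface here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Writing the value read from constraint_0_max_power_uw with the same helper would, per the description above, remove the constraint.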
|
||||
|
||||
Kernel API
==========

Overview
--------

The DTPM framework has no power limiting backend support. It is
generic and provides a set of APIs to let the different drivers
implement the backend part for the power limitation and create the
power constraints tree.

It is up to the platform to provide the initialization function to
allocate and link the different nodes of the tree.

A special macro has the role of declaring a node and the corresponding
initialization function via a description structure. This structure
contains an optional parent field allowing different devices to be
hooked to an already existing tree at boot time.

For instance::

	struct dtpm_descr my_descr = {
		.name = "my_name",
		.init = my_init_func,
	};

	DTPM_DECLARE(my_descr);

The nodes of the DTPM tree are described with the dtpm structure.
Adding a new power limitable device is done in three steps:

 * Allocate the dtpm node
 * Set the power number of the dtpm node
 * Register the dtpm node

The registration of the dtpm node is done with the powercap
ops. Basically, it must implement the callbacks to get and set the
power and the limit.

Alternatively, if the node to be inserted is an intermediate one, then
a simple function to insert it as a future parent is available.

If a device has its power characteristics changing, then the tree must
be updated with the new power numbers and weights.

Nomenclature
------------

 * dtpm_alloc() : Allocate and initialize a dtpm structure

 * dtpm_register() : Add the dtpm node to the tree

 * dtpm_unregister() : Remove the dtpm node from the tree

 * dtpm_update_power() : Update the power characteristics of the dtpm node
|
|
@@ -43,4 +43,17 @@ config IDLE_INJECT
 	  CPUs for power capping. Idle period can be injected
 	  synchronously on a set of specified CPUs or alternatively
 	  on a per CPU basis.
+
+config DTPM
+	bool "Power capping for Dynamic Thermal Power Management"
+	help
+	  This enables support for the power capping for the dynamic
+	  thermal power management userspace engine.
+
+config DTPM_CPU
+	bool "Add CPU power capping based on the energy model"
+	depends on DTPM && ENERGY_MODEL
+	help
+	  This enables support for CPU power limitation based on
+	  energy model.
 endif
@@ -1,4 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DTPM) += dtpm.o
+obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
@@ -0,0 +1,480 @@
|
|||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
/*
|
||||
* Copyright 2020 Linaro Limited
|
||||
*
|
||||
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
|
||||
*
|
||||
* The powercap based Dynamic Thermal Power Management framework
|
||||
* provides to the userspace a consistent API to set the power limit
|
||||
* on some devices.
|
||||
*
|
||||
* DTPM defines the functions to create a tree of constraints. Each
|
||||
* parent node is a virtual description of the aggregation of the
|
||||
* children. It propagates the constraints set at its level to its
|
||||
* children and collects the children power information. The leaves of
|
||||
* the tree are the real devices which have the ability to get their
|
||||
* current power consumption and set their power limit.
|
||||
*/
|
||||
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
||||
|
||||
#include <linux/dtpm.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/powercap.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/mutex.h>
|
||||
|
||||
#define DTPM_POWER_LIMIT_FLAG 0
|
||||
|
||||
static const char *constraint_name[] = {
|
||||
"Instantaneous",
|
||||
};
|
||||
|
||||
static DEFINE_MUTEX(dtpm_lock);
|
||||
static struct powercap_control_type *pct;
|
||||
static struct dtpm *root;
|
||||
|
||||
static int get_time_window_us(struct powercap_zone *pcz, int cid, u64 *window)
|
||||
{
|
||||
return -ENOSYS;
|
||||
}
|
||||
|
||||
static int set_time_window_us(struct powercap_zone *pcz, int cid, u64 window)
|
||||
{
|
||||
return -ENOSYS;
|
||||
}
|
||||
|
||||
static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw)
|
||||
{
|
||||
struct dtpm *dtpm = to_dtpm(pcz);
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
*max_power_uw = dtpm->power_max - dtpm->power_min;
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __get_power_uw(struct dtpm *dtpm, u64 *power_uw)
|
||||
{
|
||||
struct dtpm *child;
|
||||
u64 power;
|
||||
int ret = 0;
|
||||
|
||||
if (dtpm->ops) {
|
||||
*power_uw = dtpm->ops->get_power_uw(dtpm);
|
||||
return 0;
|
||||
}
|
||||
|
||||
*power_uw = 0;
|
||||
|
||||
list_for_each_entry(child, &dtpm->children, sibling) {
|
||||
ret = __get_power_uw(child, &power);
|
||||
if (ret)
|
||||
break;
|
||||
*power_uw += power;
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int get_power_uw(struct powercap_zone *pcz, u64 *power_uw)
|
||||
{
|
||||
struct dtpm *dtpm = to_dtpm(pcz);
|
||||
int ret;
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
ret = __get_power_uw(dtpm, power_uw);
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void __dtpm_rebalance_weight(struct dtpm *dtpm)
|
||||
{
|
||||
struct dtpm *child;
|
||||
|
||||
list_for_each_entry(child, &dtpm->children, sibling) {
|
||||
|
||||
pr_debug("Setting weight '%d' for '%s'\n",
|
||||
child->weight, child->zone.name);
|
||||
|
||||
child->weight = DIV64_U64_ROUND_CLOSEST(
|
||||
child->power_max * 1024, dtpm->power_max);
|
||||
|
||||
__dtpm_rebalance_weight(child);
|
||||
}
|
||||
}
|
||||
|
||||
static void __dtpm_sub_power(struct dtpm *dtpm)
|
||||
{
|
||||
struct dtpm *parent = dtpm->parent;
|
||||
|
||||
while (parent) {
|
||||
parent->power_min -= dtpm->power_min;
|
||||
parent->power_max -= dtpm->power_max;
|
||||
parent->power_limit -= dtpm->power_limit;
|
||||
parent = parent->parent;
|
||||
}
|
||||
|
||||
__dtpm_rebalance_weight(root);
|
||||
}
|
||||
|
||||
static void __dtpm_add_power(struct dtpm *dtpm)
|
||||
{
|
||||
struct dtpm *parent = dtpm->parent;
|
||||
|
||||
while (parent) {
|
||||
parent->power_min += dtpm->power_min;
|
||||
parent->power_max += dtpm->power_max;
|
||||
parent->power_limit += dtpm->power_limit;
|
||||
parent = parent->parent;
|
||||
}
|
||||
|
||||
__dtpm_rebalance_weight(root);
|
||||
}
|
||||
|
||||
/**
|
||||
* dtpm_update_power - Update the power on the dtpm
|
||||
* @dtpm: a pointer to a dtpm structure to update
|
||||
* @power_min: a u64 representing the new power_min value
|
||||
* @power_max: a u64 representing the new power_max value
|
||||
*
|
||||
* Function to update the power values of the dtpm node specified in
|
||||
* parameter. These new values will be propagated to the tree.
|
||||
*
|
||||
* Return: zero on success, -EINVAL if the values are inconsistent
|
||||
*/
|
||||
int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max)
|
||||
{
|
||||
int ret = 0;
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
|
||||
if (power_min == dtpm->power_min && power_max == dtpm->power_max)
|
||||
goto unlock;
|
||||
|
||||
if (power_max < power_min) {
|
||||
ret = -EINVAL;
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
__dtpm_sub_power(dtpm);
|
||||
|
||||
dtpm->power_min = power_min;
|
||||
dtpm->power_max = power_max;
|
||||
if (!test_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags))
|
||||
dtpm->power_limit = power_max;
|
||||
|
||||
__dtpm_add_power(dtpm);
|
||||
|
||||
unlock:
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
/**
|
||||
* dtpm_release_zone - Cleanup when the node is released
|
||||
* @pcz: a pointer to a powercap_zone structure
|
||||
*
|
||||
* Do some housecleaning and update the weight on the tree. The
|
||||
* release will be denied if the node has children. This function must
|
||||
* be called by the specific release callback of the different
|
||||
* backends.
|
||||
*
|
||||
* Return: 0 on success, -EBUSY if there are children
|
||||
*/
|
||||
int dtpm_release_zone(struct powercap_zone *pcz)
|
||||
{
|
||||
struct dtpm *dtpm = to_dtpm(pcz);
|
||||
struct dtpm *parent = dtpm->parent;
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
|
||||
if (!list_empty(&dtpm->children)) {
|
||||
mutex_unlock(&dtpm_lock);
|
||||
return -EBUSY;
|
||||
}
|
||||
|
||||
if (parent)
|
||||
list_del(&dtpm->sibling);
|
||||
|
||||
__dtpm_sub_power(dtpm);
|
||||
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
if (dtpm->ops)
|
||||
dtpm->ops->release(dtpm);
|
||||
|
||||
kfree(dtpm);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __get_power_limit_uw(struct dtpm *dtpm, int cid, u64 *power_limit)
|
||||
{
|
||||
*power_limit = dtpm->power_limit;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int get_power_limit_uw(struct powercap_zone *pcz,
|
||||
int cid, u64 *power_limit)
|
||||
{
|
||||
struct dtpm *dtpm = to_dtpm(pcz);
|
||||
int ret;
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
ret = __get_power_limit_uw(dtpm, cid, power_limit);
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* Set the power limit on the nodes, the power limit is distributed
|
||||
* given the weight of the children.
|
||||
*
|
||||
* The dtpm node lock must be held when calling this function.
|
||||
*/
|
||||
static int __set_power_limit_uw(struct dtpm *dtpm, int cid, u64 power_limit)
|
||||
{
|
||||
struct dtpm *child;
|
||||
int ret = 0;
|
||||
u64 power;
|
||||
|
||||
/*
|
||||
* A max power limitation means we remove the power limit,
|
||||
* otherwise we set a constraint and flag the dtpm node.
|
||||
*/
|
||||
if (power_limit == dtpm->power_max) {
|
||||
clear_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags);
|
||||
} else {
|
||||
set_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags);
|
||||
}
|
||||
|
||||
pr_debug("Setting power limit for '%s': %llu uW\n",
|
||||
dtpm->zone.name, power_limit);
|
||||
|
||||
/*
|
||||
* Only leaves of the dtpm tree have ops to get/set the power
|
||||
*/
|
||||
if (dtpm->ops) {
|
||||
dtpm->power_limit = dtpm->ops->set_power_uw(dtpm, power_limit);
|
||||
} else {
|
||||
dtpm->power_limit = 0;
|
||||
|
||||
list_for_each_entry(child, &dtpm->children, sibling) {
|
||||
|
||||
/*
|
||||
* Integer division rounding will inevitably
|
||||
* lead to a different min or max value when
|
||||
* set several times. In order to restore the
|
||||
* initial value, we force the child's min or
|
||||
* max power every time if the constraint is
|
||||
* at the boundaries.
|
||||
*/
|
||||
if (power_limit == dtpm->power_max) {
|
||||
power = child->power_max;
|
||||
} else if (power_limit == dtpm->power_min) {
|
||||
power = child->power_min;
|
||||
} else {
|
||||
power = DIV_ROUND_CLOSEST_ULL(
|
||||
power_limit * child->weight, 1024);
|
||||
}
|
||||
|
||||
pr_debug("Setting power limit for '%s': %llu uW\n",
|
||||
child->zone.name, power);
|
||||
|
||||
ret = __set_power_limit_uw(child, cid, power);
|
||||
if (!ret)
|
||||
ret = __get_power_limit_uw(child, cid, &power);
|
||||
|
||||
if (ret)
|
||||
break;
|
||||
|
||||
dtpm->power_limit += power;
|
||||
}
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int set_power_limit_uw(struct powercap_zone *pcz,
|
||||
int cid, u64 power_limit)
|
||||
{
|
||||
struct dtpm *dtpm = to_dtpm(pcz);
|
||||
int ret;
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
|
||||
/*
|
||||
* Don't allow values outside of the power range previously
|
||||
* set when initializing the power numbers.
|
||||
*/
|
||||
power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max);
|
||||
|
||||
ret = __set_power_limit_uw(dtpm, cid, power_limit);
|
||||
|
||||
pr_debug("%s: power limit: %llu uW, power max: %llu uW\n",
|
||||
dtpm->zone.name, dtpm->power_limit, dtpm->power_max);
|
||||
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static const char *get_constraint_name(struct powercap_zone *pcz, int cid)
|
||||
{
|
||||
return constraint_name[cid];
|
||||
}
|
||||
|
||||
static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power)
|
||||
{
|
||||
struct dtpm *dtpm = to_dtpm(pcz);
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
*max_power = dtpm->power_max;
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct powercap_zone_constraint_ops constraint_ops = {
|
||||
.set_power_limit_uw = set_power_limit_uw,
|
||||
.get_power_limit_uw = get_power_limit_uw,
|
||||
.set_time_window_us = set_time_window_us,
|
||||
.get_time_window_us = get_time_window_us,
|
||||
.get_max_power_uw = get_max_power_uw,
|
||||
.get_name = get_constraint_name,
|
||||
};
|
||||
|
||||
static struct powercap_zone_ops zone_ops = {
|
||||
.get_max_power_range_uw = get_max_power_range_uw,
|
||||
.get_power_uw = get_power_uw,
|
||||
.release = dtpm_release_zone,
|
||||
};
|
||||
|
||||
/**
|
||||
* dtpm_alloc - Allocate and initialize a dtpm struct
|
||||
* @ops: a pointer to a dtpm_ops structure, NULL for a node without backend ops
|
||||
*
|
||||
* Return: a struct dtpm pointer, NULL in case of error
|
||||
*/
|
||||
struct dtpm *dtpm_alloc(struct dtpm_ops *ops)
|
||||
{
|
||||
struct dtpm *dtpm;
|
||||
|
||||
dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL);
|
||||
if (dtpm) {
|
||||
INIT_LIST_HEAD(&dtpm->children);
|
||||
INIT_LIST_HEAD(&dtpm->sibling);
|
||||
dtpm->weight = 1024;
|
||||
dtpm->ops = ops;
|
||||
}
|
||||
|
||||
return dtpm;
|
||||
}
|
||||
|
||||
/**
|
||||
* dtpm_unregister - Unregister a dtpm node from the hierarchy tree
|
||||
* @dtpm: a pointer to a dtpm structure corresponding to the node to be removed
|
||||
*
|
||||
* Call the underlying powercap unregister function. That will call
|
||||
* the release callback of the powercap zone.
|
||||
*/
|
||||
void dtpm_unregister(struct dtpm *dtpm)
|
||||
{
|
||||
powercap_unregister_zone(pct, &dtpm->zone);
|
||||
|
||||
pr_info("Unregistered dtpm node '%s'\n", dtpm->zone.name);
|
||||
}
|
||||
|
||||
/**
|
||||
* dtpm_register - Register a dtpm node in the hierarchy tree
|
||||
* @name: a string specifying the name of the node
|
||||
* @dtpm: a pointer to a dtpm structure corresponding to the new node
|
||||
* @parent: a pointer to a dtpm structure corresponding to the parent node
|
||||
*
|
||||
* Create a dtpm node in the tree. If no parent is specified, the node
|
||||
* is the root node of the hierarchy. If the root node already exists,
|
||||
* then the registration will fail. The powercap controller must be
|
||||
* initialized before calling this function.
|
||||
*
|
||||
* The dtpm structure must be initialized with the power numbers
|
||||
* before calling this function.
|
||||
*
|
||||
* Return: zero on success, a negative value in case of error:
|
||||
* -EAGAIN: the function is called before the framework is initialized.
|
||||
* -EBUSY: the root node is already inserted
|
||||
* -EINVAL: * there is no root node yet and @parent is specified
|
||||
* * not all ops are defined
|
||||
* * parent has ops which are reserved for leaves
|
||||
* Other negative values are reported back from the powercap framework
|
||||
*/
|
||||
int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent)
|
||||
{
|
||||
struct powercap_zone *pcz;
|
||||
|
||||
if (!pct)
|
||||
return -EAGAIN;
|
||||
|
||||
if (root && !parent)
|
||||
return -EBUSY;
|
||||
|
||||
if (!root && parent)
|
||||
return -EINVAL;
|
||||
|
||||
if (parent && parent->ops)
|
||||
return -EINVAL;
|
||||
|
||||
if (!dtpm)
|
||||
return -EINVAL;
|
||||
|
||||
if (dtpm->ops && !(dtpm->ops->set_power_uw &&
|
||||
dtpm->ops->get_power_uw &&
|
||||
dtpm->ops->release))
|
||||
return -EINVAL;
|
||||
|
||||
pcz = powercap_register_zone(&dtpm->zone, pct, name,
|
||||
parent ? &parent->zone : NULL,
|
||||
&zone_ops, MAX_DTPM_CONSTRAINTS,
|
||||
&constraint_ops);
|
||||
if (IS_ERR(pcz))
|
||||
return PTR_ERR(pcz);
|
||||
|
||||
mutex_lock(&dtpm_lock);
|
||||
|
||||
if (parent) {
|
||||
list_add_tail(&dtpm->sibling, &parent->children);
|
||||
dtpm->parent = parent;
|
||||
} else {
|
||||
root = dtpm;
|
||||
}
|
||||
|
||||
__dtpm_add_power(dtpm);
|
||||
|
||||
pr_info("Registered dtpm node '%s' / %llu-%llu uW, \n",
|
||||
dtpm->zone.name, dtpm->power_min, dtpm->power_max);
|
||||
|
||||
mutex_unlock(&dtpm_lock);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __init dtpm_init(void)
|
||||
{
|
||||
struct dtpm_descr **dtpm_descr;
|
||||
|
||||
pct = powercap_register_control_type(NULL, "dtpm", NULL);
|
||||
if (IS_ERR(pct)) {
|
||||
pr_err("Failed to register control type\n");
|
||||
return PTR_ERR(pct);
|
||||
}
|
||||
|
||||
for_each_dtpm_table(dtpm_descr)
|
||||
(*dtpm_descr)->init(*dtpm_descr);
|
||||
|
||||
return 0;
|
||||
}
|
||||
late_initcall(dtpm_init);
|
|
@@ -0,0 +1,257 @@
|
|||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
/*
|
||||
* Copyright 2020 Linaro Limited
|
||||
*
|
||||
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
|
||||
*
|
||||
* The DTPM CPU is based on the energy model. It hooks the CPU in the
|
||||
* DTPM tree which in turn updates the power number by propagating the
|
||||
* power number from the CPU energy model information to the parents.
|
||||
*
|
||||
* The association between the power and the performance state allows
* the power of the CPU to be set at the OPP granularity.
|
||||
*
|
||||
* The CPU hotplug is supported and the power numbers will be updated
|
||||
* if a CPU is hot plugged / unplugged.
|
||||
*/
|
||||
#include <linux/cpumask.h>
|
||||
#include <linux/cpufreq.h>
|
||||
#include <linux/cpuhotplug.h>
|
||||
#include <linux/dtpm.h>
|
||||
#include <linux/energy_model.h>
|
||||
#include <linux/pm_qos.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/units.h>
|
||||
|
||||
static struct dtpm *__parent;
|
||||
|
||||
static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu);
|
||||
|
||||
struct dtpm_cpu {
|
||||
struct freq_qos_request qos_req;
|
||||
int cpu;
|
||||
};
|
||||
|
||||
/*
|
||||
* When a new CPU is inserted at hotplug or boot time, add the power
|
||||
* contribution and update the dtpm tree.
|
||||
*/
|
||||
static int power_add(struct dtpm *dtpm, struct em_perf_domain *em)
|
||||
{
|
||||
u64 power_min, power_max;
|
||||
|
||||
power_min = em->table[0].power;
|
||||
power_min *= MICROWATT_PER_MILLIWATT;
|
||||
power_min += dtpm->power_min;
|
||||
|
||||
power_max = em->table[em->nr_perf_states - 1].power;
|
||||
power_max *= MICROWATT_PER_MILLIWATT;
|
||||
power_max += dtpm->power_max;
|
||||
|
||||
return dtpm_update_power(dtpm, power_min, power_max);
|
||||
}
|
||||
|
||||
/*
|
||||
* When a CPU is unplugged, remove its power contribution from the
|
||||
* dtpm tree.
|
||||
*/
|
||||
static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em)
|
||||
{
|
||||
u64 power_min, power_max;
|
||||
|
||||
power_min = em->table[0].power;
|
||||
power_min *= MICROWATT_PER_MILLIWATT;
|
||||
power_min = dtpm->power_min - power_min;
|
||||
|
||||
power_max = em->table[em->nr_perf_states - 1].power;
|
||||
power_max *= MICROWATT_PER_MILLIWATT;
|
||||
power_max = dtpm->power_max - power_max;
|
||||
|
||||
return dtpm_update_power(dtpm, power_min, power_max);
|
||||
}
|
||||
|
||||
static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
|
||||
{
|
||||
struct dtpm_cpu *dtpm_cpu = dtpm->private;
|
||||
struct em_perf_domain *pd;
|
||||
struct cpumask cpus;
|
||||
unsigned long freq;
|
||||
u64 power;
|
||||
int i, nr_cpus;
|
||||
|
||||
pd = em_cpu_get(dtpm_cpu->cpu);
|
||||
|
||||
cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
|
||||
|
||||
nr_cpus = cpumask_weight(&cpus);
|
||||
|
||||
for (i = 0; i < pd->nr_perf_states; i++) {
|
||||
|
||||
power = pd->table[i].power * MICROWATT_PER_MILLIWATT * nr_cpus;
|
||||
|
||||
if (power > power_limit)
|
||||
break;
|
||||
}
|
||||
|
||||
freq = pd->table[i - 1].frequency;
|
||||
|
||||
freq_qos_update_request(&dtpm_cpu->qos_req, freq);
|
||||
|
||||
power_limit = pd->table[i - 1].power *
|
||||
MICROWATT_PER_MILLIWATT * nr_cpus;
|
||||
|
||||
return power_limit;
|
||||
}
|
||||
|
||||
static u64 get_pd_power_uw(struct dtpm *dtpm)
|
||||
{
|
||||
struct dtpm_cpu *dtpm_cpu = dtpm->private;
|
||||
struct em_perf_domain *pd;
|
||||
struct cpumask cpus;
|
||||
unsigned long freq;
|
||||
int i, nr_cpus;
|
||||
|
||||
pd = em_cpu_get(dtpm_cpu->cpu);
|
||||
freq = cpufreq_quick_get(dtpm_cpu->cpu);
|
||||
cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
|
||||
nr_cpus = cpumask_weight(&cpus);
|
||||
|
||||
for (i = 0; i < pd->nr_perf_states; i++) {
|
||||
|
||||
if (pd->table[i].frequency < freq)
|
||||
continue;
|
||||
|
||||
return pd->table[i].power *
|
||||
MICROWATT_PER_MILLIWATT * nr_cpus;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void pd_release(struct dtpm *dtpm)
|
||||
{
|
||||
struct dtpm_cpu *dtpm_cpu = dtpm->private;
|
||||
|
||||
if (freq_qos_request_active(&dtpm_cpu->qos_req))
|
||||
freq_qos_remove_request(&dtpm_cpu->qos_req);
|
||||
|
||||
kfree(dtpm_cpu);
|
||||
}
|
||||
|
||||
static struct dtpm_ops dtpm_ops = {
|
||||
.set_power_uw = set_pd_power_limit,
|
||||
.get_power_uw = get_pd_power_uw,
|
||||
.release = pd_release,
|
||||
};
|
||||
|
||||
static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
|
||||
{
|
||||
struct cpufreq_policy *policy;
|
||||
struct em_perf_domain *pd;
|
||||
struct dtpm *dtpm;
|
||||
|
||||
policy = cpufreq_cpu_get(cpu);
|
||||
|
||||
if (!policy)
|
||||
return 0;
|
||||
|
||||
pd = em_cpu_get(cpu);
|
||||
if (!pd)
|
||||
return -EINVAL;
|
||||
|
||||
dtpm = per_cpu(dtpm_per_cpu, cpu);
|
||||
|
||||
power_sub(dtpm, pd);
|
||||
|
||||
if (cpumask_weight(policy->cpus) != 1)
|
||||
return 0;
|
||||
|
||||
for_each_cpu(cpu, policy->related_cpus)
|
||||
per_cpu(dtpm_per_cpu, cpu) = NULL;
|
||||
|
||||
dtpm_unregister(dtpm);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cpuhp_dtpm_cpu_online(unsigned int cpu)
|
||||
{
|
||||
struct dtpm *dtpm;
|
||||
struct dtpm_cpu *dtpm_cpu;
|
||||
struct cpufreq_policy *policy;
|
||||
struct em_perf_domain *pd;
|
||||
char name[CPUFREQ_NAME_LEN];
|
||||
int ret = -ENOMEM;
|
||||
|
||||
policy = cpufreq_cpu_get(cpu);
|
||||
|
||||
if (!policy)
|
||||
return 0;
|
||||
|
||||
pd = em_cpu_get(cpu);
|
||||
if (!pd)
|
||||
return -EINVAL;
|
||||
|
||||
dtpm = per_cpu(dtpm_per_cpu, cpu);
|
||||
if (dtpm)
|
||||
return power_add(dtpm, pd);
|
||||
|
||||
dtpm = dtpm_alloc(&dtpm_ops);
|
||||
if (!dtpm)
|
||||
return -EINVAL;
|
||||
|
||||
dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
|
||||
if (!dtpm_cpu)
|
||||
goto out_kfree_dtpm;
|
||||
|
||||
dtpm->private = dtpm_cpu;
|
||||
dtpm_cpu->cpu = cpu;
|
||||
|
||||
for_each_cpu(cpu, policy->related_cpus)
|
||||
per_cpu(dtpm_per_cpu, cpu) = dtpm;
|
||||
|
||||
sprintf(name, "cpu%d", dtpm_cpu->cpu);
|
||||
|
||||
ret = dtpm_register(name, dtpm, __parent);
|
||||
if (ret)
|
||||
goto out_kfree_dtpm_cpu;
|
||||
|
||||
ret = power_add(dtpm, pd);
|
||||
if (ret)
|
||||
goto out_dtpm_unregister;
|
||||
|
||||
ret = freq_qos_add_request(&policy->constraints,
|
||||
&dtpm_cpu->qos_req, FREQ_QOS_MAX,
|
||||
pd->table[pd->nr_perf_states - 1].frequency);
|
||||
if (ret)
|
||||
goto out_power_sub;
|
||||
|
||||
return 0;
|
||||
|
||||
out_power_sub:
|
||||
power_sub(dtpm, pd);
|
||||
|
||||
out_dtpm_unregister:
|
||||
dtpm_unregister(dtpm);
|
||||
dtpm_cpu = NULL;
|
||||
dtpm = NULL;
|
||||
|
||||
out_kfree_dtpm_cpu:
|
||||
for_each_cpu(cpu, policy->related_cpus)
|
||||
per_cpu(dtpm_per_cpu, cpu) = NULL;
|
||||
kfree(dtpm_cpu);
|
||||
|
||||
out_kfree_dtpm:
|
||||
kfree(dtpm);
|
||||
return ret;
|
||||
}
|
||||
|
||||
int dtpm_register_cpu(struct dtpm *parent)
|
||||
{
|
||||
__parent = parent;
|
||||
|
||||
return cpuhp_setup_state(CPUHP_AP_DTPM_CPU_ONLINE,
|
||||
"dtpm_cpu:online",
|
||||
cpuhp_dtpm_cpu_online,
|
||||
cpuhp_dtpm_cpu_offline);
|
||||
}
|
|
@@ -547,7 +547,7 @@ static void rapl_init_domains(struct rapl_package *rp)
 
 		if (i == RAPL_DOMAIN_PLATFORM && rp->id > 0) {
 			snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "psys-%d",
-				cpu_data(rp->lead_cpu).phys_proc_id);
+				topology_physical_package_id(rp->lead_cpu));
 		} else
 			snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "%s",
 				rapl_domain_names[i]);
@@ -1049,6 +1049,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, &rapl_defaults_core),
 	X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE, &rapl_defaults_core),
 	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &rapl_defaults_core),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &rapl_defaults_core),
 	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &rapl_defaults_spr_server),
 	X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &rapl_defaults_core),
 
@@ -1309,7 +1310,6 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
 {
 	int id = topology_logical_die_id(cpu);
 	struct rapl_package *rp;
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	int ret;
 
 	if (!rapl_defaults)
@@ -1326,10 +1326,11 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
 
 	if (topology_max_die_per_package() > 1)
 		snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH,
-			 "package-%d-die-%d", c->phys_proc_id, c->cpu_die_id);
+			 "package-%d-die-%d",
+			 topology_physical_package_id(cpu), topology_die_id(cpu));
 	else
 		snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
-			 c->phys_proc_id);
+			 topology_physical_package_id(cpu));
 
 	/* check if the package contains valid domains */
 	if (rapl_detect_domains(rp, cpu) || rapl_defaults->check_unit(rp, cpu)) {
@@ -316,6 +316,16 @@
 #define THERMAL_TABLE(name)
 #endif
 
+#ifdef CONFIG_DTPM
+#define DTPM_TABLE() \
+	. = ALIGN(8); \
+	__dtpm_table = .; \
+	KEEP(*(__dtpm_table)) \
+	__dtpm_table_end = .;
+#else
+#define DTPM_TABLE()
+#endif
+
 #define KERNEL_DTB() \
 	STRUCT_ALIGN(); \
 	__dtb_start = .; \
@@ -733,6 +743,7 @@
 	ACPI_PROBE_TABLE(irqchip) \
 	ACPI_PROBE_TABLE(timer) \
 	THERMAL_TABLE(governor) \
+	DTPM_TABLE() \
 	EARLYCON_TABLE() \
 	LSM_TABLE() \
 	EARLY_LSM_TABLE() \
@@ -193,6 +193,7 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30,
 	CPUHP_AP_X86_HPET_ONLINE,
 	CPUHP_AP_X86_KVM_CLK_ONLINE,
+	CPUHP_AP_DTPM_CPU_ONLINE,
 	CPUHP_AP_ACTIVE,
 	CPUHP_ONLINE,
 };
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Linaro Ltd
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ */
+#ifndef ___DTPM_H__
+#define ___DTPM_H__
+
+#include <linux/powercap.h>
+
+#define MAX_DTPM_DESCR 8
+#define MAX_DTPM_CONSTRAINTS 1
+
+struct dtpm {
+	struct powercap_zone zone;
+	struct dtpm *parent;
+	struct list_head sibling;
+	struct list_head children;
+	struct dtpm_ops *ops;
+	unsigned long flags;
+	u64 power_limit;
+	u64 power_max;
+	u64 power_min;
+	int weight;
+	void *private;
+};
+
+struct dtpm_ops {
+	u64 (*set_power_uw)(struct dtpm *, u64);
+	u64 (*get_power_uw)(struct dtpm *);
+	void (*release)(struct dtpm *);
+};
+
+struct dtpm_descr;
+
+typedef int (*dtpm_init_t)(struct dtpm_descr *);
+
+struct dtpm_descr {
+	struct dtpm *parent;
+	const char *name;
+	dtpm_init_t init;
+};
+
+/* Init section thermal table */
+extern struct dtpm_descr *__dtpm_table[];
+extern struct dtpm_descr *__dtpm_table_end[];
+
+#define DTPM_TABLE_ENTRY(name) \
+	static typeof(name) *__dtpm_table_entry_##name \
+	__used __section("__dtpm_table") = &name
+
+#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name)
+
+#define for_each_dtpm_table(__dtpm) \
+	for (__dtpm = __dtpm_table; \
+	     __dtpm < __dtpm_table_end; \
+	     __dtpm++)
+
+static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
+{
+	return container_of(zone, struct dtpm, zone);
+}
+
+int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max);
+
+int dtpm_release_zone(struct powercap_zone *pcz);
+
+struct dtpm *dtpm_alloc(struct dtpm_ops *ops);
+
+void dtpm_unregister(struct dtpm *dtpm);
+
+int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
+
+int dtpm_register_cpu(struct dtpm *parent);
+
+#endif
@@ -4,6 +4,10 @@
 
 #include <linux/math.h>
 
+#define MILLIWATT_PER_WATT 1000L
+#define MICROWATT_PER_MILLIWATT 1000L
+#define MICROWATT_PER_WATT 1000000L
+
 #define ABSOLUTE_ZERO_MILLICELSIUS -273150
 
 static inline long milli_kelvin_to_millicelsius(long t)
@@ -139,7 +139,6 @@ config PM_SLEEP_SMP_NONZERO_CPU
 config PM_AUTOSLEEP
 	bool "Opportunistic sleep"
 	depends on PM_SLEEP
-	default n
 	help
 	  Allow the kernel to trigger a system transition into a global sleep
 	  state automatically whenever there are no active wakeup sources.
@@ -147,7 +146,6 @@ config PM_AUTOSLEEP
 config PM_WAKELOCKS
 	bool "User space wakeup sources interface"
 	depends on PM_SLEEP
-	default n
 	help
 	  Allow user space to create, activate and deactivate wakeup source
 	  objects with the help of a sysfs-based interface.
@@ -293,7 +291,6 @@ config PM_GENERIC_DOMAINS
 config WQ_POWER_EFFICIENT_DEFAULT
 	bool "Enable workqueue power-efficient mode by default"
 	depends on PM
-	default n
 	help
 	  Per-cpu workqueues are generally preferred because they show
 	  better performance thanks to cache locality; unfortunately,
@@ -322,15 +319,14 @@ config CPU_PM
 	bool
 
 config ENERGY_MODEL
-	bool "Energy Model for CPUs"
+	bool "Energy Model for devices with DVFS (CPUs, GPUs, etc)"
 	depends on SMP
 	depends on CPU_FREQ
-	default n
 	help
 	  Several subsystems (thermal and/or the task scheduler for example)
-	  can leverage information about the energy consumed by CPUs to make
-	  smarter decisions. This config option enables the framework from
-	  which subsystems can access the energy models.
+	  can leverage information about the energy consumed by devices to
+	  make smarter decisions. This config option enables the framework
+	  from which subsystems can access the energy models.
 
 	  The exact usage of the energy model is subsystem-dependent.
 