OpenCloudOS-Kernel/kernel/power/energy_model.c

439 lines
10 KiB
C
Raw Normal View History

PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
// SPDX-License-Identifier: GPL-2.0
/*
* Energy Model of devices
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
*
* Copyright (c) 2018-2021, Arm ltd.
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
* Written by: Quentin Perret, Arm ltd.
* Improvements provided by: Lukasz Luba, Arm ltd.
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
*/
#define pr_fmt(fmt) "energy_model: " fmt
#include <linux/cpu.h>
#include <linux/cpufreq.h>
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
#include <linux/cpumask.h>
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
#include <linux/debugfs.h>
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
#include <linux/energy_model.h>
#include <linux/sched/topology.h>
#include <linux/slab.h>
/*
* Mutex serializing the registrations of performance domains and letting
* callbacks defined by drivers sleep.
*/
static DEFINE_MUTEX(em_pd_mutex);
static bool _is_cpu_device(struct device *dev)
{
return (dev->bus == &cpu_subsys);
}
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
#ifdef CONFIG_DEBUG_FS
static struct dentry *rootdir;
static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
{
struct dentry *d;
char name[24];
snprintf(name, sizeof(name), "ps:%lu", ps->frequency);
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
/* Create per-ps directory */
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
d = debugfs_create_dir(name, pd);
debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
debugfs_create_ulong("power", 0444, d, &ps->power);
debugfs_create_ulong("cost", 0444, d, &ps->cost);
debugfs_create_ulong("inefficient", 0444, d, &ps->flags);
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
}
static int em_debug_cpus_show(struct seq_file *s, void *unused)
{
seq_printf(s, "%*pbl\n", cpumask_pr_args(to_cpumask(s->private)));
return 0;
}
DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
static int em_debug_flags_show(struct seq_file *s, void *unused)
{
struct em_perf_domain *pd = s->private;
seq_printf(s, "%#lx\n", pd->flags);
return 0;
}
DEFINE_SHOW_ATTRIBUTE(em_debug_flags);
static void em_debug_create_pd(struct device *dev)
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
{
struct dentry *d;
int i;
/* Create the directory of the performance domain */
d = debugfs_create_dir(dev_name(dev), rootdir);
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
if (_is_cpu_device(dev))
debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
&em_debug_cpus_fops);
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
debugfs_create_file("flags", 0444, d, dev->em_pd,
&em_debug_flags_fops);
/* Create a sub-directory for each performance state */
for (i = 0; i < dev->em_pd->nr_perf_states; i++)
em_debug_create_ps(&dev->em_pd->table[i], d);
}
static void em_debug_remove_pd(struct device *dev)
{
debugfs_lookup_and_remove(dev_name(dev), rootdir);
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
}
static int __init em_debug_init(void)
{
/* Create /sys/kernel/debug/energy_model directory */
rootdir = debugfs_create_dir("energy_model", NULL);
return 0;
}
fs_initcall(em_debug_init);
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
#else /* CONFIG_DEBUG_FS */
static void em_debug_create_pd(struct device *dev) {}
static void em_debug_remove_pd(struct device *dev) {}
PM / EM: Expose the Energy Model in debugfs The recently introduced Energy Model (EM) framework manages power cost tables of CPUs. These tables are currently only visible from kernel space. However, in order to debug the behaviour of subsystems that use the EM (EAS for example), it is often required to know what the power costs are from userspace. For this reason, introduce under /sys/kernel/debug/energy_model a set of directories representing the performance domains of the system. Each performance domain contains a set of sub-directories representing the different capacity states (cs) and their attributes, as well as a file exposing the related CPUs. The resulting hierarchy is as follows on Arm juno r0 for example: /sys/kernel/debug/energy_model ├── pd0 │   ├── cpus │   ├── cs:450000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:575000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:700000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   ├── cs:775000 │   │   ├── cost │   │   ├── frequency │   │   └── power │   └── cs:850000 │   ├── cost │   ├── frequency │   └── power └── pd1 ├── cpus ├── cs:1100000 │   ├── cost │   ├── frequency │   └── power ├── cs:450000 │   ├── cost │   ├── frequency │   └── power ├── cs:625000 │   ├── cost │   ├── frequency │   └── power ├── cs:800000 │   ├── cost │   ├── frequency │   └── power └── cs:950000 ├── cost ├── frequency └── power Signed-off-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 00:42:47 +08:00
#endif
static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
int nr_states, struct em_data_callback *cb,
unsigned long flags)
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
{
unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
struct em_perf_state *table;
int i, ret;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
u64 fmax;
table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
if (!table)
return -ENOMEM;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
/* Build the list of performance states for this performance domain */
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
for (i = 0, freq = 0; i < nr_states; i++, freq++) {
/*
* active_power() is a driver callback which ceils 'freq' to
* lowest performance state of 'dev' above 'freq' and updates
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
* 'power' and 'freq' accordingly.
*/
ret = cb->active_power(dev, &power, &freq);
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
if (ret) {
dev_err(dev, "EM: invalid perf. state: %d\n",
ret);
goto free_ps_table;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
/*
* We expect the driver callback to increase the frequency for
* higher performance states.
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
*/
if (freq <= prev_freq) {
dev_err(dev, "EM: non-increasing freq: %lu\n",
freq);
goto free_ps_table;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
/*
* The power returned by active_state() is expected to be
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
* positive and be in range.
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
*/
if (!power || power > EM_MAX_POWER) {
dev_err(dev, "EM: invalid power: %lu\n",
power);
goto free_ps_table;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
table[i].power = power;
table[i].frequency = prev_freq = freq;
}
/* Compute the cost of each performance state. */
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
fmax = (u64) table[nr_states - 1].frequency;
for (i = nr_states - 1; i >= 0; i--) {
unsigned long power_res, cost;
if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
ret = cb->get_cost(dev, table[i].frequency, &cost);
if (ret || !cost || cost > EM_MAX_POWER) {
dev_err(dev, "EM: invalid cost %lu %d\n",
cost, ret);
goto free_ps_table;
}
} else {
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
power_res = table[i].power;
cost = div64_u64(fmax * power_res, table[i].frequency);
}
table[i].cost = cost;
PM: EM: Increase energy calculation precision The Energy Model (EM) provides useful information about device power in each performance state to other subsystems like: Energy Aware Scheduler (EAS). The energy calculation in EAS does arithmetic operation based on the EM em_cpu_energy(). Current implementation of that function uses em_perf_state::cost as a pre-computed cost coefficient equal to: cost = power * max_frequency / frequency. The 'power' is expressed in milli-Watts (or in abstract scale). There are corner cases when the EAS energy calculation for two Performance Domains (PDs) return the same value. The EAS compares these values to choose smaller one. It might happen that this values are equal due to rounding error. In such scenario, we need better resolution, e.g. 1000 times better. To provide this possibility increase the resolution in the em_perf_state::cost for 64-bit architectures. The cost of increasing resolution on 32-bit is pretty high (64-bit division) and is not justified since there are no new 32bit big.LITTLE EAS systems expected which would benefit from this higher resolution. This patch allows to avoid the rounding to milli-Watt errors, which might occur in EAS energy estimation for each PD. The rounding error is common for small tasks which have small utilization value. There are two places in the code where it makes a difference: 1. In the find_energy_efficient_cpu() where we are searching for best_delta. We might suffer there when two PDs return the same result, like in the example below. Scenario: Low utilized system e.g. ~200 sum_util for PD0 and ~220 for PD1. There are quite a few small tasks ~10-15 util. These tasks would suffer for the rounding error. These utilization values are typical when running games on Android. One of our partners has reported 5..10mA less battery drain when running with increased resolution. Some details: We have two PDs: PD0 (big) and PD1 (little) Let's compare w/o patch set ('old') and w/ patch set ('new') We are comparing energy w/ task and w/o task placed in the PDs a) 'old' w/o patch set, PD0 task_util = 13 cost = 480 sum_util_w/o_task = 215 sum_util_w_task = 228 scale_cpu = 1024 energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100 energy_w_task = 480 * 228 / 1024 = 106.87 => 106 energy_diff = 106 - 100 = 6 (this is equal to 'old' PD1's energy_diff in 'c)') b) 'new' w/ patch set, PD0 task_util = 13 cost = 480 * 1000 = 480000 sum_util_w/o_task = 215 sum_util_w_task = 228 energy_w/o_task = 480000 * 215 / 1024 = 100781 energy_w_task = 480000 * 228 / 1024 = 106875 energy_diff = 106875 - 100781 = 6094 (this is not equal to 'new' PD1's energy_diff in 'd)') c) 'old' w/o patch set, PD1 task_util = 13 cost = 160 sum_util_w/o_task = 283 sum_util_w_task = 293 scale_cpu = 355 energy_w/o_task = 160 * 283 / 355 = 127.55 => 127 energy_w_task = 160 * 296 / 355 = 133.41 => 133 energy_diff = 133 - 127 = 6 (this is equal to 'old' PD0's energy_diff in 'a)') d) 'new' w/ patch set, PD1 task_util = 13 cost = 160 * 1000 = 160000 sum_util_w/o_task = 283 sum_util_w_task = 293 scale_cpu = 355 energy_w/o_task = 160000 * 283 / 355 = 127549 energy_w_task = 160000 * 296 / 355 = 133408 energy_diff = 133408 - 127549 = 5859 (this is not equal to 'new' PD0's energy_diff in 'b)') 2. Difference in the 6% energy margin filter at the end of find_energy_efficient_cpu(). With this patch the margin comparison also has better resolution, so it's possible to have better task placement thanks to that. Fixes: 27871f7a8a341ef ("PM: Introduce an Energy Model management framework") Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-03 18:27:43 +08:00
if (table[i].cost >= prev_cost) {
table[i].flags = EM_PERF_STATE_INEFFICIENT;
dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
table[i].frequency);
} else {
prev_cost = table[i].cost;
}
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
pd->table = table;
pd->nr_perf_states = nr_states;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
return 0;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
free_ps_table:
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
kfree(table);
return -EINVAL;
}
static int em_create_pd(struct device *dev, int nr_states,
struct em_data_callback *cb, cpumask_t *cpus,
unsigned long flags)
{
struct em_perf_domain *pd;
struct device *cpu_dev;
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
int cpu, ret, num_cpus;
if (_is_cpu_device(dev)) {
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
num_cpus = cpumask_weight(cpus);
/* Prevent max possible energy calculation to not overflow */
if (num_cpus > EM_MAX_NUM_CPUS) {
dev_err(dev, "EM: too many CPUs, overflow possible\n");
return -EINVAL;
}
pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
if (!pd)
return -ENOMEM;
cpumask_copy(em_span_cpus(pd), cpus);
} else {
pd = kzalloc(sizeof(*pd), GFP_KERNEL);
if (!pd)
return -ENOMEM;
}
ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
if (ret) {
kfree(pd);
return ret;
}
if (_is_cpu_device(dev))
for_each_cpu(cpu, cpus) {
cpu_dev = get_cpu_device(cpu);
cpu_dev->em_pd = pd;
}
dev->em_pd = pd;
return 0;
}
static void em_cpufreq_update_efficiencies(struct device *dev)
{
struct em_perf_domain *pd = dev->em_pd;
struct em_perf_state *table;
struct cpufreq_policy *policy;
int found = 0;
int i;
if (!_is_cpu_device(dev) || !pd)
return;
policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
if (!policy) {
dev_warn(dev, "EM: Access to CPUFreq policy failed");
return;
}
table = pd->table;
for (i = 0; i < pd->nr_perf_states; i++) {
if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
continue;
if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
found++;
}
cpufreq_cpu_put(policy);
if (!found)
return;
/*
* Efficiencies have been installed in CPUFreq, inefficient frequencies
* will be skipped. The EM can do the same.
*/
pd->flags |= EM_PERF_DOMAIN_SKIP_INEFFICIENCIES;
}
/**
* em_pd_get() - Return the performance domain for a device
* @dev : Device to find the performance domain for
*
* Returns the performance domain to which @dev belongs, or NULL if it doesn't
* exist.
*/
struct em_perf_domain *em_pd_get(struct device *dev)
{
if (IS_ERR_OR_NULL(dev))
return NULL;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
return dev->em_pd;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
EXPORT_SYMBOL_GPL(em_pd_get);
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
/**
* em_cpu_get() - Return the performance domain for a CPU
* @cpu : CPU to find the performance domain for
*
* Returns the performance domain to which @cpu belongs, or NULL if it doesn't
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
* exist.
*/
struct em_perf_domain *em_cpu_get(int cpu)
{
struct device *cpu_dev;
cpu_dev = get_cpu_device(cpu);
if (!cpu_dev)
return NULL;
return em_pd_get(cpu_dev);
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
EXPORT_SYMBOL_GPL(em_cpu_get);
/**
* em_dev_register_perf_domain() - Register the Energy Model (EM) for a device
* @dev : Device for which the EM is to register
* @nr_states : Number of performance states to register
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
* @cb : Callback functions providing the data of the Energy Model
* @cpus : Pointer to cpumask_t, which in case of a CPU device is
* obligatory. It can be taken from i.e. 'policy->cpus'. For other
* type of devices this should be set to NULL.
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
* @microwatts : Flag indicating that the power values are in micro-Watts or
* in some other scale. It must be set properly.
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
*
* Create Energy Model tables for a performance domain using the callbacks
* defined in cb.
*
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
* The @microwatts is important to set with correct value. Some kernel
* sub-systems might rely on this flag and check if all devices in the EM are
* using the same scale.
*
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
* If multiple clients register the same performance domain, all but the first
* registration will be ignored.
*
* Return 0 on success
*/
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
struct em_data_callback *cb, cpumask_t *cpus,
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
bool microwatts)
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
{
unsigned long cap, prev_cap = 0;
unsigned long flags = 0;
int cpu, ret;
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
if (!dev || !nr_states || !cb)
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
return -EINVAL;
/*
* Use a mutex to serialize the registration of performance domains and
* let the driver-defined callback functions sleep.
*/
mutex_lock(&em_pd_mutex);
if (dev->em_pd) {
ret = -EEXIST;
goto unlock;
}
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
if (_is_cpu_device(dev)) {
if (!cpus) {
dev_err(dev, "EM: invalid CPU mask\n");
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
ret = -EINVAL;
goto unlock;
}
for_each_cpu(cpu, cpus) {
if (em_cpu_get(cpu)) {
dev_err(dev, "EM: exists for CPU%d\n", cpu);
ret = -EEXIST;
goto unlock;
}
/*
* All CPUs of a domain must have the same
* micro-architecture since they all share the same
* table.
*/
cap = arch_scale_cpu_capacity(cpu);
if (prev_cap && prev_cap != cap) {
dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
cpumask_pr_args(cpus));
ret = -EINVAL;
goto unlock;
}
prev_cap = cap;
}
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
}
PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-07 15:15:52 +08:00
if (microwatts)
flags |= EM_PERF_DOMAIN_MICROWATTS;
else if (cb->get_cost)
flags |= EM_PERF_DOMAIN_ARTIFICIAL;
ret = em_create_pd(dev, nr_states, cb, cpus, flags);
if (ret)
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
goto unlock;
dev->em_pd->flags |= flags;
em_cpufreq_update_efficiencies(dev);
em_debug_create_pd(dev);
dev_info(dev, "EM: created perf domain\n");
PM: Introduce an Energy Model management framework Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 17:56:16 +08:00
unlock:
mutex_unlock(&em_pd_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
/**
* em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
* @dev : Device for which the EM is registered
*
* Unregister the EM for the specified @dev (but not a CPU device).
*/
void em_dev_unregister_perf_domain(struct device *dev)
{
if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
return;
if (_is_cpu_device(dev))
return;
/*
* The mutex separates all register/unregister requests and protects
* from potential clean-up/setup issues in the debugfs directories.
* The debugfs directory name is the same as device's name.
*/
mutex_lock(&em_pd_mutex);
em_debug_remove_pd(dev);
kfree(dev->em_pd->table);
kfree(dev->em_pd);
dev->em_pd = NULL;
mutex_unlock(&em_pd_mutex);
}
EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);