From 104dc5e20ff52748a16f756ae946391bdc6a4d0a Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 20 Sep 2017 02:26:00 +0200 Subject: [PATCH 01/88] PM: Document rules on using pm_runtime_resume() in system suspend callbacks It quite often is necessary to resume devices from runtime suspend during system suspend for various reasons (for example, if their wakeup settings need to be changed), but that requires middle-layer or subsystem code to follow additional rules which currently are not clearly documented. Namely, if a driver calls pm_runtime_resume() for the device from its ->suspend (or equivalent) system sleep callback, that may not work if the middle layer above it has updated the state of the device from its ->prepare or ->suspend callbacks already in an incompatible way. For this reason, all middle layers must follow the rule that, until the ->suspend callback provided by the device's driver is invoked, the only way in which the device's state can be updated is by calling pm_runtime_resume() for it, if necessary. Fortunately enough, all middle layers in the code base today follow this rule, but it is not explicitly stated anywhere, so do that. Note that calling pm_runtime_resume() from the ->suspend callback of a driver will cause the ->runtime_resume callback provided by the middle layer to be invoked, but the rule above guarantees that this callback will nest properly with the middle layer's ->suspend callback and it will play well with the ->prepare one invoked before. Signed-off-by: Rafael J. 
Wysocki Reviewed-by: Ulf Hansson --- Documentation/driver-api/pm/devices.rst | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index bedd32388dac..a8b07ec732bd 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -328,7 +328,10 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``. After the ``->prepare`` callback method returns, no new children may be registered below the device. The method may also prepare the device or driver in some way for the upcoming system power transition, but it - should not put the device into a low-power state. + should not put the device into a low-power state. Moreover, if the + device supports runtime power management, the ``->prepare`` callback + method must not update its state in case it is necessary to resume it + from runtime suspend later on. For devices supporting runtime power management, the return value of the prepare callback can be used to indicate to the PM core that it may @@ -356,6 +359,16 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``. the appropriate low-power state, depending on the bus type the device is on, and they may enable wakeup events. + However, for devices supporting runtime power management, the + ``->suspend`` methods provided by subsystems (bus types and PM domains + in particular) must follow an additional rule regarding what can be done + to the devices before their drivers' ``->suspend`` methods are called. + Namely, they can only resume the devices from runtime suspend by + calling :c:func:`pm_runtime_resume` for them, if that is necessary, and + they must not update the state of the devices in any other way at that + time (in case the drivers need to resume the devices from runtime + suspend in their ``->suspend`` methods). + 3. 
For a number of devices it is convenient to split suspend into the "quiesce device" and "save device state" phases, in which cases ``suspend_late`` is meant to do the latter. It is always executed after @@ -729,6 +742,16 @@ state temporarily, for example so that its system wakeup capability can be disabled. This all depends on the hardware and the design of the subsystem and device driver in question. +If it is necessary to resume a device from runtime suspend during a system-wide +transition into a sleep state, that can be done by calling +:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its +couterpart for transitions related to hibernation) of either the device's driver +or a subsystem responsible for it (for example, a bus type or a PM domain). +That is guaranteed to work by the requirement that subsystems must not change +the state of devices (possibly except for resuming them from runtime suspend) +from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before* +invoking device drivers' ``->suspend`` callbacks (or equivalent). + During system-wide resume from a sleep state it's easiest to put devices into the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. Refer to that document for more information regarding this particular issue as From a380f2edef65b2447a043251bb3c00a9d2153a8b Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 25 Sep 2017 14:56:44 +0200 Subject: [PATCH 02/88] PM / core: Drop legacy class suspend/resume operations There are no classes using the legacy suspend/resume operations in the tree any more, so drop these operations and update the code referring to them accordingly. Signed-off-by: Rafael J. 
Wysocki Acked-by: Greg Kroah-Hartman --- drivers/base/power/main.c | 32 +++++++++----------------------- include/linux/device.h | 5 ----- 2 files changed, 9 insertions(+), 28 deletions(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 770b1539a083..12abcf6084a5 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -848,16 +848,10 @@ static int device_resume(struct device *dev, pm_message_t state, bool async) goto Driver; } - if (dev->class) { - if (dev->class->pm) { - info = "class "; - callback = pm_op(dev->class->pm, state); - goto Driver; - } else if (dev->class->resume) { - info = "legacy class "; - callback = dev->class->resume; - goto End; - } + if (dev->class && dev->class->pm) { + info = "class "; + callback = pm_op(dev->class->pm, state); + goto Driver; } if (dev->bus) { @@ -1508,17 +1502,10 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) goto Run; } - if (dev->class) { - if (dev->class->pm) { - info = "class "; - callback = pm_op(dev->class->pm, state); - goto Run; - } else if (dev->class->suspend) { - pm_dev_dbg(dev, state, "legacy class "); - error = legacy_suspend(dev, state, dev->class->suspend, - "legacy class "); - goto End; - } + if (dev->class && dev->class->pm) { + info = "class "; + callback = pm_op(dev->class->pm, state); + goto Run; } if (dev->bus) { @@ -1862,8 +1849,7 @@ void device_pm_check_callbacks(struct device *dev) dev->power.no_pm_callbacks = (!dev->bus || (pm_ops_is_empty(dev->bus->pm) && !dev->bus->suspend && !dev->bus->resume)) && - (!dev->class || (pm_ops_is_empty(dev->class->pm) && - !dev->class->suspend && !dev->class->resume)) && + (!dev->class || pm_ops_is_empty(dev->class->pm)) && (!dev->type || pm_ops_is_empty(dev->type->pm)) && (!dev->pm_domain || pm_ops_is_empty(&dev->pm_domain->ops)) && (!dev->driver || (pm_ops_is_empty(dev->driver->pm) && diff --git a/include/linux/device.h b/include/linux/device.h index 1d2607923a24..c1527f887050 100644 --- 
a/include/linux/device.h +++ b/include/linux/device.h @@ -372,9 +372,6 @@ int subsys_virtual_register(struct bus_type *subsys, * @devnode: Callback to provide the devtmpfs. * @class_release: Called to release this class. * @dev_release: Called to release the device. - * @suspend: Used to put the device to sleep mode, usually to a low power - * state. - * @resume: Used to bring the device from the sleep mode. * @shutdown_pre: Called at shut-down time before driver shutdown. * @ns_type: Callbacks so sysfs can detemine namespaces. * @namespace: Namespace of the device belongs to this class. @@ -402,8 +399,6 @@ struct class { void (*class_release)(struct class *class); void (*dev_release)(struct device *dev); - int (*suspend)(struct device *dev, pm_message_t state); - int (*resume)(struct device *dev); int (*shutdown_pre)(struct device *dev); const struct kobj_ns_type_operations *ns_type; From f187851b9b4a76952b1158b86434563dd2031103 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 1 Sep 2017 14:29:56 +1000 Subject: [PATCH 03/88] cpuidle: fix broadcast control when broadcast can not be entered When failing to enter broadcast timer mode for an idle state that requires it, a new state is selected that does not require broadcast, but the broadcast variable remains set. This causes tick_broadcast_exit to be called despite not having entered broadcast mode. This causes the WARN_ON_ONCE(!irqs_disabled()) to trigger in some cases. It does not appear to cause problems for code today, but seems to violate the interface so should be fixed. Signed-off-by: Nicholas Piggin Reviewed-by: Thomas Gleixner Signed-off-by: Rafael J. 
Wysocki --- drivers/cpuidle/cpuidle.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 484cc8909d5c..ed4df58a855e 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -208,6 +208,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, return -EBUSY; } target_state = &drv->states[index]; + broadcast = false; } /* Take note of the planned idle state. */ From 1cb31d3fd4d96b19624328da0a0496adf76f98a6 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 25 Sep 2017 01:33:13 +0200 Subject: [PATCH 04/88] PCI / PM: Do not resume any devices in pci_pm_prepare() It should not be necessary to resume devices with ignore_children set in pci_pm_prepare(), because they should be resumed explicitly by their children drivers during suspend if need be and they will be resumed by pci_pm_suspend() after that anyway, so avoid doing that. Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Helgaas Signed-off-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 11bd267fc137..3d04b59ffdb2 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -680,13 +680,6 @@ static int pci_pm_prepare(struct device *dev) { struct device_driver *drv = dev->driver; - /* - * Devices having power.ignore_children set may still be necessary for - * suspending their children in the next phase of device suspend. 
- */ - if (dev->power.ignore_children) - pm_runtime_resume(dev); - if (drv && drv->pm && drv->pm->prepare) { int error = drv->pm->prepare(dev); if (error) From 5408211a8f290ace85147858f4e05e18b942f489 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:06 +0100 Subject: [PATCH 05/88] drivers base/arch_topology: free cpumask cpus_to_visit Free cpumask cpus_to_visit in case registering init_cpu_capacity_notifier has failed or the parsing of the cpu capacity-dmips-mhz property is done. The cpumask cpus_to_visit is only used inside the notifier call init_cpu_capacity_callback. Reported-by: Vincent Guittot Signed-off-by: Dietmar Eggemann Acked-by: Vincent Guittot Acked-by: Viresh Kumar Tested-by: Juri Lelli Reviewed-by: Juri Lelli Signed-off-by: Rafael J. Wysocki --- drivers/base/arch_topology.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 41be9ff7d70a..e9bb368f32b3 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -212,6 +212,8 @@ static struct notifier_block init_cpu_capacity_notifier = { static int __init register_cpufreq_notifier(void) { + int ret; + /* * on ACPI-based systems we need to use the default cpu capacity * until we have the necessary code to parse the cpu capacity, so @@ -227,8 +229,13 @@ static int __init register_cpufreq_notifier(void) cpumask_copy(cpus_to_visit, cpu_possible_mask); - return cpufreq_register_notifier(&init_cpu_capacity_notifier, - CPUFREQ_POLICY_NOTIFIER); + ret = cpufreq_register_notifier(&init_cpu_capacity_notifier, + CPUFREQ_POLICY_NOTIFIER); + + if (ret) + free_cpumask_var(cpus_to_visit); + + return ret; } core_initcall(register_cpufreq_notifier); @@ -236,6 +243,7 @@ static void parsing_done_workfn(struct work_struct *work) { cpufreq_unregister_notifier(&init_cpu_capacity_notifier, CPUFREQ_POLICY_NOTIFIER); + free_cpumask_var(cpus_to_visit); } #else From 
e7d5459dfaf613799915e901189d296bdc7534f9 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:07 +0100 Subject: [PATCH 06/88] cpufreq: provide default frequency-invariance setter function Frequency-invariant accounting support based on the ratio of current frequency and maximum supported frequency is an optional feature an arch can implement. Since there are cpufreq drivers (e.g. cpufreq-dt) which can be built for different architectures, a default implementation of the frequency-invariance setter function arch_set_freq_scale() is needed. This default implementation is an empty weak function which will be overridden by a strong function in case the arch provides one. The setter function passes the cpumask of related (to the frequency change) cpus (online and offline cpus), the (new) current frequency and the maximum supported frequency. Signed-off-by: Dietmar Eggemann Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 6 ++++++ include/linux/cpufreq.h | 3 +++ 2 files changed, 9 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index ea43b147a7fe..41d148af7748 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -161,6 +161,12 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy) } EXPORT_SYMBOL_GPL(get_cpu_idle_time); +__weak void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, + unsigned long max_freq) +{ +} +EXPORT_SYMBOL_GPL(arch_set_freq_scale); + /* * This is a generic cpufreq init() routine which can be used by cpufreq * drivers of SMP systems. 
It will do following: diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 537ff842ff73..28734ee185a7 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -919,6 +919,9 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy) extern unsigned int arch_freq_get_on_cpu(int cpu); +extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, + unsigned long max_freq); + /* the following are really really optional */ extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs; extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs; From 518accf2062971344310ee85d7fdab55e8c11337 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:08 +0100 Subject: [PATCH 07/88] cpufreq: arm_big_little: invoke frequency-invariance setter function Call the frequency-invariance setter function arch_set_freq_scale() if the new frequency has been successfully set which is indicated by bL_cpufreq_set_rate() returning 0. Signed-off-by: Dietmar Eggemann Acked-by: Viresh Kumar Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/arm_big_little.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c index 17504129fd77..0c41ab3b16eb 100644 --- a/drivers/cpufreq/arm_big_little.c +++ b/drivers/cpufreq/arm_big_little.c @@ -213,6 +213,7 @@ static int bL_cpufreq_set_target(struct cpufreq_policy *policy, { u32 cpu = policy->cpu, cur_cluster, new_cluster, actual_cluster; unsigned int freqs_new; + int ret; cur_cluster = cpu_to_cluster(cpu); new_cluster = actual_cluster = per_cpu(physical_cluster, cpu); @@ -229,7 +230,14 @@ static int bL_cpufreq_set_target(struct cpufreq_policy *policy, } } - return bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new); + ret = bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new); + + if (!ret) { + arch_set_freq_scale(policy->related_cpus, freqs_new, + policy->cpuinfo.max_freq); + } + + return ret; } static inline u32 get_table_count(struct cpufreq_frequency_table *table) From 400ec74d3b3784a48b09db9518aa2c4b6d4e497f Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:09 +0100 Subject: [PATCH 08/88] cpufreq: dt: invoke frequency-invariance setter function Call the frequency-invariance setter function arch_set_freq_scale() if the new frequency has been successfully set which is indicated by dev_pm_opp_set_rate() returning 0. Signed-off-by: Dietmar Eggemann Acked-by: Viresh Kumar Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/cpufreq-dt.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c index d83ab94d041a..545946ad0752 100644 --- a/drivers/cpufreq/cpufreq-dt.c +++ b/drivers/cpufreq/cpufreq-dt.c @@ -43,9 +43,17 @@ static struct freq_attr *cpufreq_dt_attr[] = { static int set_target(struct cpufreq_policy *policy, unsigned int index) { struct private_data *priv = policy->driver_data; + unsigned long freq = policy->freq_table[index].frequency; + int ret; - return dev_pm_opp_set_rate(priv->cpu_dev, - policy->freq_table[index].frequency * 1000); + ret = dev_pm_opp_set_rate(priv->cpu_dev, freq * 1000); + + if (!ret) { + arch_set_freq_scale(policy->related_cpus, freq, + policy->cpuinfo.max_freq); + } + + return ret; } /* From 0e27c567d1673137b06aa96bb7aef635fb657dee Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:10 +0100 Subject: [PATCH 09/88] drivers base/arch_topology: provide frequency-invariant accounting support Implement the arch-specific (arm and arm64) frequency-invariance setter function arch_set_freq_scale(), which provides the following frequency scaling factor: (current_freq(cpu) << SCHED_CAPACITY_SHIFT) / max_supported_freq(cpu) One possible consumer of the frequency-invariance getter function topology_get_freq_scale() is the Per-Entity Load Tracking (PELT) mechanism of the task scheduler. Allow inlining of topology_get_freq_scale() into the task scheduler fast path (e.g. __update_load_avg_se()) by coding it as a static inline function in the arch topology header file. Signed-off-by: Dietmar Eggemann Acked-by: Viresh Kumar Signed-off-by: Rafael J. 
Wysocki --- drivers/base/arch_topology.c | 14 ++++++++++++++ include/linux/arch_topology.h | 9 +++++++++ 2 files changed, 23 insertions(+) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index e9bb368f32b3..416ec2f5211d 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -22,6 +22,20 @@ #include #include +DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE; + +void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, + unsigned long max_freq) +{ + unsigned long scale; + int i; + + scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq; + + for_each_cpu(i, cpus) + per_cpu(freq_scale, i) = scale; +} + static DEFINE_MUTEX(cpu_scale_mutex); static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index 716ce587247e..f6e490312e4d 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -5,6 +5,7 @@ #define _LINUX_ARCH_TOPOLOGY_H_ #include +#include void topology_normalize_cpu_scale(void); @@ -16,4 +17,12 @@ unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu); void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity); +DECLARE_PER_CPU(unsigned long, freq_scale); + +static inline +unsigned long topology_get_freq_scale(struct sched_domain *sd, int cpu) +{ + return per_cpu(freq_scale, cpu); +} + #endif /* _LINUX_ARCH_TOPOLOGY_H_ */ From 8216f588b52b61ce36fc0080218e4730435e58b7 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:11 +0100 Subject: [PATCH 10/88] drivers base/arch_topology: allow inlining cpu-invariant accounting support Allow inlining of topology_get_cpu_scale() into the task scheduler fast path (e.g. __update_load_avg_se()) by coding it as a static inline function in the arch topology header file. Signed-off-by: Dietmar Eggemann Acked-by: Viresh Kumar Signed-off-by: Rafael J. 
Wysocki --- drivers/base/arch_topology.c | 7 +------ include/linux/arch_topology.h | 8 +++++++- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 416ec2f5211d..aea0b9d521f6 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -37,12 +37,7 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, } static DEFINE_MUTEX(cpu_scale_mutex); -static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; - -unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu) -{ - return per_cpu(cpu_scale, cpu); -} +DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity) { diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index f6e490312e4d..c189de3ef5df 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -12,8 +12,14 @@ void topology_normalize_cpu_scale(void); struct device_node; bool topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu); +DECLARE_PER_CPU(unsigned long, cpu_scale); + struct sched_domain; -unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu); +static inline +unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu) +{ + return per_cpu(cpu_scale, cpu); +} void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity); From 3a1ed9cfaf2849195886b80274008526ff2b5f02 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:12 +0100 Subject: [PATCH 11/88] arm: wire frequency-invariant accounting support up to the task scheduler Commit dfbca41f3479 ("sched: Optimize freq invariant accounting") changed the wiring which now has to be done by associating arch_scale_freq_capacity with the actual implementation provided by the architecture. 
Define arch_scale_freq_capacity to use the arch_topology "driver" function topology_get_freq_scale() for the task scheduler's frequency-invariant accounting instead of the default arch_scale_freq_capacity() in kernel/sched/sched.h. Signed-off-by: Dietmar Eggemann Acked-by: Vincent Guittot Acked-by: Russell King Tested-by: Juri Lelli Reviewed-by: Juri Lelli Signed-off-by: Rafael J. Wysocki --- arch/arm/include/asm/topology.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h index 370f7a732900..a56a9e24f4c0 100644 --- a/arch/arm/include/asm/topology.h +++ b/arch/arm/include/asm/topology.h @@ -24,6 +24,11 @@ void init_cpu_topology(void); void store_cpu_topology(unsigned int cpuid); const struct cpumask *cpu_coregroup_mask(int cpu); +#include + +/* Replace task scheduler's default frequency-invariant accounting */ +#define arch_scale_freq_capacity topology_get_freq_scale + #else static inline void init_cpu_topology(void) { } From 552c4653bf89147c945bd676d28d4746c9500002 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:13 +0100 Subject: [PATCH 12/88] arm: wire cpu-invariant accounting support up to the task scheduler Commit 8cd5601c5060 ("sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define") changed the wiring which now has to be done by associating arch_scale_cpu_capacity with the actual implementation provided by the architecture. Define arch_scale_cpu_capacity to use the arch_topology "driver" function topology_get_cpu_scale() for the task scheduler's cpu-invariant accounting instead of the default arch_scale_cpu_capacity() in kernel/sched/sched.h. Signed-off-by: Dietmar Eggemann Acked-by: Vincent Guittot Acked-by: Russell King Tested-by: Juri Lelli Reviewed-by: Juri Lelli Signed-off-by: Rafael J. 
Wysocki --- arch/arm/include/asm/topology.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h index a56a9e24f4c0..b713e7223bc4 100644 --- a/arch/arm/include/asm/topology.h +++ b/arch/arm/include/asm/topology.h @@ -29,6 +29,9 @@ const struct cpumask *cpu_coregroup_mask(int cpu); /* Replace task scheduler's default frequency-invariant accounting */ #define arch_scale_freq_capacity topology_get_freq_scale +/* Replace task scheduler's default cpu-invariant accounting */ +#define arch_scale_cpu_capacity topology_get_cpu_scale + #else static inline void init_cpu_topology(void) { } From 4e63ebe50d456d7284600822d414d69da35f6977 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:14 +0100 Subject: [PATCH 13/88] arm64: wire frequency-invariant accounting support up to the task scheduler Commit dfbca41f3479 ("sched: Optimize freq invariant accounting") changed the wiring which now has to be done by associating arch_scale_freq_capacity with the actual implementation provided by the architecture. Define arch_scale_freq_capacity to use the arch_topology "driver" function topology_get_freq_scale() for the task scheduler's frequency-invariant accounting instead of the default arch_scale_freq_capacity() in kernel/sched/sched.h. Signed-off-by: Dietmar Eggemann Acked-by: Catalin Marinas Acked-by: Vincent Guittot Tested-by: Juri Lelli Reviewed-by: Juri Lelli Signed-off-by: Rafael J. 
Wysocki --- arch/arm64/include/asm/topology.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h index 8b57339823e9..44598a86ec4a 100644 --- a/arch/arm64/include/asm/topology.h +++ b/arch/arm64/include/asm/topology.h @@ -32,6 +32,11 @@ int pcibus_to_node(struct pci_bus *bus); #endif /* CONFIG_NUMA */ +#include + +/* Replace task scheduler's default frequency-invariant accounting */ +#define arch_scale_freq_capacity topology_get_freq_scale + #include #endif /* _ASM_ARM_TOPOLOGY_H */ From 431ead0ff19b440b1ba25c72d732190b44615cdb Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 26 Sep 2017 17:41:15 +0100 Subject: [PATCH 14/88] arm64: wire cpu-invariant accounting support up to the task scheduler Commit 8cd5601c5060 ("sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define") changed the wiring which now has to be done by associating arch_scale_cpu_capacity with the actual implementation provided by the architecture. Define arch_scale_cpu_capacity to use the arch_topology "driver" function topology_get_cpu_scale() for the task scheduler's cpu-invariant accounting instead of the default arch_scale_cpu_capacity() in kernel/sched/sched.h. Signed-off-by: Dietmar Eggemann Acked-by: Catalin Marinas Acked-by: Vincent Guittot Tested-by: Juri Lelli Reviewed-by: Juri Lelli Signed-off-by: Rafael J. 
Wysocki --- arch/arm64/include/asm/topology.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h index 44598a86ec4a..e313eeb10756 100644 --- a/arch/arm64/include/asm/topology.h +++ b/arch/arm64/include/asm/topology.h @@ -37,6 +37,9 @@ int pcibus_to_node(struct pci_bus *bus); /* Replace task scheduler's default frequency-invariant accounting */ #define arch_scale_freq_capacity topology_get_freq_scale +/* Replace task scheduler's default cpu-invariant accounting */ +#define arch_scale_cpu_capacity topology_get_cpu_scale + #include #endif /* _ASM_ARM_TOPOLOGY_H */ From ca67ab5c5afbcec9df199e01838270eb5668af68 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 30 Sep 2017 01:31:15 +0200 Subject: [PATCH 15/88] PCI / PM: Add dev_dbg() to print device suspend power states It sometimes is useful to know what power states the kernel thinks it puts PCI devices into during system suspend, so add a dev_dbg() statement for that. Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Helgaas --- drivers/pci/pci-driver.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 3d04b59ffdb2..9be563067c0c 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -798,6 +798,9 @@ static int pci_pm_suspend_noirq(struct device *dev) pci_prepare_to_sleep(pci_dev); } + dev_dbg(dev, "PCI PM: Suspend power state: %s\n", + pci_power_name(pci_dev->current_state)); + pci_pm_set_unknown_state(pci_dev); /* From 7813dd6fc75fb375d4caf002e7f80a826fc3153a Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 26 Sep 2017 15:12:40 -0700 Subject: [PATCH 16/88] PM / OPP: Move the OPP directory out of power/ The drivers/base/power/ directory is special and contains code related to power management core like system suspend/resume, hibernation, etc. 
It was fine to keep the OPP code inside it when we had just one file for it, but it is growing now and already has a directory for itself. Let's move it directly under the drivers/ directory, just like cpufreq and cpuidle. Signed-off-by: Viresh Kumar Acked-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- MAINTAINERS | 2 +- drivers/Kconfig | 2 ++ drivers/Makefile | 1 + drivers/base/power/Makefile | 1 - drivers/opp/Kconfig | 13 +++++++++++++ drivers/{base/power => }/opp/Makefile | 0 drivers/{base/power => }/opp/core.c | 0 drivers/{base/power => }/opp/cpu.c | 0 drivers/{base/power => }/opp/debugfs.c | 0 drivers/{base/power => }/opp/of.c | 0 drivers/{base/power => }/opp/opp.h | 0 kernel/power/Kconfig | 14 -------------- 12 files changed, 17 insertions(+), 16 deletions(-) create mode 100644 drivers/opp/Kconfig rename drivers/{base/power => }/opp/Makefile (100%) rename drivers/{base/power => }/opp/core.c (100%) rename drivers/{base/power => }/opp/cpu.c (100%) rename drivers/{base/power => }/opp/debugfs.c (100%) rename drivers/{base/power => }/opp/of.c (100%) rename drivers/{base/power => }/opp/opp.h (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 65b0c88d5ee0..7c8c649fc68b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10043,7 +10043,7 @@ M: Stephen Boyd L: linux-pm@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git -F: drivers/base/power/opp/ +F: drivers/opp/ F: include/linux/pm_opp.h F: Documentation/power/opp.txt F: Documentation/devicetree/bindings/opp/ diff --git a/drivers/Kconfig b/drivers/Kconfig index 505c676fa9c7..9e264d410c23 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -208,4 +208,6 @@ source "drivers/tee/Kconfig" source "drivers/mux/Kconfig" +source "drivers/opp/Kconfig" + endmenu diff --git a/drivers/Makefile b/drivers/Makefile index d90fdc413648..dd718a3007e9 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -125,6 +125,7 @@ obj-$(CONFIG_ACCESSIBILITY) += accessibility/ obj-$(CONFIG_ISDN) += 
isdn/ obj-$(CONFIG_EDAC) += edac/ obj-$(CONFIG_EISA) += eisa/ +obj-$(CONFIG_PM_OPP) += opp/ obj-$(CONFIG_CPU_FREQ) += cpufreq/ obj-$(CONFIG_CPU_IDLE) += cpuidle/ obj-y += mmc/ diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile index 5998c53280f5..73a1cffc0a5f 100644 --- a/drivers/base/power/Makefile +++ b/drivers/base/power/Makefile @@ -1,7 +1,6 @@ obj-$(CONFIG_PM) += sysfs.o generic_ops.o common.o qos.o runtime.o wakeirq.o obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o obj-$(CONFIG_PM_TRACE_RTC) += trace.o -obj-$(CONFIG_PM_OPP) += opp/ obj-$(CONFIG_PM_GENERIC_DOMAINS) += domain.o domain_governor.o obj-$(CONFIG_HAVE_CLK) += clock_ops.o diff --git a/drivers/opp/Kconfig b/drivers/opp/Kconfig new file mode 100644 index 000000000000..a7fbb93f302c --- /dev/null +++ b/drivers/opp/Kconfig @@ -0,0 +1,13 @@ +config PM_OPP + bool + select SRCU + ---help--- + SOCs have a standard set of tuples consisting of frequency and + voltage pairs that the device will support per voltage domain. This + is called Operating Performance Point or OPP. The actual definitions + of OPP varies over silicon within the same family of devices. + + OPP layer organizes the data internally using device pointers + representing individual voltage domains and provides SOC + implementations a ready to use framework to manage OPPs. 
+ For more information, read diff --git a/drivers/base/power/opp/Makefile b/drivers/opp/Makefile similarity index 100% rename from drivers/base/power/opp/Makefile rename to drivers/opp/Makefile diff --git a/drivers/base/power/opp/core.c b/drivers/opp/core.c similarity index 100% rename from drivers/base/power/opp/core.c rename to drivers/opp/core.c diff --git a/drivers/base/power/opp/cpu.c b/drivers/opp/cpu.c similarity index 100% rename from drivers/base/power/opp/cpu.c rename to drivers/opp/cpu.c diff --git a/drivers/base/power/opp/debugfs.c b/drivers/opp/debugfs.c similarity index 100% rename from drivers/base/power/opp/debugfs.c rename to drivers/opp/debugfs.c diff --git a/drivers/base/power/opp/of.c b/drivers/opp/of.c similarity index 100% rename from drivers/base/power/opp/of.c rename to drivers/opp/of.c diff --git a/drivers/base/power/opp/opp.h b/drivers/opp/opp.h similarity index 100% rename from drivers/base/power/opp/opp.h rename to drivers/opp/opp.h diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig index e8517b63eb37..e880ca22c5a5 100644 --- a/kernel/power/Kconfig +++ b/kernel/power/Kconfig @@ -259,20 +259,6 @@ config APM_EMULATION anything, try disabling/enabling this option (or disabling/enabling APM in your BIOS). -config PM_OPP - bool - select SRCU - ---help--- - SOCs have a standard set of tuples consisting of frequency and - voltage pairs that the device will support per voltage domain. This - is called Operating Performance Point or OPP. The actual definitions - of OPP varies over silicon within the same family of devices. - - OPP layer organizes the data internally using device pointers - representing individual voltage domains and provides SOC - implementations a ready to use framework to manage OPPs. 
- For more information, read - config PM_CLK def_bool y depends on PM && HAVE_CLK From 3eba6e121155d56e2b50305a48fa11749c503a2e Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 30 Aug 2017 00:37:03 +0900 Subject: [PATCH 17/88] cpufreq: dt-platdev: drop socionext,uniphier-ld6b from whitelist As you can see in arch/arm/boot/dts/uniphier-ld6b.dtsi, it includes uniphier-pxs2.dtsi, which uses the "operating-points-v2" property and whose cpufreq device is automatically created. Signed-off-by: Masahiro Yamada Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq-dt-platdev.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index a753c50e9e41..de659fc564f4 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -83,8 +83,6 @@ static const struct of_device_id whitelist[] __initconst = { { .compatible = "rockchip,rk3368", }, { .compatible = "rockchip,rk3399", }, - { .compatible = "socionext,uniphier-ld6b", }, - { .compatible = "st-ericsson,u8500", }, { .compatible = "st-ericsson,u8540", }, { .compatible = "st-ericsson,u9500", }, From 86d806b55fb9a8b99c8a4802d27c771dabecc206 Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Mon, 25 Sep 2017 15:10:11 +0530 Subject: [PATCH 18/88] cpufreq: powernow-k8: pr_err() strings should end with newlines pr_err() messages should be terminated with a new-line to avoid other messages being concatenated onto the end. Signed-off-by: Arvind Yadav Acked-by: Viresh Kumar Signed-off-by: Rafael J.
Wysocki --- drivers/cpufreq/powernow-k8.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c index 062d71434e47..b01e31db5f83 100644 --- a/drivers/cpufreq/powernow-k8.c +++ b/drivers/cpufreq/powernow-k8.c @@ -1043,7 +1043,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol) data = kzalloc(sizeof(*data), GFP_KERNEL); if (!data) { - pr_err("unable to alloc powernow_k8_data"); + pr_err("unable to alloc powernow_k8_data\n"); return -ENOMEM; } From 699b52528eff8863e2c104d22d444d229236ce62 Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Mon, 25 Sep 2017 15:43:49 +0530 Subject: [PATCH 19/88] cpufreq: SPEAr: pr_err() strings should end with newlines pr_err() messages should be terminated with a new-line to avoid other messages being concatenated onto the end. Signed-off-by: Arvind Yadav Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/spear-cpufreq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/spear-cpufreq.c b/drivers/cpufreq/spear-cpufreq.c index 4894924a3ca2..195f27f9c1cb 100644 --- a/drivers/cpufreq/spear-cpufreq.c +++ b/drivers/cpufreq/spear-cpufreq.c @@ -177,7 +177,7 @@ static int spear_cpufreq_probe(struct platform_device *pdev) np = of_cpu_device_node_get(0); if (!np) { - pr_err("No cpu node found"); + pr_err("No cpu node found\n"); return -ENODEV; } @@ -187,7 +187,7 @@ static int spear_cpufreq_probe(struct platform_device *pdev) prop = of_find_property(np, "cpufreq_tbl", NULL); if (!prop || !prop->value) { - pr_err("Invalid cpufreq_tbl"); + pr_err("Invalid cpufreq_tbl\n"); ret = -ENODEV; goto out_put_node; } From 05829d9431df1bf6de98679fbcfbad282c1c55a4 Mon Sep 17 00:00:00 2001 From: Zumeng Chen Date: Wed, 27 Sep 2017 15:08:17 +0800 Subject: [PATCH 20/88] cpufreq: ti-cpufreq: kfree opp_data when failure A memory leak was found by kmemleak. opp_data needs to be freed on failure, including on the fail_put_node path.
unreferenced object 0xccdd4c40 (size 64): comm "swapper", pid 1, jiffies 4294938465 (age 888.520s) hex dump (first 32 bytes): 00 7c 00 c1 98 69 d8 ce 00 24 03 ce 00 24 03 ce .|...i...$...$.. 20 35 23 c1 00 00 00 00 00 00 00 00 00 00 00 00 5#............. backtrace: [] kmem_cache_alloc_trace+0x2c4/0x3cc [] ti_cpufreq_probe+0x6c/0x334 [] platform_drv_probe+0x60/0xc0 [] driver_probe_device+0x218/0x2c4 [] __device_attach_driver+0xa8/0xdc [] bus_for_each_drv+0x70/0xa4 [] __device_attach+0xc0/0x124 [] device_initial_probe+0x1c/0x20 [] bus_probe_device+0x94/0x9c [] device_add+0x404/0x590 [] platform_device_add+0x11c/0x230 [] platform_device_register_full+0x10c/0x128 [] ti_cpufreq_init+0x44/0x50 [] do_one_initcall+0x54/0x180 [] kernel_init_freeable+0x270/0x33c [] kernel_init+0x18/0x124 Signed-off-by: Zumeng Chen Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/ti-cpufreq.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index 4bf47de6101f..ffcddcd4c5e6 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -217,7 +217,8 @@ static int ti_cpufreq_init(void) opp_data->cpu_dev = get_cpu_device(0); if (!opp_data->cpu_dev) { pr_err("%s: Failed to get device for CPU0\n", __func__); - return -ENODEV; + ret = -ENODEV; + goto free_opp_data; } opp_data->opp_node = dev_pm_opp_of_get_opp_desc_node(opp_data->cpu_dev); @@ -262,6 +263,8 @@ register_cpufreq_dt: fail_put_node: of_node_put(opp_data->opp_node); +free_opp_data: + kfree(opp_data); return ret; } From 64ec72a1ece37d9bc7ba8b11d6091ce7cb1d8eec Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Wed, 27 Sep 2017 22:01:34 -0700 Subject: [PATCH 21/88] PM: Use a more common logging style Convert printks to pr_.
Miscellanea: o Use pr_fmt with "PM:" and remove "PM: " from format strings o Coalesce format strings and realign format arguments o Convert an embedded incorrect function name to "%s: ", __func__ o Convert a couple multi-line formats to multiple pr_ calls Signed-off-by: Joe Perches Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- kernel/power/qos.c | 4 +- kernel/power/snapshot.c | 35 +++++------ kernel/power/swap.c | 128 ++++++++++++++++++---------------------- 3 files changed, 77 insertions(+), 90 deletions(-) diff --git a/kernel/power/qos.c b/kernel/power/qos.c index 97b0df71303e..9d7503910ce2 100644 --- a/kernel/power/qos.c +++ b/kernel/power/qos.c @@ -701,8 +701,8 @@ static int __init pm_qos_power_init(void) for (i = PM_QOS_CPU_DMA_LATENCY; i < PM_QOS_NUM_CLASSES; i++) { ret = register_pm_qos_misc(pm_qos_array[i], d); if (ret < 0) { - printk(KERN_ERR "pm_qos_param: %s setup failed\n", - pm_qos_array[i]->name); + pr_err("%s: %s setup failed\n", + __func__, pm_qos_array[i]->name); return ret; } } diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 0972a8e09d08..a917a301e201 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -10,6 +10,8 @@ * */ +#define pr_fmt(fmt) "PM: " fmt + #include #include #include @@ -967,7 +969,7 @@ void __init __register_nosave_region(unsigned long start_pfn, region->end_pfn = end_pfn; list_add_tail(&region->list, &nosave_regions); Report: - printk(KERN_INFO "PM: Registered nosave memory: [mem %#010llx-%#010llx]\n", + pr_info("Registered nosave memory: [mem %#010llx-%#010llx]\n", (unsigned long long) start_pfn << PAGE_SHIFT, ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); } @@ -1039,7 +1041,7 @@ static void mark_nosave_pages(struct memory_bitmap *bm) list_for_each_entry(region, &nosave_regions, list) { unsigned long pfn; - pr_debug("PM: Marking nosave pages: [mem %#010llx-%#010llx]\n", + pr_debug("Marking nosave pages: [mem %#010llx-%#010llx]\n", (unsigned long long) region->start_pfn << 
PAGE_SHIFT, ((unsigned long long) region->end_pfn << PAGE_SHIFT) - 1); @@ -1095,7 +1097,7 @@ int create_basic_memory_bitmaps(void) free_pages_map = bm2; mark_nosave_pages(forbidden_pages_map); - pr_debug("PM: Basic memory bitmaps created\n"); + pr_debug("Basic memory bitmaps created\n"); return 0; @@ -1131,7 +1133,7 @@ void free_basic_memory_bitmaps(void) memory_bm_free(bm2, PG_UNSAFE_CLEAR); kfree(bm2); - pr_debug("PM: Basic memory bitmaps freed\n"); + pr_debug("Basic memory bitmaps freed\n"); } void clear_free_pages(void) @@ -1152,7 +1154,7 @@ void clear_free_pages(void) pfn = memory_bm_next_pfn(bm); } memory_bm_position_reset(bm); - pr_info("PM: free pages cleared after restore\n"); + pr_info("free pages cleared after restore\n"); #endif /* PAGE_POISONING_ZERO */ } @@ -1690,7 +1692,7 @@ int hibernate_preallocate_memory(void) ktime_t start, stop; int error; - printk(KERN_INFO "PM: Preallocating image memory... "); + pr_info("Preallocating image memory... "); start = ktime_get(); error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY); @@ -1821,13 +1823,13 @@ int hibernate_preallocate_memory(void) out: stop = ktime_get(); - printk(KERN_CONT "done (allocated %lu pages)\n", pages); + pr_cont("done (allocated %lu pages)\n", pages); swsusp_show_speed(start, stop, pages, "Allocated"); return 0; err_out: - printk(KERN_CONT "\n"); + pr_cont("\n"); swsusp_free(); return -ENOMEM; } @@ -1867,8 +1869,8 @@ static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem) free += zone_page_state(zone, NR_FREE_PAGES); nr_pages += count_pages_for_highmem(nr_highmem); - pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n", - nr_pages, PAGES_FOR_IO, free); + pr_debug("Normal pages needed: %u + %u, available pages: %u\n", + nr_pages, PAGES_FOR_IO, free); return free > nr_pages + PAGES_FOR_IO; } @@ -1961,20 +1963,20 @@ asmlinkage __visible int swsusp_save(void) { unsigned int nr_pages, nr_highmem; - printk(KERN_INFO "PM: Creating hibernation image:\n"); + 
pr_info("Creating hibernation image:\n"); drain_local_pages(NULL); nr_pages = count_data_pages(); nr_highmem = count_highmem_pages(); - printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem); + pr_info("Need to copy %u pages\n", nr_pages + nr_highmem); if (!enough_free_mem(nr_pages, nr_highmem)) { - printk(KERN_ERR "PM: Not enough free memory\n"); + pr_err("Not enough free memory\n"); return -ENOMEM; } if (swsusp_alloc(&copy_bm, nr_pages, nr_highmem)) { - printk(KERN_ERR "PM: Memory allocation failed\n"); + pr_err("Memory allocation failed\n"); return -ENOMEM; } @@ -1995,8 +1997,7 @@ asmlinkage __visible int swsusp_save(void) nr_copy_pages = nr_pages; nr_meta_pages = DIV_ROUND_UP(nr_pages * sizeof(long), PAGE_SIZE); - printk(KERN_INFO "PM: Hibernation image created (%d pages copied)\n", - nr_pages); + pr_info("Hibernation image created (%d pages copied)\n", nr_pages); return 0; } @@ -2170,7 +2171,7 @@ static int check_header(struct swsusp_info *info) if (!reason && info->num_physpages != get_num_physpages()) reason = "memory size"; if (reason) { - printk(KERN_ERR "PM: Image mismatch: %s\n", reason); + pr_err("Image mismatch: %s\n", reason); return -EPERM; } return 0; diff --git a/kernel/power/swap.c b/kernel/power/swap.c index d7cdc426ee38..293ead59eccc 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -12,6 +12,8 @@ * */ +#define pr_fmt(fmt) "PM: " fmt + #include #include #include @@ -241,9 +243,9 @@ static void hib_end_io(struct bio *bio) struct page *page = bio->bi_io_vec[0].bv_page; if (bio->bi_status) { - printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n", - MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), - (unsigned long long)bio->bi_iter.bi_sector); + pr_alert("Read-error on swap-device (%u:%u:%Lu)\n", + MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), + (unsigned long long)bio->bi_iter.bi_sector); } if (bio_data_dir(bio) == WRITE) @@ -273,8 +275,8 @@ static int hib_submit_io(int op, int op_flags, pgoff_t page_off, void *addr, 
bio_set_op_attrs(bio, op, op_flags); if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { - printk(KERN_ERR "PM: Adding page to bio failed at %llu\n", - (unsigned long long)bio->bi_iter.bi_sector); + pr_err("Adding page to bio failed at %llu\n", + (unsigned long long)bio->bi_iter.bi_sector); bio_put(bio); return -EFAULT; } @@ -319,7 +321,7 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags) error = hib_submit_io(REQ_OP_WRITE, REQ_SYNC, swsusp_resume_block, swsusp_header, NULL); } else { - printk(KERN_ERR "PM: Swap header not found!\n"); + pr_err("Swap header not found!\n"); error = -ENODEV; } return error; @@ -413,8 +415,7 @@ static int get_swap_writer(struct swap_map_handle *handle) ret = swsusp_swap_check(); if (ret) { if (ret != -ENOSPC) - printk(KERN_ERR "PM: Cannot find swap device, try " - "swapon -a.\n"); + pr_err("Cannot find swap device, try swapon -a\n"); return ret; } handle->cur = (struct swap_map_page *)get_zeroed_page(GFP_KERNEL); @@ -491,9 +492,9 @@ static int swap_writer_finish(struct swap_map_handle *handle, { if (!error) { flush_swap_writer(handle); - printk(KERN_INFO "PM: S"); + pr_info("S"); error = mark_swapfiles(handle, flags); - printk("|\n"); + pr_cont("|\n"); } if (error) @@ -542,7 +543,7 @@ static int save_image(struct swap_map_handle *handle, hib_init_batch(&hb); - printk(KERN_INFO "PM: Saving image data pages (%u pages)...\n", + pr_info("Saving image data pages (%u pages)...\n", nr_to_write); m = nr_to_write / 10; if (!m) @@ -557,8 +558,8 @@ static int save_image(struct swap_map_handle *handle, if (ret) break; if (!(nr_pages % m)) - printk(KERN_INFO "PM: Image saving progress: %3d%%\n", - nr_pages / m * 10); + pr_info("Image saving progress: %3d%%\n", + nr_pages / m * 10); nr_pages++; } err2 = hib_wait_io(&hb); @@ -566,7 +567,7 @@ static int save_image(struct swap_map_handle *handle, if (!ret) ret = err2; if (!ret) - printk(KERN_INFO "PM: Image saving done.\n"); + pr_info("Image saving done\n"); 
swsusp_show_speed(start, stop, nr_to_write, "Wrote"); return ret; } @@ -692,14 +693,14 @@ static int save_image_lzo(struct swap_map_handle *handle, page = (void *)__get_free_page(__GFP_RECLAIM | __GFP_HIGH); if (!page) { - printk(KERN_ERR "PM: Failed to allocate LZO page\n"); + pr_err("Failed to allocate LZO page\n"); ret = -ENOMEM; goto out_clean; } data = vmalloc(sizeof(*data) * nr_threads); if (!data) { - printk(KERN_ERR "PM: Failed to allocate LZO data\n"); + pr_err("Failed to allocate LZO data\n"); ret = -ENOMEM; goto out_clean; } @@ -708,7 +709,7 @@ static int save_image_lzo(struct swap_map_handle *handle, crc = kmalloc(sizeof(*crc), GFP_KERNEL); if (!crc) { - printk(KERN_ERR "PM: Failed to allocate crc\n"); + pr_err("Failed to allocate crc\n"); ret = -ENOMEM; goto out_clean; } @@ -726,8 +727,7 @@ static int save_image_lzo(struct swap_map_handle *handle, "image_compress/%u", thr); if (IS_ERR(data[thr].thr)) { data[thr].thr = NULL; - printk(KERN_ERR - "PM: Cannot start compression threads\n"); + pr_err("Cannot start compression threads\n"); ret = -ENOMEM; goto out_clean; } @@ -749,7 +749,7 @@ static int save_image_lzo(struct swap_map_handle *handle, crc->thr = kthread_run(crc32_threadfn, crc, "image_crc32"); if (IS_ERR(crc->thr)) { crc->thr = NULL; - printk(KERN_ERR "PM: Cannot start CRC32 thread\n"); + pr_err("Cannot start CRC32 thread\n"); ret = -ENOMEM; goto out_clean; } @@ -760,10 +760,9 @@ static int save_image_lzo(struct swap_map_handle *handle, */ handle->reqd_free_pages = reqd_free_pages(); - printk(KERN_INFO - "PM: Using %u thread(s) for compression.\n" - "PM: Compressing and saving image data (%u pages)...\n", - nr_threads, nr_to_write); + pr_info("Using %u thread(s) for compression\n", nr_threads); + pr_info("Compressing and saving image data (%u pages)...\n", + nr_to_write); m = nr_to_write / 10; if (!m) m = 1; @@ -783,10 +782,8 @@ static int save_image_lzo(struct swap_map_handle *handle, data_of(*snapshot), PAGE_SIZE); if (!(nr_pages % m)) - 
printk(KERN_INFO - "PM: Image saving progress: " - "%3d%%\n", - nr_pages / m * 10); + pr_info("Image saving progress: %3d%%\n", + nr_pages / m * 10); nr_pages++; } if (!off) @@ -813,15 +810,14 @@ static int save_image_lzo(struct swap_map_handle *handle, ret = data[thr].ret; if (ret < 0) { - printk(KERN_ERR "PM: LZO compression failed\n"); + pr_err("LZO compression failed\n"); goto out_finish; } if (unlikely(!data[thr].cmp_len || data[thr].cmp_len > lzo1x_worst_compress(data[thr].unc_len))) { - printk(KERN_ERR - "PM: Invalid LZO compressed length\n"); + pr_err("Invalid LZO compressed length\n"); ret = -1; goto out_finish; } @@ -857,7 +853,7 @@ out_finish: if (!ret) ret = err2; if (!ret) - printk(KERN_INFO "PM: Image saving done.\n"); + pr_info("Image saving done\n"); swsusp_show_speed(start, stop, nr_to_write, "Wrote"); out_clean: if (crc) { @@ -888,7 +884,7 @@ static int enough_swap(unsigned int nr_pages, unsigned int flags) unsigned int free_swap = count_swap_pages(root_swap, 1); unsigned int required; - pr_debug("PM: Free swap pages: %u\n", free_swap); + pr_debug("Free swap pages: %u\n", free_swap); required = PAGES_FOR_IO + nr_pages; return free_swap > required; @@ -915,12 +911,12 @@ int swsusp_write(unsigned int flags) pages = snapshot_get_image_size(); error = get_swap_writer(&handle); if (error) { - printk(KERN_ERR "PM: Cannot get swap writer\n"); + pr_err("Cannot get swap writer\n"); return error; } if (flags & SF_NOCOMPRESS_MODE) { if (!enough_swap(pages, flags)) { - printk(KERN_ERR "PM: Not enough free swap\n"); + pr_err("Not enough free swap\n"); error = -ENOSPC; goto out_finish; } @@ -1068,8 +1064,7 @@ static int load_image(struct swap_map_handle *handle, hib_init_batch(&hb); clean_pages_on_read = true; - printk(KERN_INFO "PM: Loading image data pages (%u pages)...\n", - nr_to_read); + pr_info("Loading image data pages (%u pages)...\n", nr_to_read); m = nr_to_read / 10; if (!m) m = 1; @@ -1087,8 +1082,8 @@ static int load_image(struct swap_map_handle 
*handle, if (ret) break; if (!(nr_pages % m)) - printk(KERN_INFO "PM: Image loading progress: %3d%%\n", - nr_pages / m * 10); + pr_info("Image loading progress: %3d%%\n", + nr_pages / m * 10); nr_pages++; } err2 = hib_wait_io(&hb); @@ -1096,7 +1091,7 @@ static int load_image(struct swap_map_handle *handle, if (!ret) ret = err2; if (!ret) { - printk(KERN_INFO "PM: Image loading done.\n"); + pr_info("Image loading done\n"); snapshot_write_finalize(snapshot); if (!snapshot_image_loaded(snapshot)) ret = -ENODATA; @@ -1190,14 +1185,14 @@ static int load_image_lzo(struct swap_map_handle *handle, page = vmalloc(sizeof(*page) * LZO_MAX_RD_PAGES); if (!page) { - printk(KERN_ERR "PM: Failed to allocate LZO page\n"); + pr_err("Failed to allocate LZO page\n"); ret = -ENOMEM; goto out_clean; } data = vmalloc(sizeof(*data) * nr_threads); if (!data) { - printk(KERN_ERR "PM: Failed to allocate LZO data\n"); + pr_err("Failed to allocate LZO data\n"); ret = -ENOMEM; goto out_clean; } @@ -1206,7 +1201,7 @@ static int load_image_lzo(struct swap_map_handle *handle, crc = kmalloc(sizeof(*crc), GFP_KERNEL); if (!crc) { - printk(KERN_ERR "PM: Failed to allocate crc\n"); + pr_err("Failed to allocate crc\n"); ret = -ENOMEM; goto out_clean; } @@ -1226,8 +1221,7 @@ static int load_image_lzo(struct swap_map_handle *handle, "image_decompress/%u", thr); if (IS_ERR(data[thr].thr)) { data[thr].thr = NULL; - printk(KERN_ERR - "PM: Cannot start decompression threads\n"); + pr_err("Cannot start decompression threads\n"); ret = -ENOMEM; goto out_clean; } @@ -1249,7 +1243,7 @@ static int load_image_lzo(struct swap_map_handle *handle, crc->thr = kthread_run(crc32_threadfn, crc, "image_crc32"); if (IS_ERR(crc->thr)) { crc->thr = NULL; - printk(KERN_ERR "PM: Cannot start CRC32 thread\n"); + pr_err("Cannot start CRC32 thread\n"); ret = -ENOMEM; goto out_clean; } @@ -1274,8 +1268,7 @@ static int load_image_lzo(struct swap_map_handle *handle, if (!page[i]) { if (i < LZO_CMP_PAGES) { ring_size = i; - 
printk(KERN_ERR - "PM: Failed to allocate LZO pages\n"); + pr_err("Failed to allocate LZO pages\n"); ret = -ENOMEM; goto out_clean; } else { @@ -1285,10 +1278,9 @@ static int load_image_lzo(struct swap_map_handle *handle, } want = ring_size = i; - printk(KERN_INFO - "PM: Using %u thread(s) for decompression.\n" - "PM: Loading and decompressing image data (%u pages)...\n", - nr_threads, nr_to_read); + pr_info("Using %u thread(s) for decompression\n", nr_threads); + pr_info("Loading and decompressing image data (%u pages)...\n", + nr_to_read); m = nr_to_read / 10; if (!m) m = 1; @@ -1348,8 +1340,7 @@ static int load_image_lzo(struct swap_map_handle *handle, if (unlikely(!data[thr].cmp_len || data[thr].cmp_len > lzo1x_worst_compress(LZO_UNC_SIZE))) { - printk(KERN_ERR - "PM: Invalid LZO compressed length\n"); + pr_err("Invalid LZO compressed length\n"); ret = -1; goto out_finish; } @@ -1400,16 +1391,14 @@ static int load_image_lzo(struct swap_map_handle *handle, ret = data[thr].ret; if (ret < 0) { - printk(KERN_ERR - "PM: LZO decompression failed\n"); + pr_err("LZO decompression failed\n"); goto out_finish; } if (unlikely(!data[thr].unc_len || data[thr].unc_len > LZO_UNC_SIZE || data[thr].unc_len & (PAGE_SIZE - 1))) { - printk(KERN_ERR - "PM: Invalid LZO uncompressed length\n"); + pr_err("Invalid LZO uncompressed length\n"); ret = -1; goto out_finish; } @@ -1420,10 +1409,8 @@ static int load_image_lzo(struct swap_map_handle *handle, data[thr].unc + off, PAGE_SIZE); if (!(nr_pages % m)) - printk(KERN_INFO - "PM: Image loading progress: " - "%3d%%\n", - nr_pages / m * 10); + pr_info("Image loading progress: %3d%%\n", + nr_pages / m * 10); nr_pages++; ret = snapshot_write_next(snapshot); @@ -1448,15 +1435,14 @@ out_finish: } stop = ktime_get(); if (!ret) { - printk(KERN_INFO "PM: Image loading done.\n"); + pr_info("Image loading done\n"); snapshot_write_finalize(snapshot); if (!snapshot_image_loaded(snapshot)) ret = -ENODATA; if (!ret) { if (swsusp_header->flags & 
SF_CRC32_MODE) { if(handle->crc32 != swsusp_header->crc32) { - printk(KERN_ERR - "PM: Invalid image CRC32!\n"); + pr_err("Invalid image CRC32!\n"); ret = -ENODATA; } } @@ -1513,9 +1499,9 @@ int swsusp_read(unsigned int *flags_p) swap_reader_finish(&handle); end: if (!error) - pr_debug("PM: Image successfully loaded\n"); + pr_debug("Image successfully loaded\n"); else - pr_debug("PM: Error %d resuming\n", error); + pr_debug("Error %d resuming\n", error); return error; } @@ -1552,13 +1538,13 @@ put: if (error) blkdev_put(hib_resume_bdev, FMODE_READ); else - pr_debug("PM: Image signature found, resuming\n"); + pr_debug("Image signature found, resuming\n"); } else { error = PTR_ERR(hib_resume_bdev); } if (error) - pr_debug("PM: Image not found (code %d)\n", error); + pr_debug("Image not found (code %d)\n", error); return error; } @@ -1570,7 +1556,7 @@ put: void swsusp_close(fmode_t mode) { if (IS_ERR(hib_resume_bdev)) { - pr_debug("PM: Image device not initialised\n"); + pr_debug("Image device not initialised\n"); return; } @@ -1594,7 +1580,7 @@ int swsusp_unmark(void) swsusp_resume_block, swsusp_header, NULL); } else { - printk(KERN_ERR "PM: Cannot find swsusp signature!\n"); + pr_err("Cannot find swsusp signature!\n"); error = -ENODEV; } From eb672c0239da9084fc8103b63ba5fdaec6aab8be Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 26 Sep 2017 22:45:44 +0200 Subject: [PATCH 22/88] PM: ARM: locomo: Drop suspend and resume bus type callbacks None of the locomo drivers in the tree implements the suspend and resume callbacks from struct locomo_driver, so drop them and drop the corresponding callbacks from locomo_bus_type. Signed-off-by: Rafael J. 
Wysocki Reviewed-by: Ulf Hansson --- arch/arm/common/locomo.c | 24 ------------------------ arch/arm/include/asm/hardware/locomo.h | 2 -- 2 files changed, 26 deletions(-) diff --git a/arch/arm/common/locomo.c b/arch/arm/common/locomo.c index 6c7b06854fce..51936bde1eb2 100644 --- a/arch/arm/common/locomo.c +++ b/arch/arm/common/locomo.c @@ -826,28 +826,6 @@ static int locomo_match(struct device *_dev, struct device_driver *_drv) return dev->devid == drv->devid; } -static int locomo_bus_suspend(struct device *dev, pm_message_t state) -{ - struct locomo_dev *ldev = LOCOMO_DEV(dev); - struct locomo_driver *drv = LOCOMO_DRV(dev->driver); - int ret = 0; - - if (drv && drv->suspend) - ret = drv->suspend(ldev, state); - return ret; -} - -static int locomo_bus_resume(struct device *dev) -{ - struct locomo_dev *ldev = LOCOMO_DEV(dev); - struct locomo_driver *drv = LOCOMO_DRV(dev->driver); - int ret = 0; - - if (drv && drv->resume) - ret = drv->resume(ldev); - return ret; -} - static int locomo_bus_probe(struct device *dev) { struct locomo_dev *ldev = LOCOMO_DEV(dev); @@ -875,8 +853,6 @@ struct bus_type locomo_bus_type = { .match = locomo_match, .probe = locomo_bus_probe, .remove = locomo_bus_remove, - .suspend = locomo_bus_suspend, - .resume = locomo_bus_resume, }; int locomo_driver_register(struct locomo_driver *driver) diff --git a/arch/arm/include/asm/hardware/locomo.h b/arch/arm/include/asm/hardware/locomo.h index 74e51d6bd93f..f8712e3c29cf 100644 --- a/arch/arm/include/asm/hardware/locomo.h +++ b/arch/arm/include/asm/hardware/locomo.h @@ -189,8 +189,6 @@ struct locomo_driver { unsigned int devid; int (*probe)(struct locomo_dev *); int (*remove)(struct locomo_dev *); - int (*suspend)(struct locomo_dev *, pm_message_t); - int (*resume)(struct locomo_dev *); }; #define LOCOMO_DRV(_d) container_of((_d), struct locomo_driver, drv) From 8055af0a4fddb45a8cd925fb9bc71f4b52628c9a Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Fri, 6 Oct 2017 09:08:34 +0200 Subject: [PATCH 
23/88] ACPI / PM: Remove stale function header acpi_dev_pm_get_node() isn't used or implemented, so remove it. Signed-off-by: Ulf Hansson Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 5 ----- 1 file changed, 5 deletions(-) diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 502af53ec012..3b89b4fe6812 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -868,17 +868,12 @@ int acpi_dev_runtime_suspend(struct device *dev); int acpi_dev_runtime_resume(struct device *dev); int acpi_subsys_runtime_suspend(struct device *dev); int acpi_subsys_runtime_resume(struct device *dev); -struct acpi_device *acpi_dev_pm_get_node(struct device *dev); int acpi_dev_pm_attach(struct device *dev, bool power_on); #else static inline int acpi_dev_runtime_suspend(struct device *dev) { return 0; } static inline int acpi_dev_runtime_resume(struct device *dev) { return 0; } static inline int acpi_subsys_runtime_suspend(struct device *dev) { return 0; } static inline int acpi_subsys_runtime_resume(struct device *dev) { return 0; } -static inline struct acpi_device *acpi_dev_pm_get_node(struct device *dev) -{ - return NULL; -} static inline int acpi_dev_pm_attach(struct device *dev, bool power_on) { return -ENODEV; From d741029a2390406d4d94279ae5b346831a9e61e6 Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Thu, 21 Sep 2017 11:15:36 +0530 Subject: [PATCH 24/88] PM / OPP: Use snprintf() to avoid kasprintf() and kfree() Use snprintf() to avoid unnecessary initializations, avoid calling kfree(). Signed-off-by: Arvind Yadav Acked-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. 
Wysocki --- drivers/opp/debugfs.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/opp/debugfs.c b/drivers/opp/debugfs.c index 81cf120fcf43..9318848f3c67 100644 --- a/drivers/opp/debugfs.c +++ b/drivers/opp/debugfs.c @@ -41,16 +41,15 @@ static bool opp_debug_create_supplies(struct dev_pm_opp *opp, { struct dentry *d; int i; - char *name; for (i = 0; i < opp_table->regulator_count; i++) { - name = kasprintf(GFP_KERNEL, "supply-%d", i); + char name[15]; + + snprintf(name, sizeof(name), "supply-%d", i); /* Create per-opp directory */ d = debugfs_create_dir(name, pdentry); - kfree(name); - if (!d) return false; From 035ed07208dc501d023873447113f3f178592156 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Fri, 29 Sep 2017 14:39:49 -0300 Subject: [PATCH 25/88] PM / OPP: Move error message to debug level On some i.MX6 platforms which do not have speed grading check, opp table will not be created in platform code, so cpufreq driver prints the following error message: cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19) However, this is not really an error in this case because the imx6q-cpufreq driver first calls dev_pm_opp_get_opp_count() and if it fails, it means that platform code does not provide OPP and then dev_pm_opp_of_add_table() will be called. In order to avoid such confusing error message, move it to debug level. It is up to the caller of dev_pm_opp_get_opp_count() to check its return value and decide if it will print an error or not. Signed-off-by: Fabio Estevam Signed-off-by: Rafael J. 
Wysocki --- drivers/opp/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index a6de32530693..0459b1204694 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -296,7 +296,7 @@ int dev_pm_opp_get_opp_count(struct device *dev) opp_table = _find_opp_table(dev); if (IS_ERR(opp_table)) { count = PTR_ERR(opp_table); - dev_err(dev, "%s: OPP table not found (%d)\n", + dev_dbg(dev, "%s: OPP table not found (%d)\n", __func__, count); return count; } From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001 From: Tobias Jordan Date: Wed, 4 Oct 2017 11:35:03 +0530 Subject: [PATCH 26/88] PM / OPP: Add missing of_node_put(np) The for_each_available_child_of_node() loop in _of_add_opp_table_v2() doesn't drop the reference to "np" on errors. Fix that. Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings) Cc: 4.3+ # 4.3+ Signed-off-by: Tobias Jordan [ VK: Improved commit log. ] Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/opp/of.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 0b718886479b..87509cb69f79 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct device *dev, struct device_node *opp_np) dev_err(dev, "%s: Failed to add OPP, %d\n", __func__, ret); _dev_pm_opp_remove_table(opp_table, dev, false); + of_node_put(np); goto put_opp_table; } } From 604a7aeb4325b8ecb23df163c89fc12248302a4e Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 5 Oct 2017 17:26:21 +0530 Subject: [PATCH 27/88] PM / OPP: Rename dev_pm_opp_register_put_opp_helper() The routine is named incorrectly since the first attempt as there is nothing like a put_opp() helper. We wanted to unregister the set_opp() helper here and so it should rather be named as dev_pm_opp_unregister_set_opp_helper(). 
Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/opp/core.c | 6 +++--- include/linux/pm_opp.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 0459b1204694..80c21207e48c 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1476,13 +1476,13 @@ err: EXPORT_SYMBOL_GPL(dev_pm_opp_register_set_opp_helper); /** - * dev_pm_opp_register_put_opp_helper() - Releases resources blocked for + * dev_pm_opp_unregister_set_opp_helper() - Releases resources blocked for * set_opp helper * @opp_table: OPP table returned from dev_pm_opp_register_set_opp_helper(). * * Release resources blocked for platform specific set_opp helper. */ -void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table) +void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table) { if (!opp_table->set_opp) { pr_err("%s: Doesn't have custom set_opp helper set\n", @@ -1497,7 +1497,7 @@ void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table) dev_pm_opp_put_opp_table(opp_table); } -EXPORT_SYMBOL_GPL(dev_pm_opp_register_put_opp_helper); +EXPORT_SYMBOL_GPL(dev_pm_opp_unregister_set_opp_helper); /** * dev_pm_opp_add() - Add an OPP table from a table definitions diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 51ec727b4824..849d21dc4ca7 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -124,7 +124,7 @@ void dev_pm_opp_put_regulators(struct opp_table *opp_table); struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char * name); void dev_pm_opp_put_clkname(struct opp_table *opp_table); struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); -void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table); +void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table); int dev_pm_opp_set_rate(struct device *dev, unsigned long 
target_freq); int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, const struct cpumask *cpumask); int dev_pm_opp_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask); @@ -243,7 +243,7 @@ static inline struct opp_table *dev_pm_opp_register_set_opp_helper(struct device return ERR_PTR(-ENOTSUPP); } -static inline void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table) {} +static inline void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table) {} static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name) { From 2b3d58a3adca9b7dec9bd289c5c0fda82eeebfa8 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Sat, 30 Sep 2017 12:16:46 -0300 Subject: [PATCH 28/88] cpufreq: imx6q: Move speed grading check to cpufreq driver On some i.MX6 SoCs (like i.MX6SL, i.MX6SX and i.MX6UL) that do not have a speed grading check, the OPP table will not be created in platform code, so the cpufreq driver prints the following error message: cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19) However, this is not really an error in this case, because the imx6q-cpufreq driver first calls dev_pm_opp_get_opp_count() and, if that fails, it means that platform code does not provide the OPP table, so dev_pm_opp_of_add_table() will be called. To avoid this confusing error message, move the speed grading check from platform code to the imx6q-cpufreq driver. This way the imx6q-cpufreq driver no longer has to check whether the OPP table is supplied by platform code. Tested on i.MX6Q and i.MX6UL based boards. Signed-off-by: Fabio Estevam Acked-by: Viresh Kumar Signed-off-by: Rafael J.
Wysocki --- arch/arm/mach-imx/mach-imx6q.c | 88 +-------------------------------- drivers/cpufreq/imx6q-cpufreq.c | 85 +++++++++++++++++++++++-------- 2 files changed, 67 insertions(+), 106 deletions(-) diff --git a/arch/arm/mach-imx/mach-imx6q.c b/arch/arm/mach-imx/mach-imx6q.c index 45801b27ee5c..b5f89fdbbb4b 100644 --- a/arch/arm/mach-imx/mach-imx6q.c +++ b/arch/arm/mach-imx/mach-imx6q.c @@ -286,88 +286,6 @@ static void __init imx6q_init_machine(void) imx6q_axi_init(); } -#define OCOTP_CFG3 0x440 -#define OCOTP_CFG3_SPEED_SHIFT 16 -#define OCOTP_CFG3_SPEED_1P2GHZ 0x3 -#define OCOTP_CFG3_SPEED_996MHZ 0x2 -#define OCOTP_CFG3_SPEED_852MHZ 0x1 - -static void __init imx6q_opp_check_speed_grading(struct device *cpu_dev) -{ - struct device_node *np; - void __iomem *base; - u32 val; - - np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-ocotp"); - if (!np) { - pr_warn("failed to find ocotp node\n"); - return; - } - - base = of_iomap(np, 0); - if (!base) { - pr_warn("failed to map ocotp\n"); - goto put_node; - } - - /* - * SPEED_GRADING[1:0] defines the max speed of ARM: - * 2b'11: 1200000000Hz; - * 2b'10: 996000000Hz; - * 2b'01: 852000000Hz; -- i.MX6Q Only, exclusive with 996MHz. - * 2b'00: 792000000Hz; - * We need to set the max speed of ARM according to fuse map. 
- */ - val = readl_relaxed(base + OCOTP_CFG3); - val >>= OCOTP_CFG3_SPEED_SHIFT; - val &= 0x3; - - if ((val != OCOTP_CFG3_SPEED_1P2GHZ) && cpu_is_imx6q()) - if (dev_pm_opp_disable(cpu_dev, 1200000000)) - pr_warn("failed to disable 1.2 GHz OPP\n"); - if (val < OCOTP_CFG3_SPEED_996MHZ) - if (dev_pm_opp_disable(cpu_dev, 996000000)) - pr_warn("failed to disable 996 MHz OPP\n"); - if (cpu_is_imx6q()) { - if (val != OCOTP_CFG3_SPEED_852MHZ) - if (dev_pm_opp_disable(cpu_dev, 852000000)) - pr_warn("failed to disable 852 MHz OPP\n"); - } - iounmap(base); -put_node: - of_node_put(np); -} - -static void __init imx6q_opp_init(void) -{ - struct device_node *np; - struct device *cpu_dev = get_cpu_device(0); - - if (!cpu_dev) { - pr_warn("failed to get cpu0 device\n"); - return; - } - np = of_node_get(cpu_dev->of_node); - if (!np) { - pr_warn("failed to find cpu0 node\n"); - return; - } - - if (dev_pm_opp_of_add_table(cpu_dev)) { - pr_warn("failed to init OPP table\n"); - goto put_node; - } - - imx6q_opp_check_speed_grading(cpu_dev); - -put_node: - of_node_put(np); -} - -static struct platform_device imx6q_cpufreq_pdev = { - .name = "imx6q-cpufreq", -}; - static void __init imx6q_init_late(void) { /* @@ -377,10 +295,8 @@ static void __init imx6q_init_late(void) if (imx_get_soc_revision() > IMX_CHIP_REVISION_1_1) imx6q_cpuidle_init(); - if (IS_ENABLED(CONFIG_ARM_IMX6Q_CPUFREQ)) { - imx6q_opp_init(); - platform_device_register(&imx6q_cpufreq_pdev); - } + if (IS_ENABLED(CONFIG_ARM_IMX6Q_CPUFREQ)) + platform_device_register_simple("imx6q-cpufreq", -1, NULL, 0); } static void __init imx6q_map_io(void) diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c index 14466a9b01c0..628fe899cb48 100644 --- a/drivers/cpufreq/imx6q-cpufreq.c +++ b/drivers/cpufreq/imx6q-cpufreq.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -191,6 +192,57 @@ static struct cpufreq_driver imx6q_cpufreq_driver = { .suspend = cpufreq_generic_suspend, }; 
+#define OCOTP_CFG3 0x440 +#define OCOTP_CFG3_SPEED_SHIFT 16 +#define OCOTP_CFG3_SPEED_1P2GHZ 0x3 +#define OCOTP_CFG3_SPEED_996MHZ 0x2 +#define OCOTP_CFG3_SPEED_852MHZ 0x1 + +static void imx6q_opp_check_speed_grading(struct device *dev) +{ + struct device_node *np; + void __iomem *base; + u32 val; + + np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-ocotp"); + if (!np) + return; + + base = of_iomap(np, 0); + if (!base) { + dev_err(dev, "failed to map ocotp\n"); + goto put_node; + } + + /* + * SPEED_GRADING[1:0] defines the max speed of ARM: + * 2b'11: 1200000000Hz; + * 2b'10: 996000000Hz; + * 2b'01: 852000000Hz; -- i.MX6Q Only, exclusive with 996MHz. + * 2b'00: 792000000Hz; + * We need to set the max speed of ARM according to fuse map. + */ + val = readl_relaxed(base + OCOTP_CFG3); + val >>= OCOTP_CFG3_SPEED_SHIFT; + val &= 0x3; + + if ((val != OCOTP_CFG3_SPEED_1P2GHZ) && + of_machine_is_compatible("fsl,imx6q")) + if (dev_pm_opp_disable(dev, 1200000000)) + dev_warn(dev, "failed to disable 1.2GHz OPP\n"); + if (val < OCOTP_CFG3_SPEED_996MHZ) + if (dev_pm_opp_disable(dev, 996000000)) + dev_warn(dev, "failed to disable 996MHz OPP\n"); + if (of_machine_is_compatible("fsl,imx6q")) { + if (val != OCOTP_CFG3_SPEED_852MHZ) + if (dev_pm_opp_disable(dev, 852000000)) + dev_warn(dev, "failed to disable 852MHz OPP\n"); + } + iounmap(base); +put_node: + of_node_put(np); +} + static int imx6q_cpufreq_probe(struct platform_device *pdev) { struct device_node *np; @@ -252,28 +304,21 @@ static int imx6q_cpufreq_probe(struct platform_device *pdev) goto put_reg; } - /* - * We expect an OPP table supplied by platform. - * Just, incase the platform did not supply the OPP - * table, it will try to get it. 
- */ + ret = dev_pm_opp_of_add_table(cpu_dev); + if (ret < 0) { + dev_err(cpu_dev, "failed to init OPP table: %d\n", ret); + goto put_reg; + } + + imx6q_opp_check_speed_grading(cpu_dev); + + /* Because we have added the OPPs here, we must free them */ + free_opp = true; num = dev_pm_opp_get_opp_count(cpu_dev); if (num < 0) { - ret = dev_pm_opp_of_add_table(cpu_dev); - if (ret < 0) { - dev_err(cpu_dev, "failed to init OPP table: %d\n", ret); - goto put_reg; - } - - /* Because we have added the OPPs here, we must free them */ - free_opp = true; - - num = dev_pm_opp_get_opp_count(cpu_dev); - if (num < 0) { - ret = num; - dev_err(cpu_dev, "no OPP table is found: %d\n", ret); - goto out_free_opp; - } + ret = num; + dev_err(cpu_dev, "no OPP table is found: %d\n", ret); + goto out_free_opp; } ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table); From 11f2c0d77ca856ce146a3e05e7bdfc0c5e6e0f61 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Wed, 4 Oct 2017 08:38:28 +0200 Subject: [PATCH 29/88] cpufreq: dt: Remove support for Exynos4212 SoCs Support for Exynos4212 SoCs has been removed by commit bca9085e0ae9 "ARM: dts: exynos: remove Exynos4212 support (dead code)", so there is no need to keep remaining dead code related to this SoC version. Signed-off-by: Marek Szyprowski Acked-by: Viresh Kumar Acked-by: Krzysztof Kozlowski Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/cpufreq-dt-platdev.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index de659fc564f4..ecc56e26f8f6 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -48,7 +48,6 @@ static const struct of_device_id whitelist[] __initconst = { { .compatible = "samsung,exynos3250", }, { .compatible = "samsung,exynos4210", }, - { .compatible = "samsung,exynos4212", }, { .compatible = "samsung,exynos5250", }, #ifndef CONFIG_BL_SWITCHER { .compatible = "samsung,exynos5800", }, From 9e9704ea5baf09e4c61be5901439da34c39995d0 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Fri, 6 Oct 2017 09:02:06 +0200 Subject: [PATCH 30/88] PM / Domains: Rename genpd internals from pm_genpd_* to genpd_* Most of the function names have already moved to the genpd naming rules; complete the rename to avoid any further confusion. Signed-off-by: Ulf Hansson Reviewed-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 104 +++++++++++++++++------------------- 1 file changed, 50 insertions(+), 54 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index e8ca5e2cf1e5..a6e4c8d7d837 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -749,11 +749,7 @@ late_initcall(genpd_power_off_unused); #if defined(CONFIG_PM_SLEEP) || defined(CONFIG_PM_GENERIC_DOMAINS_OF) -/** - * pm_genpd_present - Check if the given PM domain has been initialized. - * @genpd: PM domain to check. - */ -static bool pm_genpd_present(const struct generic_pm_domain *genpd) +static bool genpd_present(const struct generic_pm_domain *genpd) { const struct generic_pm_domain *gpd; @@ -863,7 +859,7 @@ static void genpd_sync_power_on(struct generic_pm_domain *genpd, bool use_lock, * @genpd: PM domain the device belongs to.
* * There are two cases in which a device that can wake up the system from sleep - * states should be resumed by pm_genpd_prepare(): (1) if the device is enabled + * states should be resumed by genpd_prepare(): (1) if the device is enabled * to wake up the system and it has to remain active for this purpose while the * system is in the sleep state and (2) if the device is not enabled to wake up * the system from sleep states and it generally doesn't generate wakeup signals @@ -886,7 +882,7 @@ static bool resume_needed(struct device *dev, } /** - * pm_genpd_prepare - Start power transition of a device in a PM domain. + * genpd_prepare - Start power transition of a device in a PM domain. * @dev: Device to start the transition of. * * Start a power transition of a device (during a system-wide power transition) @@ -894,7 +890,7 @@ static bool resume_needed(struct device *dev, * an object of type struct generic_pm_domain representing a PM domain * consisting of I/O devices. */ -static int pm_genpd_prepare(struct device *dev) +static int genpd_prepare(struct device *dev) { struct generic_pm_domain *genpd; int ret; @@ -975,13 +971,13 @@ static int genpd_finish_suspend(struct device *dev, bool poweroff) } /** - * pm_genpd_suspend_noirq - Completion of suspend of device in an I/O PM domain. + * genpd_suspend_noirq - Completion of suspend of device in an I/O PM domain. * @dev: Device to suspend. * * Stop the device and remove power from the domain if all devices in it have * been stopped. */ -static int pm_genpd_suspend_noirq(struct device *dev) +static int genpd_suspend_noirq(struct device *dev) { dev_dbg(dev, "%s()\n", __func__); @@ -989,12 +985,12 @@ static int pm_genpd_suspend_noirq(struct device *dev) } /** - * pm_genpd_resume_noirq - Start of resume of device in an I/O PM domain. + * genpd_resume_noirq - Start of resume of device in an I/O PM domain. * @dev: Device to resume. * * Restore power to the device's PM domain, if necessary, and start the device. 
*/ -static int pm_genpd_resume_noirq(struct device *dev) +static int genpd_resume_noirq(struct device *dev) { struct generic_pm_domain *genpd; int ret = 0; @@ -1024,7 +1020,7 @@ static int pm_genpd_resume_noirq(struct device *dev) } /** - * pm_genpd_freeze_noirq - Completion of freezing a device in an I/O PM domain. + * genpd_freeze_noirq - Completion of freezing a device in an I/O PM domain. * @dev: Device to freeze. * * Carry out a late freeze of a device under the assumption that its @@ -1032,7 +1028,7 @@ static int pm_genpd_resume_noirq(struct device *dev) * struct generic_pm_domain representing a power domain consisting of I/O * devices. */ -static int pm_genpd_freeze_noirq(struct device *dev) +static int genpd_freeze_noirq(struct device *dev) { const struct generic_pm_domain *genpd; int ret = 0; @@ -1054,13 +1050,13 @@ static int pm_genpd_freeze_noirq(struct device *dev) } /** - * pm_genpd_thaw_noirq - Early thaw of device in an I/O PM domain. + * genpd_thaw_noirq - Early thaw of device in an I/O PM domain. * @dev: Device to thaw. * * Start the device, unless power has been removed from the domain already * before the system transition. */ -static int pm_genpd_thaw_noirq(struct device *dev) +static int genpd_thaw_noirq(struct device *dev) { const struct generic_pm_domain *genpd; int ret = 0; @@ -1081,14 +1077,14 @@ static int pm_genpd_thaw_noirq(struct device *dev) } /** - * pm_genpd_poweroff_noirq - Completion of hibernation of device in an + * genpd_poweroff_noirq - Completion of hibernation of device in an * I/O PM domain. * @dev: Device to poweroff. * * Stop the device and remove power from the domain if all devices in it have * been stopped. */ -static int pm_genpd_poweroff_noirq(struct device *dev) +static int genpd_poweroff_noirq(struct device *dev) { dev_dbg(dev, "%s()\n", __func__); @@ -1096,13 +1092,13 @@ static int pm_genpd_poweroff_noirq(struct device *dev) } /** - * pm_genpd_restore_noirq - Start of restore of device in an I/O PM domain. 
+ * genpd_restore_noirq - Start of restore of device in an I/O PM domain. * @dev: Device to resume. * * Make sure the domain will be in the same power state as before the * hibernation the system is resuming from and start the device if necessary. */ -static int pm_genpd_restore_noirq(struct device *dev) +static int genpd_restore_noirq(struct device *dev) { struct generic_pm_domain *genpd; int ret = 0; @@ -1139,7 +1135,7 @@ static int pm_genpd_restore_noirq(struct device *dev) } /** - * pm_genpd_complete - Complete power transition of a device in a power domain. + * genpd_complete - Complete power transition of a device in a power domain. * @dev: Device to complete the transition of. * * Complete a power transition of a device (during a system-wide power @@ -1147,7 +1143,7 @@ static int pm_genpd_restore_noirq(struct device *dev) * domain member of an object of type struct generic_pm_domain representing * a power domain consisting of I/O devices. */ -static void pm_genpd_complete(struct device *dev) +static void genpd_complete(struct device *dev) { struct generic_pm_domain *genpd; @@ -1180,7 +1176,7 @@ static void genpd_syscore_switch(struct device *dev, bool suspend) struct generic_pm_domain *genpd; genpd = dev_to_genpd(dev); - if (!pm_genpd_present(genpd)) + if (!genpd_present(genpd)) return; if (suspend) { @@ -1206,14 +1202,14 @@ EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweron); #else /* !CONFIG_PM_SLEEP */ -#define pm_genpd_prepare NULL -#define pm_genpd_suspend_noirq NULL -#define pm_genpd_resume_noirq NULL -#define pm_genpd_freeze_noirq NULL -#define pm_genpd_thaw_noirq NULL -#define pm_genpd_poweroff_noirq NULL -#define pm_genpd_restore_noirq NULL -#define pm_genpd_complete NULL +#define genpd_prepare NULL +#define genpd_suspend_noirq NULL +#define genpd_resume_noirq NULL +#define genpd_freeze_noirq NULL +#define genpd_thaw_noirq NULL +#define genpd_poweroff_noirq NULL +#define genpd_restore_noirq NULL +#define genpd_complete NULL #endif /* CONFIG_PM_SLEEP */ @@ 
-1574,14 +1570,14 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->accounting_time = ktime_get(); genpd->domain.ops.runtime_suspend = genpd_runtime_suspend; genpd->domain.ops.runtime_resume = genpd_runtime_resume; - genpd->domain.ops.prepare = pm_genpd_prepare; - genpd->domain.ops.suspend_noirq = pm_genpd_suspend_noirq; - genpd->domain.ops.resume_noirq = pm_genpd_resume_noirq; - genpd->domain.ops.freeze_noirq = pm_genpd_freeze_noirq; - genpd->domain.ops.thaw_noirq = pm_genpd_thaw_noirq; - genpd->domain.ops.poweroff_noirq = pm_genpd_poweroff_noirq; - genpd->domain.ops.restore_noirq = pm_genpd_restore_noirq; - genpd->domain.ops.complete = pm_genpd_complete; + genpd->domain.ops.prepare = genpd_prepare; + genpd->domain.ops.suspend_noirq = genpd_suspend_noirq; + genpd->domain.ops.resume_noirq = genpd_resume_noirq; + genpd->domain.ops.freeze_noirq = genpd_freeze_noirq; + genpd->domain.ops.thaw_noirq = genpd_thaw_noirq; + genpd->domain.ops.poweroff_noirq = genpd_poweroff_noirq; + genpd->domain.ops.restore_noirq = genpd_restore_noirq; + genpd->domain.ops.complete = genpd_complete; if (genpd->flags & GENPD_FLAG_PM_CLK) { genpd->dev_ops.stop = pm_clk_suspend; @@ -1795,7 +1791,7 @@ int of_genpd_add_provider_simple(struct device_node *np, mutex_lock(&gpd_list_lock); - if (pm_genpd_present(genpd)) { + if (genpd_present(genpd)) { ret = genpd_add_provider(np, genpd_xlate_simple, genpd); if (!ret) { genpd->provider = &np->fwnode; @@ -1831,7 +1827,7 @@ int of_genpd_add_provider_onecell(struct device_node *np, for (i = 0; i < data->num_domains; i++) { if (!data->domains[i]) continue; - if (!pm_genpd_present(data->domains[i])) + if (!genpd_present(data->domains[i])) goto error; data->domains[i]->provider = &np->fwnode; @@ -2274,7 +2270,7 @@ EXPORT_SYMBOL_GPL(of_genpd_parse_idle_states); #include #include #include -static struct dentry *pm_genpd_debugfs_dir; +static struct dentry *genpd_debugfs_dir; /* * TODO: This function is a slightly modified version of 
rtpm_status_show @@ -2302,8 +2298,8 @@ static void rtpm_status_str(struct seq_file *s, struct device *dev) seq_puts(s, p); } -static int pm_genpd_summary_one(struct seq_file *s, - struct generic_pm_domain *genpd) +static int genpd_summary_one(struct seq_file *s, + struct generic_pm_domain *genpd) { static const char * const status_lookup[] = { [GPD_STATE_ACTIVE] = "on", @@ -2373,7 +2369,7 @@ static int genpd_summary_show(struct seq_file *s, void *data) return -ERESTARTSYS; list_for_each_entry(genpd, &gpd_list, gpd_list_node) { - ret = pm_genpd_summary_one(s, genpd); + ret = genpd_summary_one(s, genpd); if (ret) break; } @@ -2559,23 +2555,23 @@ define_genpd_debugfs_fops(active_time); define_genpd_debugfs_fops(total_idle_time); define_genpd_debugfs_fops(devices); -static int __init pm_genpd_debug_init(void) +static int __init genpd_debug_init(void) { struct dentry *d; struct generic_pm_domain *genpd; - pm_genpd_debugfs_dir = debugfs_create_dir("pm_genpd", NULL); + genpd_debugfs_dir = debugfs_create_dir("pm_genpd", NULL); - if (!pm_genpd_debugfs_dir) + if (!genpd_debugfs_dir) return -ENOMEM; d = debugfs_create_file("pm_genpd_summary", S_IRUGO, - pm_genpd_debugfs_dir, NULL, &genpd_summary_fops); + genpd_debugfs_dir, NULL, &genpd_summary_fops); if (!d) return -ENOMEM; list_for_each_entry(genpd, &gpd_list, gpd_list_node) { - d = debugfs_create_dir(genpd->name, pm_genpd_debugfs_dir); + d = debugfs_create_dir(genpd->name, genpd_debugfs_dir); if (!d) return -ENOMEM; @@ -2595,11 +2591,11 @@ static int __init pm_genpd_debug_init(void) return 0; } -late_initcall(pm_genpd_debug_init); +late_initcall(genpd_debug_init); -static void __exit pm_genpd_debug_exit(void) +static void __exit genpd_debug_exit(void) { - debugfs_remove_recursive(pm_genpd_debugfs_dir); + debugfs_remove_recursive(genpd_debugfs_dir); } -__exitcall(pm_genpd_debug_exit); +__exitcall(genpd_debug_exit); #endif /* CONFIG_DEBUG_FS */ From 0563bb7ba67eec7a87a9ccc04b80bb59de26c319 Mon Sep 17 00:00:00 2001 From: Jason 
Baron Date: Fri, 6 Oct 2017 13:19:45 -0400 Subject: [PATCH 31/88] intel_idle: replace conditionals with static_cpu_has(X86_FEATURE_ARAT) If the 'arat' cpu flag is set, then the conditionals in intel_idle() that guard calling tick_broadcast_enter()/exit() will never be true. Use static_cpu_has(X86_FEATURE_ARAT) to create a fast path to replace the conditional. Signed-off-by: Jason Baron Acked-by: Jacob Pan Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 5dc7ea4b6bc4..5db5e3176f6a 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -913,8 +913,7 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state = &drv->states[index]; unsigned long eax = flg2MWAIT(state->flags); unsigned int cstate; - - cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1; + bool uninitialized_var(tick); /* * NB: if CPUIDLE_FLAG_TLB_FLUSHED is set, this idle transition @@ -923,12 +922,19 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev, * useful with this knowledge. */ - if (!(lapic_timer_reliable_states & (1 << (cstate)))) - tick_broadcast_enter(); + if (!static_cpu_has(X86_FEATURE_ARAT)) { + cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & + MWAIT_CSTATE_MASK) + 1; + tick = false; + if (!(lapic_timer_reliable_states & (1 << (cstate)))) { + tick = true; + tick_broadcast_enter(); + } + } mwait_idle_with_hints(eax, ecx); - if (!(lapic_timer_reliable_states & (1 << (cstate)))) + if (!static_cpu_has(X86_FEATURE_ARAT) && tick) tick_broadcast_exit(); return index; From e200052f826295ea606039d15fa518c401d64b56 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Wed, 6 Sep 2017 22:27:54 +0200 Subject: [PATCH 32/88] PM / AVS: Use %pS printk format for direct addresses Use the %pS instead of the %pF printk format specifier for printing symbols from direct addresses. 
This is needed for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller Acked-by: Nishanth Menon Signed-off-by: Rafael J. Wysocki --- drivers/power/avs/smartreflex.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/power/avs/smartreflex.c b/drivers/power/avs/smartreflex.c index 974fd684bab2..89bf4d6cb486 100644 --- a/drivers/power/avs/smartreflex.c +++ b/drivers/power/avs/smartreflex.c @@ -355,7 +355,7 @@ int sr_configure_errgen(struct omap_sr *sr) u8 senp_shift, senn_shift; if (!sr) { - pr_warn("%s: NULL omap_sr from %pF\n", + pr_warn("%s: NULL omap_sr from %pS\n", __func__, (void *)_RET_IP_); return -EINVAL; } @@ -422,7 +422,7 @@ int sr_disable_errgen(struct omap_sr *sr) u32 vpboundint_en, vpboundint_st; if (!sr) { - pr_warn("%s: NULL omap_sr from %pF\n", + pr_warn("%s: NULL omap_sr from %pS\n", __func__, (void *)_RET_IP_); return -EINVAL; } @@ -477,7 +477,7 @@ int sr_configure_minmax(struct omap_sr *sr) u8 senp_shift, senn_shift; if (!sr) { - pr_warn("%s: NULL omap_sr from %pF\n", + pr_warn("%s: NULL omap_sr from %pS\n", __func__, (void *)_RET_IP_); return -EINVAL; } @@ -562,7 +562,7 @@ int sr_enable(struct omap_sr *sr, unsigned long volt) int ret; if (!sr) { - pr_warn("%s: NULL omap_sr from %pF\n", + pr_warn("%s: NULL omap_sr from %pS\n", __func__, (void *)_RET_IP_); return -EINVAL; } @@ -614,7 +614,7 @@ int sr_enable(struct omap_sr *sr, unsigned long volt) void sr_disable(struct omap_sr *sr) { if (!sr) { - pr_warn("%s: NULL omap_sr from %pF\n", + pr_warn("%s: NULL omap_sr from %pS\n", __func__, (void *)_RET_IP_); return; } From 63705c406a8adbd6f26691148b09d466dd4d8d2f Mon Sep 17 00:00:00 2001 From: "Rafael J. 
Wysocki" Date: Tue, 10 Oct 2017 18:49:22 +0200 Subject: [PATCH 33/88] ACPI / PM: Combine two identical device resume routines Notice that acpi_dev_runtime_resume() and acpi_dev_resume_early() are actually literally identical after some more-or-less recent changes, so rename acpi_dev_runtime_resume() to acpi_dev_resume(), use it everywhere instead of acpi_dev_resume_early() and drop the latter. Signed-off-by: Rafael J. Wysocki Reviewed-by: Ulf Hansson --- drivers/acpi/acpi_lpss.c | 6 +++--- drivers/acpi/device_pm.c | 35 ++++++----------------------------- include/linux/acpi.h | 3 +-- 3 files changed, 10 insertions(+), 34 deletions(-) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 032ae44710e5..81b6096c4177 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -693,7 +693,7 @@ static int acpi_lpss_activate(struct device *dev) struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev)); int ret; - ret = acpi_dev_runtime_resume(dev); + ret = acpi_dev_resume(dev); if (ret) return ret; @@ -737,7 +737,7 @@ static int acpi_lpss_resume_early(struct device *dev) struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev)); int ret; - ret = acpi_dev_resume_early(dev); + ret = acpi_dev_resume(dev); if (ret) return ret; @@ -872,7 +872,7 @@ static int acpi_lpss_runtime_resume(struct device *dev) if (lpss_quirks & LPSS_QUIRK_ALWAYS_POWER_ON && iosf_mbi_available()) lpss_iosf_exit_d3_state(); - ret = acpi_dev_runtime_resume(dev); + ret = acpi_dev_resume(dev); if (ret) return ret; diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index fbcc73f7a099..6eb51145dcf7 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -882,14 +882,13 @@ int acpi_dev_runtime_suspend(struct device *dev) EXPORT_SYMBOL_GPL(acpi_dev_runtime_suspend); /** - * acpi_dev_runtime_resume - Put device into the full-power state using ACPI. + * acpi_dev_resume - Put device into the full-power state using ACPI. 
* @dev: Device to put into the full-power state. * * Put the given device into the full-power state using the standard ACPI - * mechanism at run time. Set the power state of the device to ACPI D0 and - * disable remote wakeup. + * mechanism. Set the power state of the device to ACPI D0 and disable wakeup. */ -int acpi_dev_runtime_resume(struct device *dev) +int acpi_dev_resume(struct device *dev) { struct acpi_device *adev = ACPI_COMPANION(dev); int error; @@ -901,7 +900,7 @@ int acpi_dev_runtime_resume(struct device *dev) acpi_device_wakeup_disable(adev); return error; } -EXPORT_SYMBOL_GPL(acpi_dev_runtime_resume); +EXPORT_SYMBOL_GPL(acpi_dev_resume); /** * acpi_subsys_runtime_suspend - Suspend device using ACPI. @@ -926,7 +925,7 @@ EXPORT_SYMBOL_GPL(acpi_subsys_runtime_suspend); */ int acpi_subsys_runtime_resume(struct device *dev) { - int ret = acpi_dev_runtime_resume(dev); + int ret = acpi_dev_resume(dev); return ret ? ret : pm_generic_runtime_resume(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_runtime_resume); @@ -967,28 +966,6 @@ int acpi_dev_suspend_late(struct device *dev) } EXPORT_SYMBOL_GPL(acpi_dev_suspend_late); -/** - * acpi_dev_resume_early - Put device into the full-power state using ACPI. - * @dev: Device to put into the full-power state. - * - * Put the given device into the full-power state using the standard ACPI - * mechanism during system transition to the working state. Set the power - * state of the device to ACPI D0 and disable remote wakeup. - */ -int acpi_dev_resume_early(struct device *dev) -{ - struct acpi_device *adev = ACPI_COMPANION(dev); - int error; - - if (!adev) - return 0; - - error = acpi_dev_pm_full_power(adev); - acpi_device_wakeup_disable(adev); - return error; -} -EXPORT_SYMBOL_GPL(acpi_dev_resume_early); - /** * acpi_subsys_prepare - Prepare device for system transition to a sleep state. * @dev: Device to prepare. 
@@ -1057,7 +1034,7 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_late); */ int acpi_subsys_resume_early(struct device *dev) { - int ret = acpi_dev_resume_early(dev); + int ret = acpi_dev_resume(dev); return ret ? ret : pm_generic_resume_early(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_resume_early); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 3b89b4fe6812..d18c92d4ba19 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -865,7 +865,7 @@ static inline void arch_reserve_mem_area(acpi_physical_address addr, #if defined(CONFIG_ACPI) && defined(CONFIG_PM) int acpi_dev_runtime_suspend(struct device *dev); -int acpi_dev_runtime_resume(struct device *dev); +int acpi_dev_resume(struct device *dev); int acpi_subsys_runtime_suspend(struct device *dev); int acpi_subsys_runtime_resume(struct device *dev); int acpi_dev_pm_attach(struct device *dev, bool power_on); @@ -882,7 +882,6 @@ static inline int acpi_dev_pm_attach(struct device *dev, bool power_on) #if defined(CONFIG_ACPI) && defined(CONFIG_PM_SLEEP) int acpi_dev_suspend_late(struct device *dev); -int acpi_dev_resume_early(struct device *dev); int acpi_subsys_prepare(struct device *dev); void acpi_subsys_complete(struct device *dev); int acpi_subsys_suspend_late(struct device *dev); From e4da817d2acbd05217adc0dc821bc8361e86ee30 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Tue, 3 Oct 2017 09:11:06 +0200 Subject: [PATCH 34/88] ACPI / PM: Restore acpi_subsys_complete() Commit 58a1fbbb2ee8 (PM / PCI / ACPI: Kick devices that might have been reset by firmware), made PCI's and ACPI's ->complete() callbacks to be assigned to a new API called pm_complete_with_resume_check(), which was introduced in the same change. Later it turned out that using pm_complete_with_resume_check() wasn't good enough for PCI, as it needed additional PCI specific checks, before deciding whether runtime resuming the device is needed when running the ->complete() callback. 
This leaves ACPI as the only user of pm_complete_with_resume_check(). Therefore let's restore ACPI's acpi_subsys_complete(), which was dropped in commit 58a1fbbb2ee8 (PM / PCI / ACPI: Kick devices that might have been reset by firmware). This enables us to remove the pm_complete_with_resume_check() API in a following change, but it also enables ACPI to add more ACPI specific checks in acpi_subsys_complete() if that turns out to be necessary. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/acpi/acpi_lpss.c | 2 +- drivers/acpi/device_pm.c | 19 ++++++++++++++++++- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 81b6096c4177..97b753dd2e6e 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -894,7 +894,7 @@ static struct dev_pm_domain acpi_lpss_pm_domain = { #ifdef CONFIG_PM #ifdef CONFIG_PM_SLEEP .prepare = acpi_subsys_prepare, - .complete = pm_complete_with_resume_check, + .complete = acpi_subsys_complete, .suspend = acpi_subsys_suspend, .suspend_late = acpi_lpss_suspend_late, .resume_early = acpi_lpss_resume_early, diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 6eb51145dcf7..d17fac453a30 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -996,6 +996,23 @@ int acpi_subsys_prepare(struct device *dev) } EXPORT_SYMBOL_GPL(acpi_subsys_prepare); +/** + * acpi_subsys_complete - Finalize device's resume during system resume. + * @dev: Device to handle. + */ +void acpi_subsys_complete(struct device *dev) +{ + pm_generic_complete(dev); + /* + * If the device had been runtime-suspended before the system went into + * the sleep state it is going out of and it has never been resumed till + * now, resume it in case the firmware powered it up. 
+ */ + if (dev->power.direct_complete && pm_resume_via_firmware()) + pm_request_resume(dev); +} +EXPORT_SYMBOL_GPL(acpi_subsys_complete); + /** * acpi_subsys_suspend - Run the device driver's suspend callback. * @dev: Device to handle. @@ -1064,7 +1081,7 @@ static struct dev_pm_domain acpi_general_pm_domain = { .runtime_resume = acpi_subsys_runtime_resume, #ifdef CONFIG_PM_SLEEP .prepare = acpi_subsys_prepare, - .complete = pm_complete_with_resume_check, + .complete = acpi_subsys_complete, .suspend = acpi_subsys_suspend, .suspend_late = acpi_subsys_suspend_late, .resume_early = acpi_subsys_resume_early, From c2ebf788f927dcca72beead19fab5f5aba79a098 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Tue, 3 Oct 2017 09:11:08 +0200 Subject: [PATCH 35/88] ACPI / PM: Split code validating need for runtime resume in ->prepare() Move the code that validates whether the device needs to be runtime resumed during system suspend into a separate function, acpi_dev_needs_resume(). This makes it clearer under which circumstances ACPI is prevented from trying the direct_complete path. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J.
Wysocki --- drivers/acpi/device_pm.c | 37 ++++++++++++++++++++++++------------- 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index d17fac453a30..764b8dfa04aa 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -966,6 +966,27 @@ int acpi_dev_suspend_late(struct device *dev) } EXPORT_SYMBOL_GPL(acpi_dev_suspend_late); +static bool acpi_dev_needs_resume(struct device *dev, struct acpi_device *adev) +{ + u32 sys_target = acpi_target_system_state(); + int ret, state; + + if (device_may_wakeup(dev) != !!adev->wakeup.prepare_count) + return true; + + if (sys_target == ACPI_STATE_S0) + return false; + + if (adev->power.flags.dsw_present) + return true; + + ret = acpi_dev_pm_get_state(dev, adev, sys_target, NULL, &state); + if (ret) + return true; + + return state != adev->power.state; +} + /** * acpi_subsys_prepare - Prepare device for system transition to a sleep state. * @dev: Device to prepare. @@ -973,26 +994,16 @@ EXPORT_SYMBOL_GPL(acpi_dev_suspend_late); int acpi_subsys_prepare(struct device *dev) { struct acpi_device *adev = ACPI_COMPANION(dev); - u32 sys_target; - int ret, state; + int ret; ret = pm_generic_prepare(dev); if (ret < 0) return ret; - if (!adev || !pm_runtime_suspended(dev) - || device_may_wakeup(dev) != !!adev->wakeup.prepare_count) + if (!adev || !pm_runtime_suspended(dev)) return 0; - sys_target = acpi_target_system_state(); - if (sys_target == ACPI_STATE_S0) - return 1; - - if (adev->power.flags.dsw_present) - return 0; - - ret = acpi_dev_pm_get_state(dev, adev, sys_target, NULL, &state); - return !ret && state == adev->power.state; + return !acpi_dev_needs_resume(dev, adev); } EXPORT_SYMBOL_GPL(acpi_subsys_prepare); From eeb2d80d502af28e5660ff4bbe00f90ceb82c2db Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Thu, 5 Oct 2017 16:24:03 -0700 Subject: [PATCH 36/88] ACPI / LPIT: Add Low Power Idle Table (LPIT) support Add functionality to read LPIT table, 
which provides: - Sysfs interface to read residency counters via /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us Here the count "low_power_idle_cpu_residency_us" shows the time spent by the CPU package in the low power state. This is read via the MSR interface, which points to the MSR for PKG C10. The count "low_power_idle_system_residency_us" shows the time the system spent in the low power state. This is read via the MMIO interface. This is mapped to SLP_S0 residency on modern Intel systems. This residency is achieved only when the CPU is in PKG C10 and all functional blocks are in a low power state. It is possible that none, only one, or both of the above counters are present. For example: On my Kabylake system both of the above counters are present. After suspend-to-idle these counts are updated and print: 6916179 6998564 These counters can be read by tools like turbostat for display, or used to debug whether modern systems are reaching the desired low power state. - Provides an interface to read the residency counter memory address. This address can be used to get the base address of PMC memory mapped IO. This is utilized by the intel_pmc_core driver to print more debug information. In addition, to avoid duplicating the code that reads iomem, the iomem read was moved out of acpi_os_read_memory() in osl.c into a common function, acpi_os_read_iomem(). This new function is used for reading iomem in both osl.c and acpi_lpit.c. Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf Signed-off-by: Srinivas Pandruvada Signed-off-by: Rafael J. 
Wysocki --- Documentation/acpi/lpit.txt | 25 ++++++ drivers/acpi/Kconfig | 5 ++ drivers/acpi/Makefile | 1 + drivers/acpi/acpi_lpit.c | 162 ++++++++++++++++++++++++++++++++++++ drivers/acpi/internal.h | 6 ++ drivers/acpi/osl.c | 50 ++++++----- drivers/acpi/scan.c | 1 + include/acpi/acpiosxf.h | 2 + include/linux/acpi.h | 9 ++ 9 files changed, 241 insertions(+), 20 deletions(-) create mode 100644 Documentation/acpi/lpit.txt create mode 100644 drivers/acpi/acpi_lpit.c diff --git a/Documentation/acpi/lpit.txt b/Documentation/acpi/lpit.txt new file mode 100644 index 000000000000..b426398d2e97 --- /dev/null +++ b/Documentation/acpi/lpit.txt @@ -0,0 +1,25 @@ +To enumerate platform Low Power Idle states, Intel platforms are using +“Low Power Idle Table” (LPIT). More details about this table can be +downloaded from: +http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf + +Residencies for each low power state can be read via FFH +(Function fixed hardware) or a memory mapped interface. + +On platforms supporting S0ix sleep states, there can be two types of +residencies: +- CPU PKG C10 (Read via FFH interface) +- Platform Controller Hub (PCH) SLP_S0 (Read via memory mapped interface) + +The following attributes are added dynamically to the cpuidle +sysfs attribute group: + /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us + /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us + +The "low_power_idle_cpu_residency_us" attribute shows time spent +by the CPU package in PKG C10 + +The "low_power_idle_system_residency_us" attribute shows SLP_S0 +residency, or system time spent with the SLP_S0# signal asserted. +This is the lowest possible system power state, achieved only when CPU is in +PKG C10 and all functional blocks in PCH are in a low power state. 
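For illustration only (this is not part of the patch): a minimal user-space sketch of how the two attributes documented in lpit.txt might be read. The helper names parse_residency_us() and read_residency_us() and the two path macros are hypothetical; only the sysfs paths themselves come from the documentation above. Either attribute may be absent on a given platform, so a missing file is reported as an error code rather than treated as fatal.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Attribute paths as listed in Documentation/acpi/lpit.txt. */
#define LPIT_CPU_RESIDENCY_PATH \
	"/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us"
#define LPIT_SYSTEM_RESIDENCY_PATH \
	"/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us"

/*
 * Hypothetical helper: parse one decimal counter value, e.g. "6916179\n".
 * Returns 0 on success, -EINVAL if no digits could be converted.
 */
static int parse_residency_us(const char *text, unsigned long long *out)
{
	char *end;

	errno = 0;
	*out = strtoull(text, &end, 10);
	if (errno || end == text)
		return -EINVAL;

	return 0;
}

/*
 * Hypothetical helper: read one residency attribute in microseconds.
 * Returns 0 on success or a negative errno value, for example -ENOENT
 * when the platform does not expose that counter.
 */
static int read_residency_us(const char *path, unsigned long long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -errno;

	ret = fgets(buf, sizeof(buf), f) ? parse_residency_us(buf, out) : -EIO;
	fclose(f);

	return ret;
}
```

A caller could pass LPIT_CPU_RESIDENCY_PATH or LPIT_SYSTEM_RESIDENCY_PATH to read_residency_us() before and after a suspend-to-idle cycle and compare the two values; a return of -ENOENT would simply mean that counter is not exposed on the system.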
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 1ce52f84dc23..4bfef0f78cde 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -80,6 +80,11 @@ endif config ACPI_SPCR_TABLE bool +config ACPI_LPIT + bool + depends on X86_64 + default y + config ACPI_SLEEP bool depends on SUSPEND || HIBERNATION diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 90265ab4437a..6a19bd7aba21 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -56,6 +56,7 @@ acpi-$(CONFIG_DEBUG_FS) += debugfs.o acpi-$(CONFIG_ACPI_NUMA) += numa.o acpi-$(CONFIG_ACPI_PROCFS_POWER) += cm_sbs.o acpi-y += acpi_lpat.o +acpi-$(CONFIG_ACPI_LPIT) += acpi_lpit.o acpi-$(CONFIG_ACPI_GENERIC_GSI) += irq.o acpi-$(CONFIG_ACPI_WATCHDOG) += acpi_watchdog.o diff --git a/drivers/acpi/acpi_lpit.c b/drivers/acpi/acpi_lpit.c new file mode 100644 index 000000000000..e94e478dd18b --- /dev/null +++ b/drivers/acpi/acpi_lpit.c @@ -0,0 +1,162 @@ + +/* + * acpi_lpit.c - LPIT table processing functions + * + * Copyright (C) 2017 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version + * 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include +#include +#include + +struct lpit_residency_info { + struct acpi_generic_address gaddr; + u64 frequency; + void __iomem *iomem_addr; +}; + +/* Storage for an memory mapped and FFH based entries */ +static struct lpit_residency_info residency_info_mem; +static struct lpit_residency_info residency_info_ffh; + +static int lpit_read_residency_counter_us(u64 *counter, bool io_mem) +{ + int err; + + if (io_mem) { + u64 count = 0; + int error; + + error = acpi_os_read_iomem(residency_info_mem.iomem_addr, &count, + residency_info_mem.gaddr.bit_width); + if (error) + return error; + + *counter = div64_u64(count * 1000000ULL, residency_info_mem.frequency); + return 0; + } + + err = rdmsrl_safe(residency_info_ffh.gaddr.address, counter); + if (!err) { + u64 mask = GENMASK_ULL(residency_info_ffh.gaddr.bit_offset + + residency_info_ffh.gaddr. bit_width - 1, + residency_info_ffh.gaddr.bit_offset); + + *counter &= mask; + *counter >>= residency_info_ffh.gaddr.bit_offset; + *counter = div64_u64(*counter * 1000000ULL, residency_info_ffh.frequency); + return 0; + } + + return -ENODATA; +} + +static ssize_t low_power_idle_system_residency_us_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + u64 counter; + int ret; + + ret = lpit_read_residency_counter_us(&counter, true); + if (ret) + return ret; + + return sprintf(buf, "%llu\n", counter); +} +static DEVICE_ATTR_RO(low_power_idle_system_residency_us); + +static ssize_t low_power_idle_cpu_residency_us_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + u64 counter; + int ret; + + ret = lpit_read_residency_counter_us(&counter, false); + if (ret) + return ret; + + return sprintf(buf, "%llu\n", counter); +} +static DEVICE_ATTR_RO(low_power_idle_cpu_residency_us); + +int lpit_read_residency_count_address(u64 *address) +{ + if (!residency_info_mem.gaddr.address) + return -EINVAL; + + *address = residency_info_mem.gaddr.address; + + return 0; +} + +static void 
lpit_update_residency(struct lpit_residency_info *info, + struct acpi_lpit_native *lpit_native) +{ + info->frequency = lpit_native->counter_frequency ? + lpit_native->counter_frequency : tsc_khz * 1000; + if (!info->frequency) + info->frequency = 1; + + info->gaddr = lpit_native->residency_counter; + if (info->gaddr.space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) { + info->iomem_addr = ioremap_nocache(info->gaddr.address, + info->gaddr.bit_width / 8); + if (!info->iomem_addr) + return; + + /* Silently fail, if cpuidle attribute group is not present */ + sysfs_add_file_to_group(&cpu_subsys.dev_root->kobj, + &dev_attr_low_power_idle_system_residency_us.attr, + "cpuidle"); + } else if (info->gaddr.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE) { + /* Silently fail, if cpuidle attribute group is not present */ + sysfs_add_file_to_group(&cpu_subsys.dev_root->kobj, + &dev_attr_low_power_idle_cpu_residency_us.attr, + "cpuidle"); + } +} + +static void lpit_process(u64 begin, u64 end) +{ + while (begin + sizeof(struct acpi_lpit_native) < end) { + struct acpi_lpit_native *lpit_native = (struct acpi_lpit_native *)begin; + + if (!lpit_native->header.type && !lpit_native->header.flags) { + if (lpit_native->residency_counter.space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY && + !residency_info_mem.gaddr.address) { + lpit_update_residency(&residency_info_mem, lpit_native); + } else if (lpit_native->residency_counter.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE && + !residency_info_ffh.gaddr.address) { + lpit_update_residency(&residency_info_ffh, lpit_native); + } + } + begin += lpit_native->header.length; + } +} + +void acpi_init_lpit(void) +{ + acpi_status status; + u64 lpit_begin; + struct acpi_table_lpit *lpit; + + status = acpi_get_table(ACPI_SIG_LPIT, 0, (struct acpi_table_header **)&lpit); + + if (ACPI_FAILURE(status)) + return; + + lpit_begin = (u64)lpit + sizeof(*lpit); + lpit_process(lpit_begin, lpit_begin + lpit->header.length); +} diff --git a/drivers/acpi/internal.h 
b/drivers/acpi/internal.h index 4361c4415b4f..fc8c43e76707 100644 --- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -248,4 +248,10 @@ void acpi_watchdog_init(void); static inline void acpi_watchdog_init(void) {} #endif +#ifdef CONFIG_ACPI_LPIT +void acpi_init_lpit(void); +#else +static inline void acpi_init_lpit(void) { } +#endif + #endif /* _ACPI_INTERNAL_H_ */ diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index db78d353bab1..3bb46cb24a99 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -663,26 +663,8 @@ acpi_status acpi_os_write_port(acpi_io_address port, u32 value, u32 width) EXPORT_SYMBOL(acpi_os_write_port); -acpi_status -acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width) +int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width) { - void __iomem *virt_addr; - unsigned int size = width / 8; - bool unmap = false; - u64 dummy; - - rcu_read_lock(); - virt_addr = acpi_map_vaddr_lookup(phys_addr, size); - if (!virt_addr) { - rcu_read_unlock(); - virt_addr = acpi_os_ioremap(phys_addr, size); - if (!virt_addr) - return AE_BAD_ADDRESS; - unmap = true; - } - - if (!value) - value = &dummy; switch (width) { case 8: @@ -698,9 +680,37 @@ acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width) *(u64 *) value = readq(virt_addr); break; default: - BUG(); + return -EINVAL; } + return 0; +} + +acpi_status +acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width) +{ + void __iomem *virt_addr; + unsigned int size = width / 8; + bool unmap = false; + u64 dummy; + int error; + + rcu_read_lock(); + virt_addr = acpi_map_vaddr_lookup(phys_addr, size); + if (!virt_addr) { + rcu_read_unlock(); + virt_addr = acpi_os_ioremap(phys_addr, size); + if (!virt_addr) + return AE_BAD_ADDRESS; + unmap = true; + } + + if (!value) + value = &dummy; + + error = acpi_os_read_iomem(virt_addr, value, width); + BUG_ON(error); + if (unmap) iounmap(virt_addr); else diff --git a/drivers/acpi/scan.c 
b/drivers/acpi/scan.c index 602f8ff212f2..81367edc8a10 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2122,6 +2122,7 @@ int __init acpi_scan_init(void) acpi_int340x_thermal_init(); acpi_amba_init(); acpi_watchdog_init(); + acpi_init_lpit(); acpi_scan_add_handler(&generic_device_handler); diff --git a/include/acpi/acpiosxf.h b/include/acpi/acpiosxf.h index c66eb8ffa454..d5c0f5153c4e 100644 --- a/include/acpi/acpiosxf.h +++ b/include/acpi/acpiosxf.h @@ -287,6 +287,8 @@ acpi_status acpi_os_write_port(acpi_io_address address, u32 value, u32 width); /* * Platform and hardware-independent physical memory interfaces */ +int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width); + #ifndef ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_read_memory acpi_status acpi_os_read_memory(acpi_physical_address address, u64 *value, u32 width); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index d18c92d4ba19..2b1738f840ab 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1248,4 +1248,13 @@ int acpi_irq_get(acpi_handle handle, unsigned int index, struct resource *res) } #endif +#ifdef CONFIG_ACPI_LPIT +int lpit_read_residency_count_address(u64 *address); +#else +static inline int lpit_read_residency_count_address(u64 *address) +{ + return -EINVAL; +} +#endif + #endif /*_LINUX_ACPI_H*/ From 0e708fc602531b8355b5de6ea7c98f09129b223f Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Tue, 3 Oct 2017 09:11:07 +0200 Subject: [PATCH 37/88] PM / sleep: Remove pm_complete_with_resume_check() After the recent ACPI changes, there are no longer any users of pm_complete_with_resume_check(), so let's drop it. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/generic_ops.c | 23 ----------------------- include/linux/pm.h | 1 - 2 files changed, 24 deletions(-) diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c index 07c3c4a9522d..b2ed606265a8 100644 --- a/drivers/base/power/generic_ops.c +++ b/drivers/base/power/generic_ops.c @@ -9,7 +9,6 @@ #include #include #include -#include #ifdef CONFIG_PM /** @@ -298,26 +297,4 @@ void pm_generic_complete(struct device *dev) if (drv && drv->pm && drv->pm->complete) drv->pm->complete(dev); } - -/** - * pm_complete_with_resume_check - Complete a device power transition. - * @dev: Device to handle. - * - * Complete a device power transition during a system-wide power transition and - * optionally schedule a runtime resume of the device if the system resume in - * progress has been initated by the platform firmware and the device had its - * power.direct_complete flag set. - */ -void pm_complete_with_resume_check(struct device *dev) -{ - pm_generic_complete(dev); - /* - * If the device had been runtime-suspended before the system went into - * the sleep state it is going out of and it has never been resumed till - * now, resume it in case the firmware powered it up. 
- */ - if (dev->power.direct_complete && pm_resume_via_firmware()) - pm_request_resume(dev); -} -EXPORT_SYMBOL_GPL(pm_complete_with_resume_check); #endif /* CONFIG_PM_SLEEP */ diff --git a/include/linux/pm.h b/include/linux/pm.h index 47ded8aa8a5d..a0ceeccf2846 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -736,7 +736,6 @@ extern int pm_generic_poweroff_noirq(struct device *dev); extern int pm_generic_poweroff_late(struct device *dev); extern int pm_generic_poweroff(struct device *dev); extern void pm_generic_complete(struct device *dev); -extern void pm_complete_with_resume_check(struct device *dev); #else /* !CONFIG_PM_SLEEP */ From 42f6284ae602469762ee721ec31ddfc6170e00bc Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 12 Oct 2017 15:07:23 +0530 Subject: [PATCH 38/88] PM / Domains: Add support to select performance-state of domains Some platforms have the capability to configure the performance state of PM domains. This patch enhances the genpd core to support such platforms. The performance levels (within the genpd core) are identified by positive integer values; a lower value represents a lower performance state. This patch adds a new genpd API, which is called by user drivers (like the OPP framework): - int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state); This updates the performance state constraint of the device on its PM domain. On success, the genpd will have its performance state set to a value which is >= "state" passed to this routine. The genpd core calls the genpd->set_performance_state() callback if it is implemented; otherwise -EINVAL is returned to the caller (-ENODEV is returned when the device has no genpd attached). The PM domain drivers need to implement the following callback if they want to support performance states. - int (*set_performance_state)(struct generic_pm_domain *genpd, unsigned int state); This is called internally by the genpd core on several occasions. 
The genpd core passes the genpd pointer and the aggregate of the performance states of the devices supported by that genpd to this callback. This callback must update the performance state of the genpd (in a platform dependent way). Power domains that don't support setting performance states can simply not supply the above callback. Currently we aren't propagating performance state changes of a subdomain to its masters as we don't have hardware that needs it right now. Moreover, the performance states of a subdomain and its masters may not have a one-to-one mapping and would require additional information. We can get back to this once we have hardware that needs it. Tested-by: Rajendra Nayak Signed-off-by: Viresh Kumar Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 98 +++++++++++++++++++++++++++++++++++++ include/linux/pm_domain.h | 12 +++++ 2 files changed, 110 insertions(+) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index a6e4c8d7d837..7e01ae364d78 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -237,6 +237,95 @@ static void genpd_update_accounting(struct generic_pm_domain *genpd) static inline void genpd_update_accounting(struct generic_pm_domain *genpd) {} #endif +/** + * dev_pm_genpd_set_performance_state- Set performance state of device's power + * domain. + * + * @dev: Device for which the performance-state needs to be set. + * @state: Target performance state of the device. This can be set as 0 when the + * device doesn't have any performance state constraints left (And so + * the device wouldn't participate anymore to find the target + * performance state of the genpd). + * + * It is assumed that the users guarantee that the genpd wouldn't be detached + * while this routine is getting called. + * + * Returns 0 on success and negative error values on failures. 
+ */ +int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state) +{ + struct generic_pm_domain *genpd; + struct generic_pm_domain_data *gpd_data, *pd_data; + struct pm_domain_data *pdd; + unsigned int prev; + int ret = 0; + + genpd = dev_to_genpd(dev); + if (IS_ERR(genpd)) + return -ENODEV; + + if (unlikely(!genpd->set_performance_state)) + return -EINVAL; + + if (unlikely(!dev->power.subsys_data || + !dev->power.subsys_data->domain_data)) { + WARN_ON(1); + return -EINVAL; + } + + genpd_lock(genpd); + + gpd_data = to_gpd_data(dev->power.subsys_data->domain_data); + prev = gpd_data->performance_state; + gpd_data->performance_state = state; + + /* New requested state is same as Max requested state */ + if (state == genpd->performance_state) + goto unlock; + + /* New requested state is higher than Max requested state */ + if (state > genpd->performance_state) + goto update_state; + + /* Traverse all devices within the domain */ + list_for_each_entry(pdd, &genpd->dev_list, list_node) { + pd_data = to_gpd_data(pdd); + + if (pd_data->performance_state > state) + state = pd_data->performance_state; + } + + if (state == genpd->performance_state) + goto unlock; + + /* + * We aren't propagating performance state changes of a subdomain to its + * masters as we don't have hardware that needs it. Over that, the + * performance states of subdomain and its masters may not have + * one-to-one mapping and would require additional information. We can + * get back to this once we have hardware that needs it. For that + * reason, we don't have to consider performance state of the subdomains + * of genpd here. 
+ */ + +update_state: + if (genpd_status_on(genpd)) { + ret = genpd->set_performance_state(genpd, state); + if (ret) { + gpd_data->performance_state = prev; + goto unlock; + } + } + + genpd->performance_state = state; + +unlock: + genpd_unlock(genpd); + + return ret; +} +EXPORT_SYMBOL_GPL(dev_pm_genpd_set_performance_state); + static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed) { unsigned int state_idx = genpd->state_idx; @@ -256,6 +345,15 @@ static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed) return ret; elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)); + + if (unlikely(genpd->set_performance_state)) { + ret = genpd->set_performance_state(genpd, genpd->performance_state); + if (ret) { + pr_warn("%s: Failed to set performance state %d (%d)\n", + genpd->name, genpd->performance_state, ret); + } + } + if (elapsed_ns <= genpd->states[state_idx].power_on_latency_ns) return ret; diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 84f423d5633e..9af0356bd69c 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -64,8 +64,11 @@ struct generic_pm_domain { unsigned int device_count; /* Number of devices */ unsigned int suspended_count; /* System suspend device counter */ unsigned int prepared_count; /* Suspend counter of prepared devices */ + unsigned int performance_state; /* Aggregated max performance state */ int (*power_off)(struct generic_pm_domain *domain); int (*power_on)(struct generic_pm_domain *domain); + int (*set_performance_state)(struct generic_pm_domain *genpd, + unsigned int state); struct gpd_dev_ops dev_ops; s64 max_off_time_ns; /* Maximum allowed "suspended" time. 
*/ bool max_off_time_changed; @@ -121,6 +124,7 @@ struct generic_pm_domain_data { struct pm_domain_data base; struct gpd_timing_data td; struct notifier_block nb; + unsigned int performance_state; void *data; }; @@ -148,6 +152,8 @@ extern int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd, extern int pm_genpd_init(struct generic_pm_domain *genpd, struct dev_power_governor *gov, bool is_off); extern int pm_genpd_remove(struct generic_pm_domain *genpd); +extern int dev_pm_genpd_set_performance_state(struct device *dev, + unsigned int state); extern struct dev_power_governor simple_qos_governor; extern struct dev_power_governor pm_domain_always_on_gov; @@ -188,6 +194,12 @@ static inline int pm_genpd_remove(struct generic_pm_domain *genpd) return -ENOTSUPP; } +static inline int dev_pm_genpd_set_performance_state(struct device *dev, + unsigned int state) +{ + return -ENOTSUPP; +} + #define simple_qos_governor (*(struct dev_power_governor *)(NULL)) #define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL)) #endif From 9867999f3a85b52f96ef05fca00cc8128eed01ce Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Thu, 12 Oct 2017 11:32:23 +0100 Subject: [PATCH 39/88] PM / OPP: add missing of_node_put() for of_get_cpu_node() Commit 762792913f8c (PM / OPP: Fix get sharing CPUs when hotplug is used) moved away from using cpu_dev->of_node because of some limitations. However, commit 7467c9d95989 (of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered) added support for falling back to of_get_cpu_node() when it is called before CPUs are registered. Add the missing of_node_put() for the CPU device nodes. Also go back to using of_cpu_device_node_get() in dev_pm_opp_of_get_sharing_cpus() to avoid scanning the device tree again. Acked-by: Viresh Kumar Fixes: 762792913f8c (PM / OPP: Fix get sharing CPUs when hotplug is used) Signed-off-by: Sudeep Holla Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. 
Wysocki --- drivers/opp/of.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 87509cb69f79..cb716aa2f44b 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -16,7 +16,7 @@ #include #include #include -#include +#include #include #include @@ -604,7 +604,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, if (cpu == cpu_dev->id) continue; - cpu_np = of_get_cpu_node(cpu, NULL); + cpu_np = of_cpu_device_node_get(cpu); if (!cpu_np) { dev_err(cpu_dev, "%s: failed to get cpu%d node\n", __func__, cpu); @@ -614,6 +614,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, /* Get OPP descriptor node */ tmp_np = _opp_of_get_opp_desc_node(cpu_np); + of_node_put(cpu_np); if (!tmp_np) { pr_err("%pOF: Couldn't find opp node\n", cpu_np); ret = -ENOENT; From 009acd196fc860045bf7b2c3f5812f0f5efb2782 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 11 Oct 2017 12:54:14 +0530 Subject: [PATCH 40/88] PM / OPP: Support updating performance state of device's power domain The genpd framework now provides an API to request that a device's power domain update its performance state. Use that interface from the OPP core for devices whose power domains support performance states. Note that this commit doesn't add any mechanism by which performance states are made available to the OPP core. That will be done by a later commit. Note also that the current implementation is restricted to the case where the device doesn't have separate regulators for itself. We shouldn't over-engineer the code before we have a real use case for it. We can always come back and add more code to support such cases later on. Tested-by: Rajendra Nayak Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. 
Wysocki --- drivers/opp/core.c | 57 ++++++++++++++++++++++++++++++++++++++++++- drivers/opp/debugfs.c | 3 +++ drivers/opp/opp.h | 4 +++ 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 80c21207e48c..0ce8069d6843 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include "opp.h" @@ -535,6 +536,44 @@ _generic_set_opp_clk_only(struct device *dev, struct clk *clk, return ret; } +static inline int +_generic_set_opp_domain(struct device *dev, struct clk *clk, + unsigned long old_freq, unsigned long freq, + unsigned int old_pstate, unsigned int new_pstate) +{ + int ret; + + /* Scaling up? Scale domain performance state before frequency */ + if (freq > old_freq) { + ret = dev_pm_genpd_set_performance_state(dev, new_pstate); + if (ret) + return ret; + } + + ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); + if (ret) + goto restore_domain_state; + + /* Scaling down? Scale domain performance state after frequency */ + if (freq < old_freq) { + ret = dev_pm_genpd_set_performance_state(dev, new_pstate); + if (ret) + goto restore_freq; + } + + return 0; + +restore_freq: + if (_generic_set_opp_clk_only(dev, clk, freq, old_freq)) + dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n", + __func__, old_freq); +restore_domain_state: + if (freq > old_freq) + dev_pm_genpd_set_performance_state(dev, old_pstate); + + return ret; +} + static int _generic_set_opp_regulator(const struct opp_table *opp_table, struct device *dev, unsigned long old_freq, @@ -653,7 +692,16 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) /* Only frequency scaling */ if (!opp_table->regulators) { - ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); + /* + * We don't support devices with both regulator and + * domain performance-state for now. 
+ */ + if (opp_table->genpd_performance_state) + ret = _generic_set_opp_domain(dev, clk, old_freq, freq, + IS_ERR(old_opp) ? 0 : old_opp->pstate, + opp->pstate); + else + ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); } else if (!opp_table->set_opp) { ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq, IS_ERR(old_opp) ? NULL : old_opp->supplies, @@ -1706,6 +1754,13 @@ void _dev_pm_opp_remove_table(struct opp_table *opp_table, struct device *dev, if (remove_all || !opp->dynamic) dev_pm_opp_put(opp); } + + /* + * The OPP table is getting removed, drop the performance state + * constraints. + */ + if (opp_table->genpd_performance_state) + dev_pm_genpd_set_performance_state(dev, 0); } else { _remove_opp_dev(_find_opp_dev(dev, opp_table), opp_table); } diff --git a/drivers/opp/debugfs.c b/drivers/opp/debugfs.c index 9318848f3c67..b03c03576a62 100644 --- a/drivers/opp/debugfs.c +++ b/drivers/opp/debugfs.c @@ -99,6 +99,9 @@ int opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) if (!debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend)) return -ENOMEM; + if (!debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate)) + return -ENOMEM; + if (!debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate)) return -ENOMEM; diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index 166eef990599..e8f767ab5814 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -58,6 +58,7 @@ extern struct list_head opp_tables; * @dynamic: not-created from static DT entries. * @turbo: true if turbo (boost) OPP * @suspend: true if suspend OPP + * @pstate: Device's power domain's performance state. 
* @rate: Frequency in hertz * @supplies: Power supplies voltage/current values * @clock_latency_ns: Latency (in nanoseconds) of switching to this OPP's @@ -76,6 +77,7 @@ struct dev_pm_opp { bool dynamic; bool turbo; bool suspend; + unsigned int pstate; unsigned long rate; struct dev_pm_opp_supply *supplies; @@ -135,6 +137,7 @@ enum opp_table_access { * @clk: Device's clock handle * @regulators: Supply regulators * @regulator_count: Number of power supply regulators + * @genpd_performance_state: Device's power domain support performance state. * @set_opp: Platform specific set_opp callback * @set_opp_data: Data to be passed to set_opp callback * @dentry: debugfs dentry pointer of the real device directory (not links). @@ -170,6 +173,7 @@ struct opp_table { struct clk *clk; struct regulator **regulators; unsigned int regulator_count; + bool genpd_performance_state; int (*set_opp)(struct dev_pm_set_opp_data *data); struct dev_pm_set_opp_data *set_opp_data; From b6aa98364f842f943495408895627702ad7ad44b Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 11 Oct 2017 12:54:15 +0530 Subject: [PATCH 41/88] PM / OPP: Add dev_pm_opp_{un}register_get_pstate_helper() This adds the dev_pm_opp_{un}register_get_pstate_helper() helper routines, which will be used to set the get_pstate() callback for a device. This callback will later be called internally by the OPP core to get the performance state corresponding to an OPP. This is required temporarily, until we have proper DT bindings that include the performance state information. Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. 
Wysocki --- drivers/opp/core.c | 78 ++++++++++++++++++++++++++++++++++++++++++ drivers/opp/opp.h | 2 ++ include/linux/pm_opp.h | 10 ++++++ 3 files changed, 90 insertions(+) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 0ce8069d6843..92fa94a6dcc1 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1036,6 +1036,9 @@ int _opp_add(struct device *dev, struct dev_pm_opp *new_opp, return ret; } + if (opp_table->get_pstate) + new_opp->pstate = opp_table->get_pstate(dev, new_opp->rate); + list_add(&new_opp->node, head); mutex_unlock(&opp_table->lock); @@ -1547,6 +1550,81 @@ void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table) } EXPORT_SYMBOL_GPL(dev_pm_opp_unregister_set_opp_helper); +/** + * dev_pm_opp_register_get_pstate_helper() - Register get_pstate() helper. + * @dev: Device for which the helper is getting registered. + * @get_pstate: Helper. + * + * TODO: Remove this callback after the same information is available via Device + * Tree. + * + * This allows a platform to initialize the performance states of individual + * OPPs for its devices, until we get similar information directly from DT. + * + * This must be called before the OPPs are initialized for the device. 
+ */ +struct opp_table *dev_pm_opp_register_get_pstate_helper(struct device *dev, + int (*get_pstate)(struct device *dev, unsigned long rate)) +{ + struct opp_table *opp_table; + int ret; + + if (!get_pstate) + return ERR_PTR(-EINVAL); + + opp_table = dev_pm_opp_get_opp_table(dev); + if (!opp_table) + return ERR_PTR(-ENOMEM); + + /* This should be called before OPPs are initialized */ + if (WARN_ON(!list_empty(&opp_table->opp_list))) { + ret = -EBUSY; + goto err; + } + + /* Already have genpd_performance_state set */ + if (WARN_ON(opp_table->genpd_performance_state)) { + ret = -EBUSY; + goto err; + } + + opp_table->genpd_performance_state = true; + opp_table->get_pstate = get_pstate; + + return opp_table; + +err: + dev_pm_opp_put_opp_table(opp_table); + + return ERR_PTR(ret); +} +EXPORT_SYMBOL_GPL(dev_pm_opp_register_get_pstate_helper); + +/** + * dev_pm_opp_unregister_get_pstate_helper() - Releases resources blocked for + * get_pstate() helper + * @opp_table: OPP table returned from dev_pm_opp_register_get_pstate_helper(). + * + * Release resources blocked for platform specific get_pstate() helper. 
+ */ +void dev_pm_opp_unregister_get_pstate_helper(struct opp_table *opp_table) +{ + if (!opp_table->genpd_performance_state) { + pr_err("%s: Doesn't have performance states set\n", + __func__); + return; + } + + /* Make sure there are no concurrent readers while updating opp_table */ + WARN_ON(!list_empty(&opp_table->opp_list)); + + opp_table->genpd_performance_state = false; + opp_table->get_pstate = NULL; + + dev_pm_opp_put_opp_table(opp_table); +} +EXPORT_SYMBOL_GPL(dev_pm_opp_unregister_get_pstate_helper); + /** * dev_pm_opp_add() - Add an OPP table from a table definitions * @dev: device for which we do this operation diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index e8f767ab5814..4d00061648a3 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -140,6 +140,7 @@ enum opp_table_access { * @genpd_performance_state: Device's power domain support performance state. * @set_opp: Platform specific set_opp callback * @set_opp_data: Data to be passed to set_opp callback + * @get_pstate: Platform specific get_pstate callback * @dentry: debugfs dentry pointer of the real device directory (not links). * @dentry_name: Name of the real dentry. 
* @@ -177,6 +178,7 @@ struct opp_table { int (*set_opp)(struct dev_pm_set_opp_data *data); struct dev_pm_set_opp_data *set_opp_data; + int (*get_pstate)(struct device *dev, unsigned long rate); #ifdef CONFIG_DEBUG_FS struct dentry *dentry; diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 849d21dc4ca7..6c2d2e88f066 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -125,6 +125,8 @@ struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char * name); void dev_pm_opp_put_clkname(struct opp_table *opp_table); struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table); +struct opp_table *dev_pm_opp_register_get_pstate_helper(struct device *dev, int (*get_pstate)(struct device *dev, unsigned long rate)); +void dev_pm_opp_unregister_get_pstate_helper(struct opp_table *opp_table); int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq); int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, const struct cpumask *cpumask); int dev_pm_opp_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask); @@ -245,6 +247,14 @@ static inline struct opp_table *dev_pm_opp_register_set_opp_helper(struct device static inline void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table) {} +static inline struct opp_table *dev_pm_opp_register_get_pstate_helper(struct device *dev, + int (*get_pstate)(struct device *dev, unsigned long rate)) +{ + return ERR_PTR(-ENOTSUPP); +} + +static inline void dev_pm_opp_unregister_get_pstate_helper(struct opp_table *opp_table) {} + static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name) { return ERR_PTR(-ENOTSUPP); From 248aefdcc3a7e0cfbd014946b4dead63e750e71b Mon Sep 17 00:00:00 2001 From: Zumeng Chen Date: Tue, 10 Oct 2017 21:27:20 +0800 Subject: [PATCH 42/88] cpufreq: ti-cpufreq: add missing 
of_node_put() call of_node_put to release the refcount of np. Signed-off-by: Zumeng Chen Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/ti-cpufreq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index ffcddcd4c5e6..923317f03b4b 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -205,6 +205,7 @@ static int ti_cpufreq_init(void) np = of_find_node_by_path("/"); match = of_match_node(ti_cpufreq_of_match, np); + of_node_put(np); if (!match) return -ENODEV; From 9bc70e6919f8cab80d5b240493007e4cce85559c Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 12 Oct 2017 17:41:03 -0500 Subject: [PATCH 43/88] cpufreq: speedstep-lib: mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/speedstep-lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/speedstep-lib.c b/drivers/cpufreq/speedstep-lib.c index ccab452a4ef5..8085ec9000d1 100644 --- a/drivers/cpufreq/speedstep-lib.c +++ b/drivers/cpufreq/speedstep-lib.c @@ -367,7 +367,7 @@ unsigned int speedstep_detect_processor(void) } else return SPEEDSTEP_CPU_PIII_C; } - + /* fall through */ default: return 0; } From 0f87855d969a87f02048ff5ced7503465d5ab2f1 Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Tue, 10 Oct 2017 13:47:55 +0800 Subject: [PATCH 44/88] ARM: cpuidle: Correct driver unregistration if init fails If cpuidle init fails, the code misses to unregister the driver for current CPU. 
Furthermore, we also need to roll back and cancel the registration of all previous CPUs; but the code retrieves the driver handler with cpuidle_get_driver(), which returns the current CPU's driver handler rather than a previous CPU's handler, so the failure handling code cannot unregister the previous CPUs' drivers. This commit fixes both issues: it adds the error handling path 'goto out_unregister_drv' to unregister the current CPU's driver, and it replaces cpuidle_get_driver() with cpuidle_get_cpu_driver(); the latter function retrieves the driver handler for a previous CPU from that CPU's device handler, so the driver can be unregistered properly. This patch also adds the extra error handling paths 'goto out_kfree_dev' and 'goto out_kfree_drv' and adjusts the freeing statements for previous CPUs, making the code that frees the 'dev' and 'drv' structures more readable. Suggested-by: Daniel Lezcano Signed-off-by: Leo Yan Fixes: d50a7d8acd78 (ARM: cpuidle: Support asymmetric idle definition) Acked-by: Daniel Lezcano Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/cpuidle-arm.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c index 52a75053ee03..f47c54546752 100644 --- a/drivers/cpuidle/cpuidle-arm.c +++ b/drivers/cpuidle/cpuidle-arm.c @@ -104,13 +104,13 @@ static int __init arm_idle_init(void) ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); if (ret <= 0) { ret = ret ?
: -ENODEV; - goto init_fail; + goto out_kfree_drv; } ret = cpuidle_register_driver(drv); if (ret) { pr_err("Failed to register cpuidle driver\n"); - goto init_fail; + goto out_kfree_drv; } /* @@ -128,14 +128,14 @@ static int __init arm_idle_init(void) if (ret) { pr_err("CPU %d failed to init idle CPU ops\n", cpu); - goto out_fail; + goto out_unregister_drv; } dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) { pr_err("Failed to allocate cpuidle device\n"); ret = -ENOMEM; - goto out_fail; + goto out_unregister_drv; } dev->cpu = cpu; @@ -143,21 +143,25 @@ static int __init arm_idle_init(void) if (ret) { pr_err("Failed to register cpuidle device for CPU %d\n", cpu); - kfree(dev); - goto out_fail; + goto out_kfree_dev; } } return 0; -init_fail: + +out_kfree_dev: + kfree(dev); +out_unregister_drv: + cpuidle_unregister_driver(drv); +out_kfree_drv: kfree(drv); out_fail: while (--cpu >= 0) { dev = per_cpu(cpuidle_devices, cpu); + drv = cpuidle_get_cpu_driver(dev); cpuidle_unregister_device(dev); - kfree(dev); - drv = cpuidle_get_driver(); cpuidle_unregister_driver(drv); + kfree(dev); kfree(drv); } From 7943bfaeb6bbbf595df4bd4087f5b890761c4898 Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Tue, 10 Oct 2017 13:47:56 +0800 Subject: [PATCH 45/88] ARM: cpuidle: Refactor rollback operations if init fails If init fails, we need execute two levels rollback operations: the first level is for the failed CPU rollback operations, the second level is to iterate all succeeded CPUs to cancel their registration; currently the code uses one function to finish these two levels rollback operations. This commit is to refactor rollback operations, so it adds a new function arm_idle_init_cpu() to encapsulate one specified CPU driver registration and rollback the first level operations; and use function arm_idle_init() to iterate all CPUs and finish the second level's rollback operations. Suggested-by: Daniel Lezcano Signed-off-by: Leo Yan Acked-by: Daniel Lezcano Signed-off-by: Rafael J. 
Wysocki --- drivers/cpuidle/cpuidle-arm.c | 131 +++++++++++++++++++--------------- 1 file changed, 75 insertions(+), 56 deletions(-) diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c index f47c54546752..ddee1b601b89 100644 --- a/drivers/cpuidle/cpuidle-arm.c +++ b/drivers/cpuidle/cpuidle-arm.c @@ -72,79 +72,74 @@ static const struct of_device_id arm_idle_state_match[] __initconst = { }; /* - * arm_idle_init + * arm_idle_init_cpu * * Registers the arm specific cpuidle driver with the cpuidle * framework. It relies on core code to parse the idle states * and initialize them using driver data structures accordingly. */ -static int __init arm_idle_init(void) +static int __init arm_idle_init_cpu(int cpu) { - int cpu, ret; + int ret; struct cpuidle_driver *drv; struct cpuidle_device *dev; - for_each_possible_cpu(cpu) { + drv = kmemdup(&arm_idle_driver, sizeof(*drv), GFP_KERNEL); + if (!drv) + return -ENOMEM; - drv = kmemdup(&arm_idle_driver, sizeof(*drv), GFP_KERNEL); - if (!drv) { - ret = -ENOMEM; - goto out_fail; - } + drv->cpumask = (struct cpumask *)cpumask_of(cpu); - drv->cpumask = (struct cpumask *)cpumask_of(cpu); + /* + * Initialize idle states data, starting at index 1. This + * driver is DT only, if no DT idle states are detected (ret + * == 0) let the driver initialization fail accordingly since + * there is no reason to initialize the idle driver if only + * wfi is supported. + */ + ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); + if (ret <= 0) { + ret = ret ? : -ENODEV; + goto out_kfree_drv; + } - /* - * Initialize idle states data, starting at index 1. This - * driver is DT only, if no DT idle states are detected (ret - * == 0) let the driver initialization fail accordingly since - * there is no reason to initialize the idle driver if only - * wfi is supported. - */ - ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); - if (ret <= 0) { - ret = ret ? 
: -ENODEV; - goto out_kfree_drv; - } + ret = cpuidle_register_driver(drv); + if (ret) { + pr_err("Failed to register cpuidle driver\n"); + goto out_kfree_drv; + } - ret = cpuidle_register_driver(drv); - if (ret) { - pr_err("Failed to register cpuidle driver\n"); - goto out_kfree_drv; - } + /* + * Call arch CPU operations in order to initialize + * idle states suspend back-end specific data + */ + ret = arm_cpuidle_init(cpu); - /* - * Call arch CPU operations in order to initialize - * idle states suspend back-end specific data - */ - ret = arm_cpuidle_init(cpu); + /* + * Skip the cpuidle device initialization if the reported + * failure is a HW misconfiguration/breakage (-ENXIO). + */ + if (ret == -ENXIO) + return 0; - /* - * Skip the cpuidle device initialization if the reported - * failure is a HW misconfiguration/breakage (-ENXIO). - */ - if (ret == -ENXIO) - continue; + if (ret) { + pr_err("CPU %d failed to init idle CPU ops\n", cpu); + goto out_unregister_drv; + } - if (ret) { - pr_err("CPU %d failed to init idle CPU ops\n", cpu); - goto out_unregister_drv; - } + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) { + pr_err("Failed to allocate cpuidle device\n"); + ret = -ENOMEM; + goto out_unregister_drv; + } + dev->cpu = cpu; - dev = kzalloc(sizeof(*dev), GFP_KERNEL); - if (!dev) { - pr_err("Failed to allocate cpuidle device\n"); - ret = -ENOMEM; - goto out_unregister_drv; - } - dev->cpu = cpu; - - ret = cpuidle_register_device(dev); - if (ret) { - pr_err("Failed to register cpuidle device for CPU %d\n", - cpu); - goto out_kfree_dev; - } + ret = cpuidle_register_device(dev); + if (ret) { + pr_err("Failed to register cpuidle device for CPU %d\n", + cpu); + goto out_kfree_dev; } return 0; @@ -155,6 +150,30 @@ out_unregister_drv: cpuidle_unregister_driver(drv); out_kfree_drv: kfree(drv); + return ret; +} + +/* + * arm_idle_init - Initializes arm cpuidle driver + * + * Initializes arm cpuidle driver for all CPUs, if any CPU fails + * to register cpuidle driver 
then rollback to cancel all CPUs + * registeration. + */ +static int __init arm_idle_init(void) +{ + int cpu, ret; + struct cpuidle_driver *drv; + struct cpuidle_device *dev; + + for_each_possible_cpu(cpu) { + ret = arm_idle_init_cpu(cpu); + if (ret) + goto out_fail; + } + + return 0; + out_fail: while (--cpu >= 0) { dev = per_cpu(cpuidle_devices, cpu); From 20f97caf1120bd02e8ff4adbad3b44b63626feb5 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 13 Oct 2017 15:27:24 +0200 Subject: [PATCH 46/88] PM / QoS: Drop PM_QOS_FLAG_REMOTE_WAKEUP The PM QoS flag PM_QOS_FLAG_REMOTE_WAKEUP is not used consistently and the vast majority of code simply assumes that remote wakeup should be enabled for devices in runtime suspend if they can generate wakeup signals, so drop it. Signed-off-by: Rafael J. Wysocki Acked-by: Ulf Hansson Reviewed-by: Mika Westerberg --- Documentation/ABI/testing/sysfs-devices-power | 16 ----------- Documentation/power/pm_qos_interface.txt | 13 ++++----- drivers/acpi/device_pm.c | 6 ++-- drivers/base/power/domain.c | 4 +-- drivers/base/power/sysfs.c | 28 ------------------- include/linux/pm_qos.h | 1 - 6 files changed, 9 insertions(+), 59 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power index 676fdf5f2a99..f4b24c327665 100644 --- a/Documentation/ABI/testing/sysfs-devices-power +++ b/Documentation/ABI/testing/sysfs-devices-power @@ -258,19 +258,3 @@ Description: This attribute has no effect on system-wide suspend/resume and hibernation. - -What: /sys/devices/.../power/pm_qos_remote_wakeup -Date: September 2012 -Contact: Rafael J. Wysocki -Description: - The /sys/devices/.../power/pm_qos_remote_wakeup attribute - is used for manipulating the PM QoS "remote wakeup required" - flag. If set, this flag indicates to the kernel that the - device is a source of user events that have to be signaled from - its low-power states. - - Not all drivers support this attribute. 
If it isn't supported, - it is not present. - - This attribute has no effect on system-wide suspend/resume and - hibernation. diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt index 21d2d48f87a2..19c5f7b1a7ba 100644 --- a/Documentation/power/pm_qos_interface.txt +++ b/Documentation/power/pm_qos_interface.txt @@ -98,8 +98,7 @@ Values are updated in response to changes of the request list. The target values of resume latency and active state latency tolerance are simply the minimum of the request values held in the parameter list elements. The PM QoS flags aggregate value is a gather (bitwise OR) of all list elements' -values. Two device PM QoS flags are defined currently: PM_QOS_FLAG_NO_POWER_OFF -and PM_QOS_FLAG_REMOTE_WAKEUP. +values. One device PM QoS flag is defined currently: PM_QOS_FLAG_NO_POWER_OFF. Note: The aggregated target values are implemented in such a way that reading the aggregated value does not require any locking mechanism. @@ -153,14 +152,14 @@ PM QoS list of resume latency constraints and remove sysfs attribute pm_qos_resume_latency_us from the device's power directory. int dev_pm_qos_expose_flags(device, value) -Add a request to the device's PM QoS list of flags and create sysfs attributes -pm_qos_no_power_off and pm_qos_remote_wakeup under the device's power directory -allowing user space to change these flags' value. +Add a request to the device's PM QoS list of flags and create sysfs attribute +pm_qos_no_power_off under the device's power directory allowing user space to +change the value of the PM_QOS_FLAG_NO_POWER_OFF flag. void dev_pm_qos_hide_flags(device) Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list -of flags and remove sysfs attributes pm_qos_no_power_off and pm_qos_remote_wakeup -under the device's power directory. +of flags and remove sysfs attribute pm_qos_no_power_off from the device's power +directory. 
Notification mechanisms: The per-device PM QoS framework has a per-device notification tree. diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index fbcc73f7a099..e8c820129797 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -581,8 +581,7 @@ static int acpi_dev_pm_get_state(struct device *dev, struct acpi_device *adev, d_min = ret; wakeup = device_may_wakeup(dev) && adev->wakeup.flags.valid && adev->wakeup.sleep_state >= target_state; - } else if (dev_pm_qos_flags(dev, PM_QOS_FLAG_REMOTE_WAKEUP) != - PM_QOS_FLAGS_NONE) { + } else { wakeup = adev->wakeup.flags.valid; } @@ -865,8 +864,7 @@ int acpi_dev_runtime_suspend(struct device *dev) if (!adev) return 0; - remote_wakeup = dev_pm_qos_flags(dev, PM_QOS_FLAG_REMOTE_WAKEUP) > - PM_QOS_FLAGS_NONE; + remote_wakeup = acpi_device_can_wakeup(adev); if (remote_wakeup) { error = acpi_device_wakeup_enable(adev, ACPI_STATE_S0); if (error) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index e8ca5e2cf1e5..e6414e9998bb 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -346,9 +346,7 @@ static int genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on, list_for_each_entry(pdd, &genpd->dev_list, list_node) { enum pm_qos_flags_status stat; - stat = dev_pm_qos_flags(pdd->dev, - PM_QOS_FLAG_NO_POWER_OFF - | PM_QOS_FLAG_REMOTE_WAKEUP); + stat = dev_pm_qos_flags(pdd->dev, PM_QOS_FLAG_NO_POWER_OFF); if (stat > PM_QOS_FLAGS_NONE) return -EBUSY; diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c index 156ab57bca77..29bf28fef136 100644 --- a/drivers/base/power/sysfs.c +++ b/drivers/base/power/sysfs.c @@ -309,33 +309,6 @@ static ssize_t pm_qos_no_power_off_store(struct device *dev, static DEVICE_ATTR(pm_qos_no_power_off, 0644, pm_qos_no_power_off_show, pm_qos_no_power_off_store); -static ssize_t pm_qos_remote_wakeup_show(struct device *dev, - struct device_attribute *attr, - char *buf) -{ - return sprintf(buf, "%d\n", 
!!(dev_pm_qos_requested_flags(dev) - & PM_QOS_FLAG_REMOTE_WAKEUP)); -} - -static ssize_t pm_qos_remote_wakeup_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t n) -{ - int ret; - - if (kstrtoint(buf, 0, &ret)) - return -EINVAL; - - if (ret != 0 && ret != 1) - return -EINVAL; - - ret = dev_pm_qos_update_flags(dev, PM_QOS_FLAG_REMOTE_WAKEUP, ret); - return ret < 0 ? ret : n; -} - -static DEVICE_ATTR(pm_qos_remote_wakeup, 0644, - pm_qos_remote_wakeup_show, pm_qos_remote_wakeup_store); - #ifdef CONFIG_PM_SLEEP static const char _enabled[] = "enabled"; static const char _disabled[] = "disabled"; @@ -671,7 +644,6 @@ static const struct attribute_group pm_qos_latency_tolerance_attr_group = { static struct attribute *pm_qos_flags_attrs[] = { &dev_attr_pm_qos_no_power_off.attr, - &dev_attr_pm_qos_remote_wakeup.attr, NULL, }; static const struct attribute_group pm_qos_flags_attr_group = { diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h index 032b55909145..51f0d7e0b15f 100644 --- a/include/linux/pm_qos.h +++ b/include/linux/pm_qos.h @@ -39,7 +39,6 @@ enum pm_qos_flags_status { #define PM_QOS_LATENCY_ANY ((s32)(~(__u32)0 >> 1)) #define PM_QOS_FLAG_NO_POWER_OFF (1 << 0) -#define PM_QOS_FLAG_REMOTE_WAKEUP (1 << 1) struct pm_qos_request { struct plist_node node; From cbe25ce37d6c2623b5ac09128987e98848a54c6c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 14 Oct 2017 17:43:15 +0200 Subject: [PATCH 47/88] ACPI / PM: Combine device suspend routines On top of a previous change getting rid of the PM QoS flag PM_QOS_FLAG_REMOTE_WAKEUP, combine two ACPI device suspend routines, acpi_dev_runtime_suspend() and acpi_dev_suspend_late(), into one, acpi_dev_suspend(), to eliminate some code duplication. It also avoids enabling wakeup for devices handled by the ACPI LPSS middle layer on driver removal. Signed-off-by: Rafael J. 
Wysocki Reviewed-by: Ulf Hansson --- drivers/acpi/acpi_lpss.c | 6 ++-- drivers/acpi/device_pm.c | 61 +++++++++------------------------------- include/linux/acpi.h | 3 +- 3 files changed, 18 insertions(+), 52 deletions(-) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 97b753dd2e6e..8ec19e7c7b61 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -713,7 +713,7 @@ static int acpi_lpss_activate(struct device *dev) static void acpi_lpss_dismiss(struct device *dev) { - acpi_dev_runtime_suspend(dev); + acpi_dev_suspend(dev, false); } #ifdef CONFIG_PM_SLEEP @@ -729,7 +729,7 @@ static int acpi_lpss_suspend_late(struct device *dev) if (pdata->dev_desc->flags & LPSS_SAVE_CTX) acpi_lpss_save_ctx(dev, pdata); - return acpi_dev_suspend_late(dev); + return acpi_dev_suspend(dev, device_may_wakeup(dev)); } static int acpi_lpss_resume_early(struct device *dev) @@ -847,7 +847,7 @@ static int acpi_lpss_runtime_suspend(struct device *dev) if (pdata->dev_desc->flags & LPSS_SAVE_CTX) acpi_lpss_save_ctx(dev, pdata); - ret = acpi_dev_runtime_suspend(dev); + ret = acpi_dev_suspend(dev, true); /* * This call must be last in the sequence, otherwise PMC will return diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index d74000acb658..17e8eb93a76c 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -847,37 +847,39 @@ static int acpi_dev_pm_full_power(struct acpi_device *adev) } /** - * acpi_dev_runtime_suspend - Put device into a low-power state using ACPI. + * acpi_dev_suspend - Put device into a low-power state using ACPI. * @dev: Device to put into a low-power state. + * @wakeup: Whether or not to enable wakeup for the device. * - * Put the given device into a runtime low-power state using the standard ACPI + * Put the given device into a low-power state using the standard ACPI * mechanism. 
Set up remote wakeup if desired, choose the state to put the * device into (this checks if remote wakeup is expected to work too), and set * the power state of the device. */ -int acpi_dev_runtime_suspend(struct device *dev) +int acpi_dev_suspend(struct device *dev, bool wakeup) { struct acpi_device *adev = ACPI_COMPANION(dev); - bool remote_wakeup; + u32 target_state = acpi_target_system_state(); int error; if (!adev) return 0; - remote_wakeup = acpi_device_can_wakeup(adev); - if (remote_wakeup) { - error = acpi_device_wakeup_enable(adev, ACPI_STATE_S0); + if (wakeup && acpi_device_can_wakeup(adev)) { + error = acpi_device_wakeup_enable(adev, target_state); if (error) return -EAGAIN; + } else { + wakeup = false; } - error = acpi_dev_pm_low_power(dev, adev, ACPI_STATE_S0); - if (error && remote_wakeup) + error = acpi_dev_pm_low_power(dev, adev, target_state); + if (error && wakeup) acpi_device_wakeup_disable(adev); return error; } -EXPORT_SYMBOL_GPL(acpi_dev_runtime_suspend); +EXPORT_SYMBOL_GPL(acpi_dev_suspend); /** * acpi_dev_resume - Put device into the full-power state using ACPI. @@ -910,7 +912,7 @@ EXPORT_SYMBOL_GPL(acpi_dev_resume); int acpi_subsys_runtime_suspend(struct device *dev) { int ret = pm_generic_runtime_suspend(dev); - return ret ? ret : acpi_dev_runtime_suspend(dev); + return ret ? ret : acpi_dev_suspend(dev, true); } EXPORT_SYMBOL_GPL(acpi_subsys_runtime_suspend); @@ -929,41 +931,6 @@ int acpi_subsys_runtime_resume(struct device *dev) EXPORT_SYMBOL_GPL(acpi_subsys_runtime_resume); #ifdef CONFIG_PM_SLEEP -/** - * acpi_dev_suspend_late - Put device into a low-power state using ACPI. - * @dev: Device to put into a low-power state. - * - * Put the given device into a low-power state during system transition to a - * sleep state using the standard ACPI mechanism. Set up system wakeup if - * desired, choose the state to put the device into (this checks if system - * wakeup is expected to work too), and set the power state of the device. 
- */ -int acpi_dev_suspend_late(struct device *dev) -{ - struct acpi_device *adev = ACPI_COMPANION(dev); - u32 target_state; - bool wakeup; - int error; - - if (!adev) - return 0; - - target_state = acpi_target_system_state(); - wakeup = device_may_wakeup(dev) && acpi_device_can_wakeup(adev); - if (wakeup) { - error = acpi_device_wakeup_enable(adev, target_state); - if (error) - return error; - } - - error = acpi_dev_pm_low_power(dev, adev, target_state); - if (error && wakeup) - acpi_device_wakeup_disable(adev); - - return error; -} -EXPORT_SYMBOL_GPL(acpi_dev_suspend_late); - static bool acpi_dev_needs_resume(struct device *dev, struct acpi_device *adev) { u32 sys_target = acpi_target_system_state(); @@ -1046,7 +1013,7 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend); int acpi_subsys_suspend_late(struct device *dev) { int ret = pm_generic_suspend_late(dev); - return ret ? ret : acpi_dev_suspend_late(dev); + return ret ? ret : acpi_dev_suspend(dev, device_may_wakeup(dev)); } EXPORT_SYMBOL_GPL(acpi_subsys_suspend_late); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 2b1738f840ab..0ada2a948b44 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -864,7 +864,7 @@ static inline void arch_reserve_mem_area(acpi_physical_address addr, #endif #if defined(CONFIG_ACPI) && defined(CONFIG_PM) -int acpi_dev_runtime_suspend(struct device *dev); +int acpi_dev_suspend(struct device *dev, bool wakeup); int acpi_dev_resume(struct device *dev); int acpi_subsys_runtime_suspend(struct device *dev); int acpi_subsys_runtime_resume(struct device *dev); @@ -889,7 +889,6 @@ int acpi_subsys_resume_early(struct device *dev); int acpi_subsys_suspend(struct device *dev); int acpi_subsys_freeze(struct device *dev); #else -static inline int acpi_dev_suspend_late(struct device *dev) { return 0; } static inline int acpi_dev_resume_early(struct device *dev) { return 0; } static inline int acpi_subsys_prepare(struct device *dev) { return 0; } static inline void 
acpi_subsys_complete(struct device *dev) {} From 7e95d9134e3851de57075254737bc434462bb293 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 19 Oct 2017 01:18:57 +0200 Subject: [PATCH 48/88] PM: docs: Fix formatting typo in devices.rst There is one word too many under formatting markup in one place in devices.rst, so fix it. Signed-off-by: Rafael J. Wysocki --- Documentation/driver-api/pm/devices.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index a0dc2879a152..b8f1e3bdb743 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -274,7 +274,7 @@ sleep states and the hibernation state ("suspend-to-disk"). Each phase involves executing callbacks for every device before the next phase begins. Not all buses or classes support all these callbacks and not all drivers use all the callbacks. The various phases always run after tasks have been frozen and -before they are unfrozen. Furthermore, the ``*_noirq phases`` run at a time +before they are unfrozen. Furthermore, the ``*_noirq`` phases run at a time when IRQ handlers have been disabled (except for those marked with the IRQF_NO_SUSPEND flag). From b082ddd8a6a3aa0399763bfb58fc7bdd84c95713 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 13 Oct 2017 15:25:39 +0200 Subject: [PATCH 49/88] PM / core: Fix kerneldoc comments of four functions Fix the kerneldoc comments of __device_suspend_noirq(), __device_suspend_late() and __device_suspend() where the function names in kerneldoc don't match the actual names of the functions. Also fix the device_resume_noirq() kerneldoc comment which mentions "early resume" instead of "noirq resume" incorrectly. Signed-off-by: Rafael J.
Wysocki Reviewed-by: Ulf Hansson --- drivers/base/power/main.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 12abcf6084a5..9bbbbb13a9db 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -528,7 +528,7 @@ static void dpm_watchdog_clear(struct dpm_watchdog *wd) /*------------------------- Resume routines -------------------------*/ /** - * device_resume_noirq - Execute an "early resume" callback for given device. + * device_resume_noirq - Execute a "noirq resume" callback for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being resumed asynchronously. @@ -1077,7 +1077,7 @@ static pm_message_t resume_event(pm_message_t sleep_state) } /** - * device_suspend_noirq - Execute a "late suspend" callback for given device. + * __device_suspend_noirq - Execute a "noirq suspend" callback for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being suspended asynchronously. @@ -1237,7 +1237,7 @@ int dpm_suspend_noirq(pm_message_t state) } /** - * device_suspend_late - Execute a "late suspend" callback for given device. + * __device_suspend_late - Execute a "late suspend" callback for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being suspended asynchronously. @@ -1439,7 +1439,7 @@ static void dpm_clear_suppliers_direct_complete(struct device *dev) } /** - * device_suspend - Execute "suspend" callbacks for given device. + * __device_suspend - Execute "suspend" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being suspended asynchronously. 
From d9278077385fd9207c00104fe6797283a099b061 Mon Sep 17 00:00:00 2001 From: Robert Jarzmik Date: Sat, 14 Oct 2017 23:51:02 +0200 Subject: [PATCH 50/88] cpufreq: pxa: convert to clock API As the clock settings have been introduced into the clock pxa drivers, which are now available to change the CPU clock by themselves, remove the clock handling from this driver, and rely on pxa clock drivers. Signed-off-by: Robert Jarzmik Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/pxa2xx-cpufreq.c | 191 +++++++------------------------ 1 file changed, 39 insertions(+), 152 deletions(-) diff --git a/drivers/cpufreq/pxa2xx-cpufreq.c b/drivers/cpufreq/pxa2xx-cpufreq.c index ce345bf34d5d..06b024a3e474 100644 --- a/drivers/cpufreq/pxa2xx-cpufreq.c +++ b/drivers/cpufreq/pxa2xx-cpufreq.c @@ -58,56 +58,40 @@ module_param(pxa27x_maxfreq, uint, 0); MODULE_PARM_DESC(pxa27x_maxfreq, "Set the pxa27x maxfreq in MHz" "(typically 624=>pxa270, 416=>pxa271, 520=>pxa272)"); +struct pxa_cpufreq_data { + struct clk *clk_core; +}; +static struct pxa_cpufreq_data pxa_cpufreq_data; + struct pxa_freqs { unsigned int khz; - unsigned int membus; - unsigned int cccr; - unsigned int div2; - unsigned int cclkcfg; int vmin; int vmax; }; -/* Define the refresh period in mSec for the SDRAM and the number of rows */ -#define SDRAM_TREF 64 /* standard 64ms SDRAM */ -static unsigned int sdram_rows; - -#define CCLKCFG_TURBO 0x1 -#define CCLKCFG_FCS 0x2 -#define CCLKCFG_HALFTURBO 0x4 -#define CCLKCFG_FASTBUS 0x8 -#define MDREFR_DB2_MASK (MDREFR_K2DB2 | MDREFR_K1DB2) -#define MDREFR_DRI_MASK 0xFFF - -#define MDCNFG_DRAC2(mdcnfg) (((mdcnfg) >> 21) & 0x3) -#define MDCNFG_DRAC0(mdcnfg) (((mdcnfg) >> 5) & 0x3) - /* * PXA255 definitions */ -/* Use the run mode frequencies for the CPUFREQ_POLICY_PERFORMANCE policy */ -#define CCLKCFG CCLKCFG_TURBO | CCLKCFG_FCS - static const struct pxa_freqs pxa255_run_freqs[] = { - /* CPU MEMBUS CCCR DIV2 CCLKCFG run turbo PXbus SDRAM */ - { 99500, 99500, 
0x121, 1, CCLKCFG, -1, -1}, /* 99, 99, 50, 50 */
-	{132700, 132700, 0x123, 1, CCLKCFG, -1, -1}, /* 133, 133, 66, 66 */
-	{199100, 99500, 0x141, 0, CCLKCFG, -1, -1}, /* 199, 199, 99, 99 */
-	{265400, 132700, 0x143, 1, CCLKCFG, -1, -1}, /* 265, 265, 133, 66 */
-	{331800, 165900, 0x145, 1, CCLKCFG, -1, -1}, /* 331, 331, 166, 83 */
-	{398100, 99500, 0x161, 0, CCLKCFG, -1, -1}, /* 398, 398, 196, 99 */
+	/* CPU MEMBUS run turbo PXbus SDRAM */
+	{ 99500, -1, -1}, /* 99, 99, 50, 50 */
+	{132700, -1, -1}, /* 133, 133, 66, 66 */
+	{199100, -1, -1}, /* 199, 199, 99, 99 */
+	{265400, -1, -1}, /* 265, 265, 133, 66 */
+	{331800, -1, -1}, /* 331, 331, 166, 83 */
+	{398100, -1, -1}, /* 398, 398, 196, 99 */
 };
 
 /* Use the turbo mode frequencies for the CPUFREQ_POLICY_POWERSAVE policy */
 static const struct pxa_freqs pxa255_turbo_freqs[] =
 {
-	/* CPU MEMBUS CCCR DIV2 CCLKCFG run turbo PXbus SDRAM */
-	{ 99500, 99500, 0x121, 1, CCLKCFG, -1, -1}, /* 99, 99, 50, 50 */
-	{199100, 99500, 0x221, 0, CCLKCFG, -1, -1}, /* 99, 199, 50, 99 */
-	{298500, 99500, 0x321, 0, CCLKCFG, -1, -1}, /* 99, 287, 50, 99 */
-	{298600, 99500, 0x1c1, 0, CCLKCFG, -1, -1}, /* 199, 287, 99, 99 */
-	{398100, 99500, 0x241, 0, CCLKCFG, -1, -1}, /* 199, 398, 99, 99 */
+	/* CPU run turbo PXbus SDRAM */
+	{ 99500, -1, -1}, /* 99, 99, 50, 50 */
+	{199100, -1, -1}, /* 99, 199, 50, 99 */
+	{298500, -1, -1}, /* 99, 287, 50, 99 */
+	{298600, -1, -1}, /* 199, 287, 99, 99 */
+	{398100, -1, -1}, /* 199, 398, 99, 99 */
 };
 
 #define NUM_PXA25x_RUN_FREQS ARRAY_SIZE(pxa255_run_freqs)
@@ -122,47 +106,14 @@ static unsigned int pxa255_turbo_table;
 module_param(pxa255_turbo_table, uint, 0);
 MODULE_PARM_DESC(pxa255_turbo_table, "Selects the frequency table (0 = run table, !0 = turbo table)");
 
-/*
- * PXA270 definitions
- *
- * For the PXA27x:
- * Control variables are A, L, 2N for CCCR; B, HT, T for CLKCFG.
- *
- * A = 0 => memory controller clock from table 3-7,
- * A = 1 => memory controller clock = system bus clock
- * Run mode frequency = 13 MHz * L
- * Turbo mode frequency = 13 MHz * L * N
- * System bus frequency = 13 MHz * L / (B + 1)
- *
- * In CCCR:
- * A = 1
- * L = 16 oscillator to run mode ratio
- * 2N = 6 2 * (turbo mode to run mode ratio)
- *
- * In CCLKCFG:
- * B = 1 Fast bus mode
- * HT = 0 Half-Turbo mode
- * T = 1 Turbo mode
- *
- * For now, just support some of the combinations in table 3-7 of
- * PXA27x Processor Family Developer's Manual to simplify frequency
- * change sequences.
- */
-#define PXA27x_CCCR(A, L, N2) (A << 25 | N2 << 7 | L)
-#define CCLKCFG2(B, HT, T) \
-	(CCLKCFG_FCS | \
-	 ((B) ? CCLKCFG_FASTBUS : 0) | \
-	 ((HT) ? CCLKCFG_HALFTURBO : 0) | \
-	 ((T) ? CCLKCFG_TURBO : 0))
-
 static struct pxa_freqs pxa27x_freqs[] = {
-	{104000, 104000, PXA27x_CCCR(1, 8, 2), 0, CCLKCFG2(1, 0, 1), 900000, 1705000 },
-	{156000, 104000, PXA27x_CCCR(1, 8, 3), 0, CCLKCFG2(1, 0, 1), 1000000, 1705000 },
-	{208000, 208000, PXA27x_CCCR(0, 16, 2), 1, CCLKCFG2(0, 0, 1), 1180000, 1705000 },
-	{312000, 208000, PXA27x_CCCR(1, 16, 3), 1, CCLKCFG2(1, 0, 1), 1250000, 1705000 },
-	{416000, 208000, PXA27x_CCCR(1, 16, 4), 1, CCLKCFG2(1, 0, 1), 1350000, 1705000 },
-	{520000, 208000, PXA27x_CCCR(1, 16, 5), 1, CCLKCFG2(1, 0, 1), 1450000, 1705000 },
-	{624000, 208000, PXA27x_CCCR(1, 16, 6), 1, CCLKCFG2(1, 0, 1), 1550000, 1705000 }
+	{104000, 900000, 1705000 },
+	{156000, 1000000, 1705000 },
+	{208000, 1180000, 1705000 },
+	{312000, 1250000, 1705000 },
+	{416000, 1350000, 1705000 },
+	{520000, 1450000, 1705000 },
+	{624000, 1550000, 1705000 }
 };
 
 #define NUM_PXA27x_FREQS ARRAY_SIZE(pxa27x_freqs)
@@ -241,51 +192,29 @@ static void pxa27x_guess_max_freq(void)
 	}
 }
 
-static void init_sdram_rows(void)
-{
-	uint32_t mdcnfg = __raw_readl(MDCNFG);
-	unsigned int drac2 = 0, drac0 = 0;
-
-	if (mdcnfg & (MDCNFG_DE2 | MDCNFG_DE3))
-		drac2 = MDCNFG_DRAC2(mdcnfg);
-
-	if (mdcnfg & (MDCNFG_DE0
| MDCNFG_DE1))
-		drac0 = MDCNFG_DRAC0(mdcnfg);
-
-	sdram_rows = 1 << (11 + max(drac0, drac2));
-}
-
-static u32 mdrefr_dri(unsigned int freq)
-{
-	u32 interval = freq * SDRAM_TREF / sdram_rows;
-
-	return (interval - (cpu_is_pxa27x() ? 31 : 0)) / 32;
-}
-
 static unsigned int pxa_cpufreq_get(unsigned int cpu)
 {
-	return get_clk_frequency_khz(0);
+	struct pxa_cpufreq_data *data = cpufreq_get_driver_data();
+
+	return (unsigned int) clk_get_rate(data->clk_core) / 1000;
 }
 
 static int pxa_set_target(struct cpufreq_policy *policy, unsigned int idx)
 {
 	struct cpufreq_frequency_table *pxa_freqs_table;
 	const struct pxa_freqs *pxa_freq_settings;
-	unsigned long flags;
-	unsigned int new_freq_cpu, new_freq_mem;
-	unsigned int unused, preset_mdrefr, postset_mdrefr, cclkcfg;
+	struct pxa_cpufreq_data *data = cpufreq_get_driver_data();
+	unsigned int new_freq_cpu;
 	int ret = 0;
 
 	/* Get the current policy */
 	find_freq_tables(&pxa_freqs_table, &pxa_freq_settings);
 
 	new_freq_cpu = pxa_freq_settings[idx].khz;
-	new_freq_mem = pxa_freq_settings[idx].membus;
 
 	if (freq_debug)
-		pr_debug("Changing CPU frequency to %d Mhz, (SDRAM %d Mhz)\n",
-			 new_freq_cpu / 1000, (pxa_freq_settings[idx].div2) ?
-			 (new_freq_mem / 2000) : (new_freq_mem / 1000));
+		pr_debug("Changing CPU frequency from %d Mhz to %d Mhz\n",
+			 policy->cur / 1000, new_freq_cpu / 1000);
 
 	if (vcc_core && new_freq_cpu > policy->cur) {
 		ret = pxa_cpufreq_change_voltage(&pxa_freq_settings[idx]);
@@ -293,53 +222,7 @@ static int pxa_set_target(struct cpufreq_policy *policy, unsigned int idx)
 			return ret;
 	}
 
-	/* Calculate the next MDREFR. If we're slowing down the SDRAM clock
-	 * we need to preset the smaller DRI before the change. If we're
-	 * speeding up we need to set the larger DRI value after the change.
-	 */
-	preset_mdrefr = postset_mdrefr = __raw_readl(MDREFR);
-	if ((preset_mdrefr & MDREFR_DRI_MASK) > mdrefr_dri(new_freq_mem)) {
-		preset_mdrefr = (preset_mdrefr & ~MDREFR_DRI_MASK);
-		preset_mdrefr |= mdrefr_dri(new_freq_mem);
-	}
-	postset_mdrefr =
-		(postset_mdrefr & ~MDREFR_DRI_MASK) | mdrefr_dri(new_freq_mem);
-
-	/* If we're dividing the memory clock by two for the SDRAM clock, this
-	 * must be set prior to the change. Clearing the divide must be done
-	 * after the change.
-	 */
-	if (pxa_freq_settings[idx].div2) {
-		preset_mdrefr |= MDREFR_DB2_MASK;
-		postset_mdrefr |= MDREFR_DB2_MASK;
-	} else {
-		postset_mdrefr &= ~MDREFR_DB2_MASK;
-	}
-
-	local_irq_save(flags);
-
-	/* Set new the CCCR and prepare CCLKCFG */
-	writel(pxa_freq_settings[idx].cccr, CCCR);
-	cclkcfg = pxa_freq_settings[idx].cclkcfg;
-
-	asm volatile(" \n\
-		ldr r4, [%1] /* load MDREFR */ \n\
-		b 2f \n\
-		.align 5 \n\
-1: \n\
-		str %3, [%1] /* preset the MDREFR */ \n\
-		mcr p14, 0, %2, c6, c0, 0 /* set CCLKCFG[FCS] */ \n\
-		str %4, [%1] /* postset the MDREFR */ \n\
-		\n\
-		b 3f \n\
-2:		b 1b \n\
-3:		nop \n\
-	"
-		     : "=&r" (unused)
-		     : "r" (MDREFR), "r" (cclkcfg),
-		       "r" (preset_mdrefr), "r" (postset_mdrefr)
-		     : "r4", "r5");
-	local_irq_restore(flags);
+	clk_set_rate(data->clk_core, new_freq_cpu * 1000);
 
 	/*
 	 * Even if voltage setting fails, we don't report it, as the frequency
@@ -369,8 +252,6 @@ static int pxa_cpufreq_init(struct cpufreq_policy *policy)
 
 	pxa_cpufreq_init_voltages();
 
-	init_sdram_rows();
-
 	/* set default policy and cpuinfo */
 	policy->cpuinfo.transition_latency = 1000; /* FIXME: 1 ms, assumed */
 
@@ -429,11 +310,17 @@ static struct cpufreq_driver pxa_cpufreq_driver = {
 	.init = pxa_cpufreq_init,
 	.get = pxa_cpufreq_get,
 	.name = "PXA2xx",
+	.driver_data = &pxa_cpufreq_data,
 };
 
 static int __init pxa_cpu_init(void)
 {
 	int ret = -ENODEV;
+
+	pxa_cpufreq_data.clk_core = clk_get_sys(NULL, "core");
+	if (IS_ERR(pxa_cpufreq_data.clk_core))
+		return PTR_ERR(pxa_cpufreq_data.clk_core);
+
 	if
(cpu_is_pxa25x() || cpu_is_pxa27x())
 		ret = cpufreq_register_driver(&pxa_cpufreq_driver);
 	return ret;

From 96428e98aebe5db8a164711f102808651c7f518d Mon Sep 17 00:00:00 2001
From: Kees Cook
Date: Mon, 16 Oct 2017 16:20:55 -0700
Subject: [PATCH 51/88] PM / core: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer
to all timer callbacks, switch to using the new timer_setup() and
from_timer() to pass the timer pointer explicitly. Removes test of .data
field, since that will be going away.

Signed-off-by: Kees Cook
Signed-off-by: Rafael J. Wysocki
---
 drivers/base/power/runtime.c |  7 +++----
 drivers/base/power/wakeup.c  | 11 +++++------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 7bcf80fa9ada..1cea431c0adf 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -894,9 +894,9 @@ static void pm_runtime_work(struct work_struct *work)
  *
  * Check if the time is right and queue a suspend request.
  */
-static void pm_suspend_timer_fn(unsigned long data)
+static void pm_suspend_timer_fn(struct timer_list *t)
 {
-	struct device *dev = (struct device *)data;
+	struct device *dev = from_timer(dev, t, power.suspend_timer);
 	unsigned long flags;
 	unsigned long expires;
 
@@ -1499,8 +1499,7 @@ void pm_runtime_init(struct device *dev)
 	INIT_WORK(&dev->power.work, pm_runtime_work);
 
 	dev->power.timer_expires = 0;
-	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
-			(unsigned long)dev);
+	timer_setup(&dev->power.suspend_timer, pm_suspend_timer_fn, 0);
 
 	init_waitqueue_head(&dev->power.wait_queue);
 }
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index cdd6f256da59..680ee1d36ac9 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -54,7 +54,7 @@ static unsigned int saved_count;
 
 static DEFINE_SPINLOCK(events_lock);
 
-static void pm_wakeup_timer_fn(unsigned long data);
+static void pm_wakeup_timer_fn(struct timer_list *t);
 
 static LIST_HEAD(wakeup_sources);
 
@@ -176,7 +176,7 @@ void wakeup_source_add(struct wakeup_source *ws)
 		return;
 
 	spin_lock_init(&ws->lock);
-	setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
+	timer_setup(&ws->timer, pm_wakeup_timer_fn, 0);
 	ws->active = false;
 	ws->last_time = ktime_get();
 
@@ -481,8 +481,7 @@ static bool wakeup_source_not_registered(struct wakeup_source *ws)
 	 * Use timer struct to check if the given source is initialized
 	 * by wakeup_source_add.
 	 */
-	return ws->timer.function != pm_wakeup_timer_fn ||
-		   ws->timer.data != (unsigned long)ws;
+	return ws->timer.function != (TIMER_FUNC_TYPE)pm_wakeup_timer_fn;
 }
 
 /*
@@ -724,9 +723,9 @@ EXPORT_SYMBOL_GPL(pm_relax);
  * in @data if it is currently active and its timer has not been canceled and
  * the expiration time of the timer is not in future.
  */
-static void pm_wakeup_timer_fn(unsigned long data)
+static void pm_wakeup_timer_fn(struct timer_list *t)
 {
-	struct wakeup_source *ws = (struct wakeup_source *)data;
+	struct wakeup_source *ws = from_timer(ws, t, timer);
 	unsigned long flags;
 
 	spin_lock_irqsave(&ws->lock, flags);

From a192aa923b66a435aae56983c4912ee150bc9b32 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki"
Date: Mon, 16 Oct 2017 03:29:55 +0200
Subject: [PATCH 52/88] ACPI / LPSS: Consolidate runtime PM and system sleep handling

Move the LPSS-specific code from acpi_lpss_runtime_suspend()
and acpi_lpss_runtime_resume() into separate functions,
acpi_lpss_suspend() and acpi_lpss_resume(), respectively, and
make acpi_lpss_suspend_late() and acpi_lpss_resume_early() use
them too in order to unify the runtime PM and system sleep
handling in the LPSS driver.

Signed-off-by: Rafael J. Wysocki
Acked-by: Greg Kroah-Hartman
Reviewed-by: Ulf Hansson
---
 drivers/acpi/acpi_lpss.c | 76 ++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index 8ec19e7c7b61..04d32bdb5a95 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -716,40 +716,6 @@ static void acpi_lpss_dismiss(struct device *dev)
 	acpi_dev_suspend(dev, false);
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int acpi_lpss_suspend_late(struct device *dev)
-{
-	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
-	int ret;
-
-	ret = pm_generic_suspend_late(dev);
-	if (ret)
-		return ret;
-
-	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
-		acpi_lpss_save_ctx(dev, pdata);
-
-	return acpi_dev_suspend(dev, device_may_wakeup(dev));
-}
-
-static int acpi_lpss_resume_early(struct device *dev)
-{
-	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
-	int ret;
-
-	ret = acpi_dev_resume(dev);
-	if (ret)
-		return ret;
-
-	acpi_lpss_d3_to_d0_delay(pdata);
-
-	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
-
		acpi_lpss_restore_ctx(dev, pdata);
-
-	return pm_generic_resume_early(dev);
-}
-#endif /* CONFIG_PM_SLEEP */
-
 /* IOSF SB for LPSS island */
 #define LPSS_IOSF_UNIT_LPIOEP 0xA0
 #define LPSS_IOSF_UNIT_LPIO1 0xAB
@@ -835,19 +801,15 @@ static void lpss_iosf_exit_d3_state(void)
 	mutex_unlock(&lpss_iosf_mutex);
 }
 
-static int acpi_lpss_runtime_suspend(struct device *dev)
+static int acpi_lpss_suspend(struct device *dev, bool wakeup)
 {
 	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
 	int ret;
 
-	ret = pm_generic_runtime_suspend(dev);
-	if (ret)
-		return ret;
-
 	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
 		acpi_lpss_save_ctx(dev, pdata);
 
-	ret = acpi_dev_suspend(dev, true);
+	ret = acpi_dev_suspend(dev, wakeup);
 
 	/*
 	 * This call must be last in the sequence, otherwise PMC will return
@@ -860,7 +822,7 @@ static int acpi_lpss_runtime_suspend(struct device *dev)
 	return ret;
 }
 
-static int acpi_lpss_runtime_resume(struct device *dev)
+static int acpi_lpss_resume(struct device *dev)
 {
 	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
 	int ret;
@@ -881,7 +843,37 @@ static int acpi_lpss_runtime_resume(struct device *dev)
 	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
 		acpi_lpss_restore_ctx(dev, pdata);
 
-	return pm_generic_runtime_resume(dev);
+	return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int acpi_lpss_suspend_late(struct device *dev)
+{
+	int ret = pm_generic_suspend_late(dev);
+
+	return ret ? ret : acpi_lpss_suspend(dev, device_may_wakeup(dev));
+}
+
+static int acpi_lpss_resume_early(struct device *dev)
+{
+	int ret = acpi_lpss_resume(dev);
+
+	return ret ? ret : pm_generic_resume_early(dev);
+}
+#endif /* CONFIG_PM_SLEEP */
+
+static int acpi_lpss_runtime_suspend(struct device *dev)
+{
+	int ret = pm_generic_runtime_suspend(dev);
+
+	return ret ? ret : acpi_lpss_suspend(dev, true);
+}
+
+static int acpi_lpss_runtime_resume(struct device *dev)
+{
+	int ret = acpi_lpss_resume(dev);
+
+	return ret ?
ret : pm_generic_runtime_resume(dev);
 }
 
 #endif /* CONFIG_PM */

From ab8f58ad72c4d1abe59216362ddb8bfa428c9071 Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23 Oct 2017 10:32:06 +0900
Subject: [PATCH 53/88] PM / devfreq: Set min/max_freq when adding the devfreq device

Prior to this patch, the min/max_freq of the devfreq device are always
zero until the user changes the min/max_freq through the sysfs entries.
This can cause confusion about the min/max_freq. This patch initializes
the available min/max_freq by using the OPP when adding the devfreq
device.

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
---
 drivers/devfreq/devfreq.c | 42 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index a1c4ee818614..6a6f88bccdee 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -69,6 +69,34 @@ static struct devfreq *find_device_devfreq(struct device *dev)
 	return ERR_PTR(-ENODEV);
 }
 
+static unsigned long find_available_min_freq(struct devfreq *devfreq)
+{
+	struct dev_pm_opp *opp;
+	unsigned long min_freq = 0;
+
+	opp = dev_pm_opp_find_freq_ceil(devfreq->dev.parent, &min_freq);
+	if (IS_ERR(opp))
+		min_freq = 0;
+	else
+		dev_pm_opp_put(opp);
+
+	return min_freq;
+}
+
+static unsigned long find_available_max_freq(struct devfreq *devfreq)
+{
+	struct dev_pm_opp *opp;
+	unsigned long max_freq = ULONG_MAX;
+
+	opp = dev_pm_opp_find_freq_floor(devfreq->dev.parent, &max_freq);
+	if (IS_ERR(opp))
+		max_freq = 0;
+	else
+		dev_pm_opp_put(opp);
+
+	return max_freq;
+}
+
 /**
  * devfreq_get_freq_level() - Lookup freq_table for the frequency
  * @devfreq: the devfreq instance
@@ -559,6 +587,20 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		mutex_lock(&devfreq->lock);
 	}
 
+	devfreq->min_freq = find_available_min_freq(devfreq);
+	if (!devfreq->min_freq) {
+		mutex_unlock(&devfreq->lock);
+		err = -EINVAL;
+		goto err_dev;
+	}
+
+	devfreq->max_freq =
find_available_max_freq(devfreq);
+	if (!devfreq->max_freq) {
+		mutex_unlock(&devfreq->lock);
+		err = -EINVAL;
+		goto err_dev;
+	}
+
 	dev_set_name(&devfreq->dev, "devfreq%d",
 				atomic_inc_return(&devfreq_no));
 	err = device_register(&devfreq->dev);

From 1051e2c304b5cf17d4117505985f8128c5c64fd9 Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23 Oct 2017 10:32:07 +0900
Subject: [PATCH 54/88] Revert "PM / devfreq: Add show_one macro to delete the duplicate code"

This reverts commit 3104fa3081126c9bda35793af5f335d0ee0d5818.

The {min|max}_freq_show() show the stored value of the struct devfreq.
But, if the drivers/thermal/devfreq_cooling.c disables the specific
frequency value, {min|max}_freq_show() have to check this situation
before showing the stored value. So, this patch reverts the macro in
order to add the additional code.

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
---
 drivers/devfreq/devfreq.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 6a6f88bccdee..b6ba24e5db0d 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -1124,6 +1124,12 @@ unlock:
 	return ret;
 }
 
+static ssize_t min_freq_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	return sprintf(buf, "%lu\n", to_devfreq(dev)->min_freq);
+}
+
 static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
 			      const char *buf, size_t count)
 {
@@ -1150,17 +1156,13 @@ unlock:
 	mutex_unlock(&df->lock);
 	return ret;
 }
-
-#define show_one(name) \
-static ssize_t name##_show \
-(struct device *dev, struct device_attribute *attr, char *buf) \
-{ \
-	return sprintf(buf, "%lu\n", to_devfreq(dev)->name); \
-}
-show_one(min_freq);
-show_one(max_freq);
-
 static DEVICE_ATTR_RW(min_freq);
+
+static ssize_t max_freq_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	return sprintf(buf, "%lu\n", to_devfreq(dev)->max_freq);
+}
 static
DEVICE_ATTR_RW(max_freq);
 
 static ssize_t available_frequencies_show(struct device *d,

From f1d981eaecf8ace68ec1d15bf05f28a4887ea6fb Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23 Oct 2017 10:32:08 +0900
Subject: [PATCH 55/88] PM / devfreq: Use the available min/max frequency

The commit a76caf55e5b35 ("thermal: Add devfreq cooling") is able to
disable OPP as a cooling device. As a result, both update_devfreq() and
{min|max}_freq_show() have to consider the 'opp->available' status of
each OPP.

So, this patch adds the 'scaling_{min|max}_freq' to struct devfreq in
order to indicate the available minimum and maximum frequency by
adjusting OPP interface such as dev_pm_opp_{disable|enable}(). The
'scaling_{min|max}_freq' are used in both update_devfreq() and
{min|max}_freq_show().

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
---
 drivers/devfreq/devfreq.c | 40 +++++++++++++++++++++++++++++++--------
 include/linux/devfreq.h   |  4 ++++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index b6ba24e5db0d..ee3e7cee30b6 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -28,6 +28,9 @@
 #include
 #include "governor.h"
 
+#define MAX(a,b) ((a > b) ? a : b)
+#define MIN(a,b) ((a < b) ? a : b)
+
 static struct class *devfreq_class;
 
 /*
@@ -255,7 +258,7 @@ static int devfreq_notify_transition(struct devfreq *devfreq,
 int update_devfreq(struct devfreq *devfreq)
 {
 	struct devfreq_freqs freqs;
-	unsigned long freq, cur_freq;
+	unsigned long freq, cur_freq, min_freq, max_freq;
 	int err = 0;
 	u32 flags = 0;
 
@@ -273,19 +276,21 @@ int update_devfreq(struct devfreq *devfreq)
 		return err;
 
 	/*
-	 * Adjust the frequency with user freq and QoS.
+	 * Adjust the frequency with user freq, QoS and available freq.
 	 *
 	 * List from the highest priority
 	 * max_freq
 	 * min_freq
 	 */
+	max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
+	min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
 
-	if (devfreq->min_freq && freq < devfreq->min_freq) {
-		freq = devfreq->min_freq;
+	if (min_freq && freq < min_freq) {
+		freq = min_freq;
 		flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
 	}
-	if (devfreq->max_freq && freq > devfreq->max_freq) {
-		freq = devfreq->max_freq;
+	if (max_freq && freq > max_freq) {
+		freq = max_freq;
 		flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
 	}
 
@@ -494,6 +499,19 @@ static int devfreq_notifier_call(struct notifier_block *nb, unsigned long type,
 	int ret;
 
 	mutex_lock(&devfreq->lock);
+
+	devfreq->scaling_min_freq = find_available_min_freq(devfreq);
+	if (!devfreq->scaling_min_freq) {
+		mutex_unlock(&devfreq->lock);
+		return -EINVAL;
+	}
+
+	devfreq->scaling_max_freq = find_available_max_freq(devfreq);
+	if (!devfreq->scaling_max_freq) {
+		mutex_unlock(&devfreq->lock);
+		return -EINVAL;
+	}
+
 	ret = update_devfreq(devfreq);
 	mutex_unlock(&devfreq->lock);
 
@@ -593,6 +611,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		err = -EINVAL;
 		goto err_dev;
 	}
+	devfreq->scaling_min_freq = devfreq->min_freq;
 
 	devfreq->max_freq = find_available_max_freq(devfreq);
 	if (!devfreq->max_freq) {
@@ -600,6 +619,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		err = -EINVAL;
 		goto err_dev;
 	}
+	devfreq->scaling_max_freq = devfreq->max_freq;
 
 	dev_set_name(&devfreq->dev, "devfreq%d",
 				atomic_inc_return(&devfreq_no));
@@ -1127,7 +1147,9 @@ unlock:
 static ssize_t min_freq_show(struct device *dev, struct device_attribute *attr,
 			     char *buf)
 {
-	return sprintf(buf, "%lu\n", to_devfreq(dev)->min_freq);
+	struct devfreq *df = to_devfreq(dev);
+
+	return sprintf(buf, "%lu\n", MAX(df->scaling_min_freq, df->min_freq));
 }
 
 static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
@@ -1161,7 +1183,9 @@ static DEVICE_ATTR_RW(min_freq);
 static
ssize_t max_freq_show(struct device *dev, struct device_attribute *attr,
 			     char *buf)
 {
-	return sprintf(buf, "%lu\n", to_devfreq(dev)->max_freq);
+	struct devfreq *df = to_devfreq(dev);
+
+	return sprintf(buf, "%lu\n", MIN(df->scaling_max_freq, df->max_freq));
 }
 static DEVICE_ATTR_RW(max_freq);
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 597294e0cc40..997a9eb34191 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -120,6 +120,8 @@ struct devfreq_dev_profile {
  * touch this.
  * @min_freq: Limit minimum frequency requested by user (0: none)
  * @max_freq: Limit maximum frequency requested by user (0: none)
+ * @scaling_min_freq: Limit minimum frequency requested by OPP interface
+ * @scaling_max_freq: Limit maximum frequency requested by OPP interface
  * @stop_polling: devfreq polling status of a device.
  * @total_trans: Number of devfreq transitions
  * @trans_table: Statistics of devfreq transitions
@@ -153,6 +155,8 @@ struct devfreq {
 
 	unsigned long min_freq;
 	unsigned long max_freq;
+	unsigned long scaling_min_freq;
+	unsigned long scaling_max_freq;
 	bool stop_polling;
 
 	/* information for device frequency transition */

From ea572f816032bef9ff2641a439a45651a20eab73 Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23 Oct 2017 10:32:09 +0900
Subject: [PATCH 56/88] PM / devfreq: Change return type of devfreq_set_freq_table()

This patch changes the return type of devfreq_set_freq_table()
from 'void' to 'int' in order to check whether it fails or not.

This patch also removes the 'devfreq' prefix and the description of
the function, because the helper functions are only used by the devfreq.

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
---
 drivers/devfreq/devfreq.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index ee3e7cee30b6..b2920cd2b78e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -116,11 +116,7 @@ static int devfreq_get_freq_level(struct devfreq *devfreq, unsigned long freq)
 	return -EINVAL;
 }
 
-/**
- * devfreq_set_freq_table() - Initialize freq_table for the frequency
- * @devfreq: the devfreq instance
- */
-static void devfreq_set_freq_table(struct devfreq *devfreq)
+static int set_freq_table(struct devfreq *devfreq)
 {
 	struct devfreq_dev_profile *profile = devfreq->profile;
 	struct dev_pm_opp *opp;
@@ -130,7 +126,7 @@ static void devfreq_set_freq_table(struct devfreq *devfreq)
 	/* Initialize the freq_table from OPP table */
 	count = dev_pm_opp_get_opp_count(devfreq->dev.parent);
 	if (count <= 0)
-		return;
+		return -EINVAL;
 
 	profile->max_state = count;
 	profile->freq_table = devm_kcalloc(devfreq->dev.parent,
@@ -139,7 +135,7 @@ static void devfreq_set_freq_table(struct devfreq *devfreq)
 					GFP_KERNEL);
 	if (!profile->freq_table) {
 		profile->max_state = 0;
-		return;
+		return -ENOMEM;
 	}
 
 	for (i = 0, freq = 0; i < profile->max_state; i++, freq++) {
@@ -147,11 +143,13 @@ static void devfreq_set_freq_table(struct devfreq *devfreq)
 		if (IS_ERR(opp)) {
 			devm_kfree(devfreq->dev.parent, profile->freq_table);
 			profile->max_state = 0;
-			return;
+			return PTR_ERR(opp);
 		}
 		dev_pm_opp_put(opp);
 		profile->freq_table[i] = freq;
 	}
+
+	return 0;
 }
 
 /**
@@ -601,7 +599,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
 
 	if (!devfreq->profile->max_state && !devfreq->profile->freq_table) {
 		mutex_unlock(&devfreq->lock);
-		devfreq_set_freq_table(devfreq);
+		err = set_freq_table(devfreq);
+		if (err < 0)
+			goto err_out;
 		mutex_lock(&devfreq->lock);
 	}
 

From 416b46a2627ae8de1466f90787dede6f9c5a1bfa Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23
Oct 2017 10:32:10 +0900
Subject: [PATCH 57/88] PM / devfreq: Show the all available frequencies

The commit a76caf55e5b35 ("thermal: Add devfreq cooling") allows
the devfreq device to use the cooling device. When cooling down is
required, devfreq_cooling.c disables the OPP entry with
dev_pm_opp_disable(). As a result, the 'available_frequencies'[1]
sysfs node does not show all of the available frequencies.

[1] /sys/class/devfreq/.../available_frequencies

So, this patch uses the 'freq_table' in the 'struct devfreq_dev_profile'
in order to show all of the available frequencies.
- If 'freq_table' is NULL, devfreq core initializes it by using OPP values.
- If 'freq_table' is initialized, devfreq core just uses the 'freq_table'.

This patch also adds a comment about the sort order of 'freq_table'.

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
---
 drivers/devfreq/devfreq.c | 16 +++++-----------
 include/linux/devfreq.h   |  5 +++--
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index b2920cd2b78e..381f92e5e794 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -1194,22 +1194,16 @@ static ssize_t available_frequencies_show(struct device *d,
 					  char *buf)
 {
 	struct devfreq *df = to_devfreq(d);
-	struct device *dev = df->dev.parent;
-	struct dev_pm_opp *opp;
 	ssize_t count = 0;
-	unsigned long freq = 0;
+	int i;
 
-	do {
-		opp = dev_pm_opp_find_freq_ceil(dev, &freq);
-		if (IS_ERR(opp))
-			break;
+	mutex_lock(&df->lock);
 
-		dev_pm_opp_put(opp);
+	for (i = 0; i < df->profile->max_state; i++)
 		count += scnprintf(&buf[count], (PAGE_SIZE - count - 2),
-				   "%lu ", freq);
-		freq++;
-	} while (1);
+				"%lu ", df->profile->freq_table[i]);
 
+	mutex_unlock(&df->lock);
 	/* Truncate the trailing space */
 	if (count)
 		count--;
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 997a9eb34191..19520625ea94 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -84,8 +84,9
@@ struct devfreq_dev_status {
  * from devfreq_remove_device() call. If the user
  * has registered devfreq->nb at a notifier-head,
  * this is the time to unregister it.
- * @freq_table: Optional list of frequencies to support statistics.
- * @max_state: The size of freq_table.
+ * @freq_table: Optional list of frequencies to support statistics
+ * and freq_table must be generated in ascending order.
+ * @max_state: The size of freq_table.
  */
 struct devfreq_dev_profile {
 	unsigned long initial_freq;

From ccc4c3bcbb7de3cb61723f7584c01c3bde6cfbbb Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23 Oct 2017 10:32:11 +0900
Subject: [PATCH 58/88] PM / devfreq: Remove unneeded conditional statement

The freq_table array of each devfreq device is never NULL.
As a result, there is no need to check whether profile->freq_table
is NULL or not.

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
---
 drivers/devfreq/devfreq.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 381f92e5e794..78fb496ecb4e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -311,10 +311,9 @@ int update_devfreq(struct devfreq *devfreq)
 	freqs.new = freq;
 	devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
 
-	if (devfreq->profile->freq_table)
-		if (devfreq_update_status(devfreq, freq))
-			dev_err(&devfreq->dev,
-				"Couldn't update frequency transition information.\n");
+	if (devfreq_update_status(devfreq, freq))
+		dev_err(&devfreq->dev,
+			"Couldn't update frequency transition information.\n");
 
 	devfreq->previous_freq = freq;
 	return err;

From aa7c352f9841ab3fee5bf1de127a45e6310124a6 Mon Sep 17 00:00:00 2001
From: Chanwoo Choi
Date: Mon, 23 Oct 2017 10:32:12 +0900
Subject: [PATCH 59/88] PM / devfreq: Define the constant governor name

Prior to this patch, the devfreq device uses the governor name when
adding itself.
In order to prevent mistakes caused by using the wrong governor name,
this patch defines the governor names as constants and then uses them
instead of using the strings directly.

Signed-off-by: Chanwoo Choi
Signed-off-by: MyungJoo Ham
Cc: Kukjin Kim
Cc: Krzysztof Kozlowski
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/devfreq/exynos-bus.c              | 5 +++--
 drivers/devfreq/governor_passive.c        | 2 +-
 drivers/devfreq/governor_performance.c    | 2 +-
 drivers/devfreq/governor_powersave.c      | 2 +-
 drivers/devfreq/governor_simpleondemand.c | 2 +-
 drivers/devfreq/governor_userspace.c      | 2 +-
 drivers/devfreq/rk3399_dmc.c              | 2 +-
 include/linux/devfreq.h                   | 7 +++++++
 8 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/devfreq/exynos-bus.c b/drivers/devfreq/exynos-bus.c
index 49f68929e024..c25658b26598 100644
--- a/drivers/devfreq/exynos-bus.c
+++ b/drivers/devfreq/exynos-bus.c
@@ -436,7 +436,8 @@ static int exynos_bus_probe(struct platform_device *pdev)
 	ondemand_data->downdifferential = 5;
 
 	/* Add devfreq device to monitor and handle the exynos bus */
-	bus->devfreq = devm_devfreq_add_device(dev, profile, "simple_ondemand",
+	bus->devfreq = devm_devfreq_add_device(dev, profile,
+						DEVFREQ_GOV_SIMPLE_ONDEMAND,
 						ondemand_data);
 	if (IS_ERR(bus->devfreq)) {
 		dev_err(dev, "failed to add devfreq device\n");
@@ -488,7 +489,7 @@ passive:
 	passive_data->parent = parent_devfreq;
 
 	/* Add devfreq device for exynos bus with passive governor */
-	bus->devfreq = devm_devfreq_add_device(dev, profile, "passive",
+	bus->devfreq = devm_devfreq_add_device(dev, profile, DEVFREQ_GOV_PASSIVE,
 						passive_data);
 	if (IS_ERR(bus->devfreq)) {
 		dev_err(dev,
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index 673ad8cc9a1d..3bc29acbd54e 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -183,7 +183,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
 }
 
 static struct devfreq_governor devfreq_passive = {
-	.name = "passive",
+	.name = DEVFREQ_GOV_PASSIVE,
 	.immutable = 1,
 	.get_target_freq = devfreq_passive_get_target_freq,
 	.event_handler = devfreq_passive_event_handler,
diff --git a/drivers/devfreq/governor_performance.c b/drivers/devfreq/governor_performance.c
index c72f942f30a8..4d23ecfbd948 100644
--- a/drivers/devfreq/governor_performance.c
+++ b/drivers/devfreq/governor_performance.c
@@ -42,7 +42,7 @@ static int devfreq_performance_handler(struct devfreq *devfreq,
 }
 
 static struct devfreq_governor devfreq_performance = {
-	.name = "performance",
+	.name = DEVFREQ_GOV_PERFORMANCE,
 	.get_target_freq = devfreq_performance_func,
 	.event_handler = devfreq_performance_handler,
 };
diff --git a/drivers/devfreq/governor_powersave.c b/drivers/devfreq/governor_powersave.c
index 0c6bed567e6d..0c42f23249ef 100644
--- a/drivers/devfreq/governor_powersave.c
+++ b/drivers/devfreq/governor_powersave.c
@@ -39,7 +39,7 @@ static int devfreq_powersave_handler(struct devfreq *devfreq,
 }
 
 static struct devfreq_governor devfreq_powersave = {
-	.name = "powersave",
+	.name = DEVFREQ_GOV_POWERSAVE,
 	.get_target_freq = devfreq_powersave_func,
 	.event_handler = devfreq_powersave_handler,
 };
diff --git a/drivers/devfreq/governor_simpleondemand.c b/drivers/devfreq/governor_simpleondemand.c
index ae72ba5e78df..28e0f2de7100 100644
--- a/drivers/devfreq/governor_simpleondemand.c
+++ b/drivers/devfreq/governor_simpleondemand.c
@@ -125,7 +125,7 @@ static int devfreq_simple_ondemand_handler(struct devfreq *devfreq,
 }
 
 static struct devfreq_governor devfreq_simple_ondemand = {
-	.name = "simple_ondemand",
+	.name = DEVFREQ_GOV_SIMPLE_ONDEMAND,
 	.get_target_freq = devfreq_simple_ondemand_func,
 	.event_handler = devfreq_simple_ondemand_handler,
 };
diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
index 77028c27593c..080607c3f34d 100644
--- a/drivers/devfreq/governor_userspace.c
+++
b/drivers/devfreq/governor_userspace.c @@ -87,7 +87,7 @@ static struct attribute *dev_entries[] = { NULL, }; static const struct attribute_group dev_attr_group = { - .name = "userspace", + .name = DEVFREQ_GOV_USERSPACE, .attrs = dev_entries, }; diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c index 1b89ebbad02c..5dfbfa3cc878 100644 --- a/drivers/devfreq/rk3399_dmc.c +++ b/drivers/devfreq/rk3399_dmc.c @@ -431,7 +431,7 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev) data->devfreq = devm_devfreq_add_device(dev, &rk3399_devfreq_dmc_profile, - "simple_ondemand", + DEVFREQ_GOV_SIMPLE_ONDEMAND, &data->ondemand_data); if (IS_ERR(data->devfreq)) return PTR_ERR(data->devfreq); diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h index 19520625ea94..3aae5b3af87c 100644 --- a/include/linux/devfreq.h +++ b/include/linux/devfreq.h @@ -19,6 +19,13 @@ #define DEVFREQ_NAME_LEN 16 +/* DEVFREQ governor name */ +#define DEVFREQ_GOV_SIMPLE_ONDEMAND "simple_ondemand" +#define DEVFREQ_GOV_PERFORMANCE "performance" +#define DEVFREQ_GOV_POWERSAVE "powersave" +#define DEVFREQ_GOV_USERSPACE "userspace" +#define DEVFREQ_GOV_PASSIVE "passive" + /* DEVFREQ notifier interface */ #define DEVFREQ_TRANSITION_NOTIFIER (0) From 9da779c324db87ca340e0eb1259c949874f17bed Mon Sep 17 00:00:00 2001 From: Prarit Bhargava Date: Wed, 25 Oct 2017 09:51:32 -0400 Subject: [PATCH 60/88] cpupower: Fix no-rounding MHz frequency output 'cpupower frequency-info -ln' returns kHz values on systems with MHz range minimum CPU frequency range. For example, on a 800MHz to 4.20GHz system the command returns hardware limits: 800000 MHz - 4.200000 GHz The code that causes this error can be removed. The next else if clause will handle the output correctly such that hardware limits: 800.000 MHz - 4.200000 GHz is displayed correctly. [v2]: Remove two lines instead of fixing broken code. 
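As a rough illustration only, the branch order that remains after the removal can be modeled in user space as follows. This is a sketch, not the cpupower source: the helper name format_speed_no_rounding() and its buffer-based interface are made up for this example, and the final kHz fallback is assumed because it is not visible in the hunk. Values are in kHz, as in print_speed().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the no-rounding path of print_speed(),
 * writing into a caller-supplied buffer instead of stdout.
 * speed_khz is a frequency in kHz.  Returns what snprintf() returns.
 */
static int format_speed_no_rounding(unsigned long speed_khz, char *buf,
				    size_t len)
{
	assert(buf != NULL && len > 0);

	if (speed_khz > 1000000)
		return snprintf(buf, len, "%u.%06u GHz",
				(unsigned int)(speed_khz / 1000000),
				(unsigned int)(speed_khz % 1000000));

	/*
	 * The removed "speed > 100000" branch sat here and printed the
	 * raw kHz value with a "MHz" suffix.
	 */
	if (speed_khz > 1000)
		return snprintf(buf, len, "%u.%03u MHz",
				(unsigned int)(speed_khz / 1000),
				(unsigned int)(speed_khz % 1000));

	/* Assumed fallback for values of 1000 kHz and below. */
	return snprintf(buf, len, "%lu kHz", speed_khz);
}
```

With this ordering, an input of 800000 is formatted as "800.000 MHz" and 4200000 as "4.200000 GHz", matching the output quoted above.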
Signed-off-by: Prarit Bhargava Cc: Thomas Renninger Cc: Stafford Horne Cc: Shuah Khan Reviewed-by: Stafford Horne Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/cpufreq-info.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c index 3e701f0e9c14..df43cd45d810 100644 --- a/tools/power/cpupower/utils/cpufreq-info.c +++ b/tools/power/cpupower/utils/cpufreq-info.c @@ -93,8 +93,6 @@ static void print_speed(unsigned long speed) if (speed > 1000000) printf("%u.%06u GHz", ((unsigned int) speed/1000000), ((unsigned int) speed%1000000)); - else if (speed > 100000) - printf("%u MHz", (unsigned int) speed); else if (speed > 1000) printf("%u.%03u MHz", ((unsigned int) speed/1000), (unsigned int) (speed%1000)); From 10f2fe6efa5c3fc91ec6b700d3fc530845f5c1ab Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 2 Nov 2017 13:19:47 -0600 Subject: [PATCH 61/88] MAINTAINERS: add maintainer for tools/power/cpupower Based on discussions with Rafael J. Wysocki, cpupower is in need of an active maintainer. I decided to take on the task of maintaining this tool. Patches will flow through the pm sub-systems to the mainline. Suggested-by: Rafael J. Wysocki Signed-off-by: Shuah Khan Acked-by: Thomas Renninger Signed-off-by: Rafael J. Wysocki --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index af0cb69f6a3e..9fd3ce23095a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3636,6 +3636,8 @@ F: drivers/cpufreq/arm_big_little_dt.c CPU POWER MONITORING SUBSYSTEM M: Thomas Renninger +M: Shuah Khan +M: Shuah Khan L: linux-pm@vger.kernel.org S: Maintained F: tools/power/cpupower/ From 08810a4119aaebf6318f209ec5dd9828e969cba4 Mon Sep 17 00:00:00 2001 From: "Rafael J. 
Wysocki" Date: Wed, 25 Oct 2017 14:12:29 +0200 Subject: [PATCH 62/88] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags The motivation for this change is to provide a way to work around a problem with the direct-complete mechanism used for avoiding system suspend/resume handling for devices in runtime suspend. The problem is that some middle layer code (the PCI bus type and the ACPI PM domain in particular) returns positive values from its system suspend ->prepare callbacks regardless of whether the driver's ->prepare returns a positive value or 0, which effectively prevents drivers from being able to control the direct-complete feature. Some drivers need that control, however, and the PCI bus type has grown its own flag to deal with this issue, but since it is not limited to PCI, it is better to address it by adding driver flags at the core level. To that end, add a driver_flags field to struct dev_pm_info for flags that can be set by device drivers at the probe time to inform the PM core and/or bus types, PM domains and so on about the capabilities and/or preferences of device drivers. Also add two static inline helpers for setting that field and testing it against a given set of flags and make the driver core clear it automatically on driver remove and probe failures. Define and document two PM driver flags related to the direct-complete feature: NEVER_SKIP and SMART_PREPARE, which can be used, respectively, to indicate to the PM core that the direct-complete mechanism should never be used for the device and to inform the middle layer code (bus types, PM domains etc) that it can only request the PM core to use the direct-complete mechanism for the device (by returning a positive value from its ->prepare callback) if it also has been requested by the driver. While at it, make the core check pm_runtime_suspended() when setting power.direct_complete so that it doesn't need to be checked by ->prepare callbacks. Signed-off-by: Rafael J. 
Wysocki Acked-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas Reviewed-by: Ulf Hansson --- Documentation/driver-api/pm/devices.rst | 14 ++++++++++++++ Documentation/power/pci.txt | 19 +++++++++++++++++++ drivers/acpi/device_pm.c | 13 +++++++++---- drivers/base/dd.c | 2 ++ drivers/base/power/main.c | 4 +++- drivers/pci/pci-driver.c | 5 ++++- include/linux/device.h | 10 ++++++++++ include/linux/pm.h | 20 ++++++++++++++++++++ 8 files changed, 81 insertions(+), 6 deletions(-) diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index 4a18ef9997c0..8add5b302a89 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``. is because all such devices are initially set to runtime-suspended with runtime PM disabled. + This feature also can be controlled by device drivers by using the + ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power + management flags. [Typically, they are set at the time the driver is + probed against the device in question by passing them to the + :c:func:`dev_pm_set_driver_flags` helper function.] If the first of + these flags is set, the PM core will not apply the direct-complete + procedure described above to the given device and, consequenty, to any + of its ancestors. The second flag, when set, informs the middle layer + code (bus types, device types, PM domains, classes) that it should take + the return value of the ``->prepare`` callback provided by the driver + into account and it may only return a positive value from its own + ``->prepare`` callback if the driver's one also has returned a positive + value. + 2. The ``->suspend`` methods should quiesce the device to stop it from performing I/O. 
They also may save the device registers and put it into the appropriate low-power state, depending on the bus type the device is diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index a1b7f7158930..ab4e7d0540c1 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the .suspend(), .freeze(), and .poweroff() members and one resume routine is to be pointed to by the .resume(), .thaw(), and .restore() members. +3.1.19. Driver Flags for Power Management + +The PM core allows device drivers to set flags that influence the handling of +power management for the devices by the core itself and by middle layer code +including the PCI bus type. The flags should be set once at the driver probe +time with the help of the dev_pm_set_driver_flags() function and they should not +be updated directly afterwards. + +The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete +mechanism allowing device suspend/resume callbacks to be skipped if the device +is in runtime suspend when the system suspend starts. That also affects all of +the ancestors of the device, so this flag should only be used if absolutely +necessary. + +The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a +positive value from pci_pm_prepare() if the ->prepare callback provided by the +driver of the device returns a positive value. That allows the driver to opt +out from using the direct-complete mechanism dynamically. + 3.2. 
Device Runtime Power Management ------------------------------------ In addition to providing device power management callbacks PCI device drivers diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 17e8eb93a76c..b4dcc6144e6b 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -959,11 +959,16 @@ static bool acpi_dev_needs_resume(struct device *dev, struct acpi_device *adev) int acpi_subsys_prepare(struct device *dev) { struct acpi_device *adev = ACPI_COMPANION(dev); - int ret; - ret = pm_generic_prepare(dev); - if (ret < 0) - return ret; + if (dev->driver && dev->driver->pm && dev->driver->pm->prepare) { + int ret = dev->driver->pm->prepare(dev); + + if (ret < 0) + return ret; + + if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE)) + return 0; + } if (!adev || !pm_runtime_suspended(dev)) return 0; diff --git a/drivers/base/dd.c b/drivers/base/dd.c index ad44b40fe284..45575e134696 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -464,6 +464,7 @@ pinctrl_bind_failed: if (dev->pm_domain && dev->pm_domain->dismiss) dev->pm_domain->dismiss(dev); pm_runtime_reinit(dev); + dev_pm_set_driver_flags(dev, 0); switch (ret) { case -EPROBE_DEFER: @@ -869,6 +870,7 @@ static void __device_release_driver(struct device *dev, struct device *parent) if (dev->pm_domain && dev->pm_domain->dismiss) dev->pm_domain->dismiss(dev); pm_runtime_reinit(dev); + dev_pm_set_driver_flags(dev, 0); klist_remove(&dev->p->knode_driver); device_pm_check_callbacks(dev); diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 9bbbbb13a9db..c0135cd95ada 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1700,7 +1700,9 @@ unlock: * applies to suspend transitions, however. 
*/ spin_lock_irq(&dev->power.lock); - dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND; + dev->power.direct_complete = state.event == PM_EVENT_SUSPEND && + pm_runtime_suspended(dev) && ret > 0 && + !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP); spin_unlock_irq(&dev->power.lock); return 0; } diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 11bd267fc137..68a32703b30a 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -689,8 +689,11 @@ static int pci_pm_prepare(struct device *dev) if (drv && drv->pm && drv->pm->prepare) { int error = drv->pm->prepare(dev); - if (error) + if (error < 0) return error; + + if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE)) + return 0; } return pci_dev_keep_suspended(to_pci_dev(dev)); } diff --git a/include/linux/device.h b/include/linux/device.h index c32e6f974d4a..fb9451599aca 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device(struct device *dev, bool val) #endif } +static inline void dev_pm_set_driver_flags(struct device *dev, u32 flags) +{ + dev->power.driver_flags = flags; +} + +static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags) +{ + return !!(dev->power.driver_flags & flags); +} + static inline void device_lock(struct device *dev) { mutex_lock(&dev->mutex); diff --git a/include/linux/pm.h b/include/linux/pm.h index a0ceeccf2846..f10bad831bfa 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -550,6 +550,25 @@ struct pm_subsys_data { #endif }; +/* + * Driver flags to control system suspend/resume behavior. + * + * These flags can be set by device drivers at the probe time. They need not be + * cleared by the drivers as the driver core will take care of that. + * + * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device. + * SMART_PREPARE: Check the return value of the driver's ->prepare callback. 
+ * + * Setting SMART_PREPARE instructs bus types and PM domains which may want + * system suspend/resume callbacks to be skipped for the device to return 0 from + * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in + * other words, the system suspend/resume callbacks can only be skipped for the + * device if its driver doesn't object against that). This flag has no effect + * if NEVER_SKIP is set. + */ +#define DPM_FLAG_NEVER_SKIP BIT(0) +#define DPM_FLAG_SMART_PREPARE BIT(1) + struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; @@ -561,6 +580,7 @@ struct dev_pm_info { bool is_late_suspended:1; bool early_init:1; /* Owned by the PM core */ bool direct_complete:1; /* Owned by the PM core */ + u32 driver_flags; spinlock_t lock; #ifdef CONFIG_PM_SLEEP struct list_head entry; From c2eac4d3a115e2f511844e7bcf73f4e877fbf5da Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 25 Oct 2017 14:16:46 +0200 Subject: [PATCH 63/88] PCI / PM: Use the NEVER_SKIP driver flag Replace the PCI-specific flag PCI_DEV_FLAGS_NEEDS_RESUME with the PM core's DPM_FLAG_NEVER_SKIP one everywhere and drop it. Signed-off-by: Rafael J. Wysocki Acked-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas Reviewed-by: Ulf Hansson --- drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/misc/mei/pci-me.c | 2 +- drivers/misc/mei/pci-txe.c | 2 +- drivers/pci/pci.c | 3 +-- include/linux/pci.h | 7 +------ 5 files changed, 5 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 9f45cfeae775..f124de3a0668 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1304,7 +1304,7 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent) * becaue the HDA driver may require us to enable the audio power * domain during system suspend. 
*/ - pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME; + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); ret = i915_driver_init_early(dev_priv, ent); if (ret < 0) diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c index 4ff40d319676..f17e4b435fa9 100644 --- a/drivers/misc/mei/pci-me.c +++ b/drivers/misc/mei/pci-me.c @@ -223,7 +223,7 @@ static int mei_me_probe(struct pci_dev *pdev, const struct pci_device_id *ent) * MEI requires to resume from runtime suspend mode * in order to perform link reset flow upon system suspend. */ - pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME; + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); /* * For not wake-able HW runtime pm framework diff --git a/drivers/misc/mei/pci-txe.c b/drivers/misc/mei/pci-txe.c index e38a5f144373..f911a08e3579 100644 --- a/drivers/misc/mei/pci-txe.c +++ b/drivers/misc/mei/pci-txe.c @@ -141,7 +141,7 @@ static int mei_txe_probe(struct pci_dev *pdev, const struct pci_device_id *ent) * MEI requires to resume from runtime suspend mode * in order to perform link reset flow upon system suspend. 
*/ - pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME; + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); /* * For not wake-able HW runtime pm framework diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 6078dfc11b11..374f5686e2bc 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2166,8 +2166,7 @@ bool pci_dev_keep_suspended(struct pci_dev *pci_dev) if (!pm_runtime_suspended(dev) || pci_target_state(pci_dev, wakeup) != pci_dev->current_state - || platform_pci_need_resume(pci_dev) - || (pci_dev->dev_flags & PCI_DEV_FLAGS_NEEDS_RESUME)) + || platform_pci_need_resume(pci_dev)) return false; /* diff --git a/include/linux/pci.h b/include/linux/pci.h index f4f8ee5a7362..4b65fa4fb94e 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -205,13 +205,8 @@ enum pci_dev_flags { PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9), /* Do not use FLR even if device advertises PCI_AF_CAP */ PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10), - /* - * Resume before calling the driver's system suspend hooks, disabling - * the direct_complete optimization. - */ - PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11), /* Don't use Relaxed Ordering for TLPs directed at this device */ - PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12), + PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11), }; enum pci_irq_reroute_variant { From 0eab11c9ae3b3cc5dd76f20b81d0247647a6e96f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 26 Oct 2017 12:12:08 +0200 Subject: [PATCH 64/88] PM / core: Add SMART_SUSPEND driver flag Define and document a SMART_SUSPEND flag to instruct bus types and PM domains that the system suspend callbacks provided by the driver can cope with runtime-suspended devices, so from the driver's perspective it should be safe to leave devices in runtime suspend during system suspend. 
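For illustration, the intended effect on a middle layer's "resume upfront" policy can be modeled in user space as below. This is a sketch under assumptions, not kernel code: struct model_dev, its fields and the model_*() helpers are hypothetical stand-ins, and only the flag names and bit positions are taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names and bit positions follow include/linux/pm.h in this series. */
#define BIT(n)			(1U << (n))
#define DPM_FLAG_NEVER_SKIP	BIT(0)
#define DPM_FLAG_SMART_PREPARE	BIT(1)
#define DPM_FLAG_SMART_SUSPEND	BIT(2)

/* Hypothetical user-space stand-in for the relevant bits of struct device. */
struct model_dev {
	uint32_t driver_flags;		/* models power.driver_flags */
	bool runtime_suspended;		/* models the runtime PM status */
};

/* Models dev_pm_set_driver_flags(): overwrites the field, does not OR in. */
static void model_set_driver_flags(struct model_dev *dev, uint32_t flags)
{
	assert(dev != NULL);
	dev->driver_flags = flags;
}

/* Models dev_pm_test_driver_flags(): true if any of the given flags is set. */
static bool model_test_driver_flags(const struct model_dev *dev, uint32_t flags)
{
	assert(dev != NULL);
	return !!(dev->driver_flags & flags);
}

/*
 * Hypothetical middle-layer policy: resume a runtime-suspended device
 * before the driver's ->suspend runs unless the driver has declared
 * that its callbacks can cope with runtime-suspended devices.
 */
static bool model_resume_upfront(const struct model_dev *dev)
{
	return dev->runtime_suspended &&
	       !model_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
}
```

A real bus type or PM domain may have further reasons of its own to resume the device regardless of the flag, which this model deliberately leaves out.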
Setting that flag may also cause middle-layer code (bus types, PM domains etc.) to skip invocations of the ->suspend_late and ->suspend_noirq callbacks provided by the driver if the device is in runtime suspend at the beginning of the "late" phase of the system-wide suspend transition, in which case the driver's system-wide resume callbacks may be invoked back-to-back with its ->runtime_suspend callback, so the driver has to be able to cope with that too. Signed-off-by: Rafael J. Wysocki Acked-by: Greg Kroah-Hartman Reviewed-by: Ulf Hansson --- Documentation/driver-api/pm/devices.rst | 20 ++++++++++++++++++++ drivers/base/power/main.c | 3 +++ include/linux/pm.h | 8 ++++++++ 3 files changed, 31 insertions(+) diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index 8add5b302a89..574dadd06dec 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -766,6 +766,26 @@ the state of devices (possibly except for resuming them from runtime suspend) from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before* invoking device drivers' ``->suspend`` callbacks (or equivalent). +Some bus types and PM domains have a policy to resume all devices from runtime +suspend upfront in their ``->suspend`` callbacks, but that may not be really +necessary if the driver of the device can cope with runtime-suspended devices. +The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in +:c:member:`power.driver_flags` at the probe time, by passing it to the +:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code +(bus types, PM domains etc.) 
to skip the ``->suspend_late`` and +``->suspend_noirq`` callbacks provided by the driver if the device remains in +runtime suspend at the beginning of the ``suspend_late`` phase of system-wide +suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM +has been disabled for it, under the assumption that its state should not change +after that point until the system-wide transition is over. If that happens, the +driver's system-wide resume callbacks, if present, may still be invoked during +the subsequent system-wide resume transition and the device's runtime power +management status may be set to "active" before enabling runtime PM for it, +so the driver must be prepared to cope with the invocation of its system-wide +resume callbacks back-to-back with its ``->runtime_suspend`` one (without the +intervening ``->runtime_resume`` and so on) and the final state of the device +must reflect the "active" status for runtime PM in that case. + During system-wide resume from a sleep state it's easiest to put devices into the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. Refer to that document for more information regarding this particular issue as diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index c0135cd95ada..8d9024017645 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1652,6 +1652,9 @@ static int device_prepare(struct device *dev, pm_message_t state) if (dev->power.syscore) return 0; + WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && + !pm_runtime_enabled(dev)); + /* * If a device's parent goes into runtime suspend at the wrong time, * it won't be possible to resume the device. To prevent this we diff --git a/include/linux/pm.h b/include/linux/pm.h index f10bad831bfa..43b5418e05bb 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -558,6 +558,7 @@ struct pm_subsys_data { * * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device. 
* SMART_PREPARE: Check the return value of the driver's ->prepare callback. + * SMART_SUSPEND: No need to resume the device from runtime suspend. * * Setting SMART_PREPARE instructs bus types and PM domains which may want * system suspend/resume callbacks to be skipped for the device to return 0 from @@ -565,9 +566,16 @@ struct pm_subsys_data { * other words, the system suspend/resume callbacks can only be skipped for the * device if its driver doesn't object against that). This flag has no effect * if NEVER_SKIP is set. + * + * Setting SMART_SUSPEND instructs bus types and PM domains which may want to + * runtime resume the device upfront during system suspend that doing so is not + * necessary from the driver's perspective. It also may cause them to skip + * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by + * the driver if they decide to leave the device in runtime suspend. */ #define DPM_FLAG_NEVER_SKIP BIT(0) #define DPM_FLAG_SMART_PREPARE BIT(1) +#define DPM_FLAG_SMART_SUSPEND BIT(2) struct dev_pm_info { pm_message_t power_state; From 302666d8a55ce7eb5fb0bd9fbd9437d74e0ce77c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 26 Oct 2017 12:12:16 +0200 Subject: [PATCH 65/88] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks The only user of non-empty pcibios_pm_ops is s390 and it only uses "noirq" callbacks, so drop the invocations of the other pcibios_pm_ops callbacks from the PCI PM code. That will allow subsequent changes to be somewhat simpler. Signed-off-by: Rafael J. 
Wysocki Acked-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas Reviewed-by: Ulf Hansson --- drivers/pci/pci-driver.c | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 68a32703b30a..c1aeeb10539e 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -922,9 +922,6 @@ static int pci_pm_freeze(struct device *dev) return error; } - if (pcibios_pm_ops.freeze) - return pcibios_pm_ops.freeze(dev); - return 0; } @@ -986,12 +983,6 @@ static int pci_pm_thaw(struct device *dev) const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; int error = 0; - if (pcibios_pm_ops.thaw) { - error = pcibios_pm_ops.thaw(dev); - if (error) - return error; - } - if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_resume(dev); @@ -1036,9 +1027,6 @@ static int pci_pm_poweroff(struct device *dev) Fixup: pci_fixup_device(pci_fixup_suspend, pci_dev); - if (pcibios_pm_ops.poweroff) - return pcibios_pm_ops.poweroff(dev); - return 0; } @@ -1111,12 +1099,6 @@ static int pci_pm_restore(struct device *dev) const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; int error = 0; - if (pcibios_pm_ops.restore) { - error = pcibios_pm_ops.restore(dev); - if (error) - return error; - } - /* * This is necessary for the hibernation error path in which restore is * called without restoring the standard config registers of the device. From c4b65157aeefad29b2351a00a010e8c40ce7fd0e Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 26 Oct 2017 12:12:22 +0200 Subject: [PATCH 66/88] PCI / PM: Take SMART_SUSPEND driver flag into account Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its system-wide PM callbacks and make sure that all code that should not run in parallel with pci_pm_runtime_resume() is executed in the "late" phases of system suspend, freeze and poweroff transitions. 
[Note that the pm_runtime_suspended() check in pci_dev_keep_suspended() is an optimization, because if it is not passed, all of the subsequent checks may be skipped and some of them involve much more overhead in general.] Also use the observation that if the device is in runtime suspend at the beginning of the "late" phase of a system-wide suspend-like transition, its state cannot change going forward (runtime PM is disabled for it at that time) until the transition is over and the subsequent system-wide PM callbacks should be skipped for it (as they generally assume the device to not be suspended), so add checks for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq(). Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is called during the subsequent system-wide resume transition and if the device was left in runtime suspend previously, its runtime PM status needs to be changed to "active" as it is going to be put into the full-power state, so add checks for that to these functions too. In turn, if pci_pm_thaw_noirq() runs after the device has been left in runtime suspend, the subsequent "thaw" callbacks need to be skipped for it (as they may not work correctly with a suspended device), so set the power.direct_complete flag for the device then to make the PM core skip those callbacks. In addition to the above, add a core helper for checking if DPM_FLAG_SMART_SUSPEND is set and the device runtime PM status is "suspended" at the same time, which is done quite often in the new code (and will be done elsewhere going forward too). Signed-off-by: Rafael J. 
Wysocki Acked-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas --- Documentation/power/pci.txt | 14 +++++ drivers/base/power/main.c | 6 +++ drivers/pci/pci-driver.c | 103 ++++++++++++++++++++++++++++++------ include/linux/pm.h | 2 + 4 files changed, 108 insertions(+), 17 deletions(-) diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index ab4e7d0540c1..304162ea377e 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -980,6 +980,20 @@ positive value from pci_pm_prepare() if the ->prepare callback provided by the driver of the device returns a positive value. That allows the driver to opt out from using the direct-complete mechanism dynamically. +The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's +perspective the device can be safely left in runtime suspend during system +suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff() +to skip resuming the device from runtime suspend unless there are PCI-specific +reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(), +pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early +if the device remains in runtime suspend in the beginning of the "late" phase +of the system-wide transition under way. Moreover, if the device is in +runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime +power management status will be changed to "active" (as it is going to be put +into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(), +the function will set the power.direct_complete flag for it (to make the PM core +skip the subsequent "thaw" callbacks for it) and return. + 3.2. 
Device Runtime Power Management ------------------------------------ In addition to providing device power management callbacks PCI device drivers diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 8d9024017645..6c6f1c74c24c 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1861,3 +1861,9 @@ void device_pm_check_callbacks(struct device *dev) !dev->driver->suspend && !dev->driver->resume)); spin_unlock_irq(&dev->power.lock); } + +bool dev_pm_smart_suspend_and_suspended(struct device *dev) +{ + return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && + pm_runtime_status_suspended(dev); +} diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index c1aeeb10539e..d19bd54d337e 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -734,18 +734,25 @@ static int pci_pm_suspend(struct device *dev) if (!pm) { pci_pm_default_suspend(pci_dev); - goto Fixup; + return 0; } /* - * PCI devices suspended at run time need to be resumed at this point, - * because in general it is necessary to reconfigure them for system - * suspend. Namely, if the device is supposed to wake up the system - * from the sleep state, we may need to reconfigure it for this purpose. - * In turn, if the device is not supposed to wake up the system from the - * sleep state, we'll have to prevent it from signaling wake-up. + * PCI devices suspended at run time may need to be resumed at this + * point, because in general it may be necessary to reconfigure them for + * system suspend. Namely, if the device is expected to wake up the + * system from the sleep state, it may have to be reconfigured for this + * purpose, or if the device is not expected to wake up the system from + * the sleep state, it should be prevented from signaling wakeup events + * going forward. 
+ * + * Also if the driver of the device does not indicate that its system + * suspend callbacks can cope with runtime-suspended devices, it is + * better to resume the device from runtime suspend here. */ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) || + !pci_dev_keep_suspended(pci_dev)) + pm_runtime_resume(dev); pci_dev->state_saved = false; if (pm->suspend) { @@ -765,17 +772,27 @@ static int pci_pm_suspend(struct device *dev) } } - Fixup: - pci_fixup_device(pci_fixup_suspend, pci_dev); - return 0; } +static int pci_pm_suspend_late(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev)); + + return pm_generic_suspend_late(dev); +} + static int pci_pm_suspend_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_suspend_late(dev, PMSG_SUSPEND); @@ -834,6 +851,14 @@ static int pci_pm_resume_noirq(struct device *dev) struct device_driver *drv = dev->driver; int error = 0; + /* + * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend + * during system suspend, so update their runtime PM status to "active" + * as they are going to be put into D0 shortly. + */ + if (dev_pm_smart_suspend_and_suspended(dev)) + pm_runtime_set_active(dev); + pci_pm_default_resume_early(pci_dev); if (pci_has_legacy_pm_support(pci_dev)) @@ -876,6 +901,7 @@ static int pci_pm_resume(struct device *dev) #else /* !CONFIG_SUSPEND */ #define pci_pm_suspend NULL +#define pci_pm_suspend_late NULL #define pci_pm_suspend_noirq NULL #define pci_pm_resume NULL #define pci_pm_resume_noirq NULL @@ -910,7 +936,8 @@ static int pci_pm_freeze(struct device *dev) * devices should not be touched during freeze/thaw transitions, * however. 
*/ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)) + pm_runtime_resume(dev); pci_dev->state_saved = false; if (pm->freeze) { @@ -925,11 +952,22 @@ static int pci_pm_freeze(struct device *dev) return 0; } +static int pci_pm_freeze_late(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + return pm_generic_freeze_late(dev);; +} + static int pci_pm_freeze_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct device_driver *drv = dev->driver; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_suspend_late(dev, PMSG_FREEZE); @@ -959,6 +997,16 @@ static int pci_pm_thaw_noirq(struct device *dev) struct device_driver *drv = dev->driver; int error = 0; + /* + * If the device is in runtime suspend, the code below may not work + * correctly with it, so skip that code and make the PM core skip all of + * the subsequent "thaw" callbacks for the device. + */ + if (dev_pm_smart_suspend_and_suspended(dev)) { + dev->power.direct_complete = true; + return 0; + } + if (pcibios_pm_ops.thaw_noirq) { error = pcibios_pm_ops.thaw_noirq(dev); if (error) @@ -1008,11 +1056,13 @@ static int pci_pm_poweroff(struct device *dev) if (!pm) { pci_pm_default_suspend(pci_dev); - goto Fixup; + return 0; } /* The reason to do that is the same as in pci_pm_suspend(). 
*/ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) || + !pci_dev_keep_suspended(pci_dev)) + pm_runtime_resume(dev); pci_dev->state_saved = false; if (pm->poweroff) { @@ -1024,17 +1074,27 @@ static int pci_pm_poweroff(struct device *dev) return error; } - Fixup: - pci_fixup_device(pci_fixup_suspend, pci_dev); - return 0; } +static int pci_pm_poweroff_late(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev)); + + return pm_generic_poweroff_late(dev); +} + static int pci_pm_poweroff_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct device_driver *drv = dev->driver; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + if (pci_has_legacy_pm_support(to_pci_dev(dev))) return pci_legacy_suspend_late(dev, PMSG_HIBERNATE); @@ -1076,6 +1136,10 @@ static int pci_pm_restore_noirq(struct device *dev) struct device_driver *drv = dev->driver; int error = 0; + /* This is analogous to the pci_pm_resume_noirq() case. 
*/ + if (dev_pm_smart_suspend_and_suspended(dev)) + pm_runtime_set_active(dev); + if (pcibios_pm_ops.restore_noirq) { error = pcibios_pm_ops.restore_noirq(dev); if (error) @@ -1124,10 +1188,12 @@ static int pci_pm_restore(struct device *dev) #else /* !CONFIG_HIBERNATE_CALLBACKS */ #define pci_pm_freeze NULL +#define pci_pm_freeze_late NULL #define pci_pm_freeze_noirq NULL #define pci_pm_thaw NULL #define pci_pm_thaw_noirq NULL #define pci_pm_poweroff NULL +#define pci_pm_poweroff_late NULL #define pci_pm_poweroff_noirq NULL #define pci_pm_restore NULL #define pci_pm_restore_noirq NULL @@ -1243,10 +1309,13 @@ static const struct dev_pm_ops pci_dev_pm_ops = { .prepare = pci_pm_prepare, .complete = pci_pm_complete, .suspend = pci_pm_suspend, + .suspend_late = pci_pm_suspend_late, .resume = pci_pm_resume, .freeze = pci_pm_freeze, + .freeze_late = pci_pm_freeze_late, .thaw = pci_pm_thaw, .poweroff = pci_pm_poweroff, + .poweroff_late = pci_pm_poweroff_late, .restore = pci_pm_restore, .suspend_noirq = pci_pm_suspend_noirq, .resume_noirq = pci_pm_resume_noirq, diff --git a/include/linux/pm.h b/include/linux/pm.h index 43b5418e05bb..65d39115f06d 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -765,6 +765,8 @@ extern int pm_generic_poweroff_late(struct device *dev); extern int pm_generic_poweroff(struct device *dev); extern void pm_generic_complete(struct device *dev); +extern bool dev_pm_smart_suspend_and_suspended(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) From 05087360fd7acf2cc9b7bbb243c12765c44c7693 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 27 Oct 2017 10:10:16 +0200 Subject: [PATCH 67/88] ACPI / PM: Take SMART_SUSPEND driver flag into account Make the ACPI PM domain take DPM_FLAG_SMART_SUSPEND into account in its system suspend callbacks. 
[Note that the pm_runtime_suspended() check in acpi_dev_needs_resume() is an optimization, because if it is not passed, all of the subsequent checks may be skipped and some of them carry much more overhead in general.]

Also use the observation that if the device is in runtime suspend at the beginning of the "late" phase of a system-wide suspend-like transition, its state cannot change going forward (runtime PM is disabled for it at that time) until the transition is over and the subsequent system-wide PM callbacks should be skipped for it (as they generally assume the device to not be suspended), so add checks for that in acpi_subsys_suspend_late/noirq() and acpi_subsys_freeze_late/noirq().

Moreover, if acpi_subsys_resume_noirq() is called during the subsequent system-wide resume transition and if the device was left in runtime suspend previously, its runtime PM status needs to be changed to "active" as it is going to be put into the full-power state going forward, so add a check for that too in there.

In turn, if acpi_subsys_thaw_noirq() runs after the device has been left in runtime suspend, the subsequent "thaw" callbacks need to be skipped for it (as they may not work correctly with a suspended device), so set the power.direct_complete flag for the device then to make the PM core skip those callbacks.

On top of the above, make the analogous changes in the acpi_lpss driver that uses the ACPI PM domain callbacks.

Signed-off-by: Rafael J.
Wysocki Acked-by: Greg Kroah-Hartman --- drivers/acpi/acpi_lpss.c | 13 ++++- drivers/acpi/device_pm.c | 113 +++++++++++++++++++++++++++++++++++---- include/linux/acpi.h | 10 ++++ 3 files changed, 126 insertions(+), 10 deletions(-) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 04d32bdb5a95..de7385b824e1 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -849,8 +849,12 @@ static int acpi_lpss_resume(struct device *dev) #ifdef CONFIG_PM_SLEEP static int acpi_lpss_suspend_late(struct device *dev) { - int ret = pm_generic_suspend_late(dev); + int ret; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + ret = pm_generic_suspend_late(dev); return ret ? ret : acpi_lpss_suspend(dev, device_may_wakeup(dev)); } @@ -889,10 +893,17 @@ static struct dev_pm_domain acpi_lpss_pm_domain = { .complete = acpi_subsys_complete, .suspend = acpi_subsys_suspend, .suspend_late = acpi_lpss_suspend_late, + .suspend_noirq = acpi_subsys_suspend_noirq, + .resume_noirq = acpi_subsys_resume_noirq, .resume_early = acpi_lpss_resume_early, .freeze = acpi_subsys_freeze, + .freeze_late = acpi_subsys_freeze_late, + .freeze_noirq = acpi_subsys_freeze_noirq, + .thaw_noirq = acpi_subsys_thaw_noirq, .poweroff = acpi_subsys_suspend, .poweroff_late = acpi_lpss_suspend_late, + .poweroff_noirq = acpi_subsys_suspend_noirq, + .restore_noirq = acpi_subsys_resume_noirq, .restore_early = acpi_lpss_resume_early, #endif .runtime_suspend = acpi_lpss_runtime_suspend, diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index b4dcc6144e6b..3d6ec51d2bbc 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -936,7 +936,8 @@ static bool acpi_dev_needs_resume(struct device *dev, struct acpi_device *adev) u32 sys_target = acpi_target_system_state(); int ret, state; - if (device_may_wakeup(dev) != !!adev->wakeup.prepare_count) + if (!pm_runtime_suspended(dev) || !adev || + device_may_wakeup(dev) != !!adev->wakeup.prepare_count) return true; 
if (sys_target == ACPI_STATE_S0) @@ -970,9 +971,6 @@ int acpi_subsys_prepare(struct device *dev) return 0; } - if (!adev || !pm_runtime_suspended(dev)) - return 0; - return !acpi_dev_needs_resume(dev, adev); } EXPORT_SYMBOL_GPL(acpi_subsys_prepare); @@ -998,12 +996,17 @@ EXPORT_SYMBOL_GPL(acpi_subsys_complete); * acpi_subsys_suspend - Run the device driver's suspend callback. * @dev: Device to handle. * - * Follow PCI and resume devices suspended at run time before running their - * system suspend callbacks. + * Follow PCI and resume devices from runtime suspend before running their + * system suspend callbacks, unless the driver can cope with runtime-suspended + * devices during system suspend and there are no ACPI-specific reasons for + * resuming them. */ int acpi_subsys_suspend(struct device *dev) { - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) || + acpi_dev_needs_resume(dev, ACPI_COMPANION(dev))) + pm_runtime_resume(dev); + return pm_generic_suspend(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_suspend); @@ -1017,11 +1020,47 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend); */ int acpi_subsys_suspend_late(struct device *dev) { - int ret = pm_generic_suspend_late(dev); + int ret; + + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + ret = pm_generic_suspend_late(dev); return ret ? ret : acpi_dev_suspend(dev, device_may_wakeup(dev)); } EXPORT_SYMBOL_GPL(acpi_subsys_suspend_late); +/** + * acpi_subsys_suspend_noirq - Run the device driver's "noirq" suspend callback. + * @dev: Device to suspend. + */ +int acpi_subsys_suspend_noirq(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + return pm_generic_suspend_noirq(dev); +} +EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq); + +/** + * acpi_subsys_resume_noirq - Run the device driver's "noirq" resume callback. + * @dev: Device to handle. 
+ */ +int acpi_subsys_resume_noirq(struct device *dev) +{ + /* + * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend + * during system suspend, so update their runtime PM status to "active" + * as they will be put into D0 going forward. + */ + if (dev_pm_smart_suspend_and_suspended(dev)) + pm_runtime_set_active(dev); + + return pm_generic_resume_noirq(dev); +} +EXPORT_SYMBOL_GPL(acpi_subsys_resume_noirq); + /** * acpi_subsys_resume_early - Resume device using ACPI. * @dev: Device to Resume. @@ -1049,11 +1088,60 @@ int acpi_subsys_freeze(struct device *dev) * runtime-suspended devices should not be touched during freeze/thaw * transitions. */ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)) + pm_runtime_resume(dev); + return pm_generic_freeze(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_freeze); +/** + * acpi_subsys_freeze_late - Run the device driver's "late" freeze callback. + * @dev: Device to handle. + */ +int acpi_subsys_freeze_late(struct device *dev) +{ + + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + return pm_generic_freeze_late(dev); +} +EXPORT_SYMBOL_GPL(acpi_subsys_freeze_late); + +/** + * acpi_subsys_freeze_noirq - Run the device driver's "noirq" freeze callback. + * @dev: Device to handle. + */ +int acpi_subsys_freeze_noirq(struct device *dev) +{ + + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + return pm_generic_freeze_noirq(dev); +} +EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq); + +/** + * acpi_subsys_thaw_noirq - Run the device driver's "noirq" thaw callback. + * @dev: Device to handle. + */ +int acpi_subsys_thaw_noirq(struct device *dev) +{ + /* + * If the device is in runtime suspend, the "thaw" code may not work + * correctly with it, so skip the driver callback and make the PM core + * skip all of the subsequent "thaw" callbacks for the device. 
+ */ + if (dev_pm_smart_suspend_and_suspended(dev)) { + dev->power.direct_complete = true; + return 0; + } + + return pm_generic_thaw_noirq(dev); +} +EXPORT_SYMBOL_GPL(acpi_subsys_thaw_noirq); #endif /* CONFIG_PM_SLEEP */ static struct dev_pm_domain acpi_general_pm_domain = { @@ -1065,10 +1153,17 @@ static struct dev_pm_domain acpi_general_pm_domain = { .complete = acpi_subsys_complete, .suspend = acpi_subsys_suspend, .suspend_late = acpi_subsys_suspend_late, + .suspend_noirq = acpi_subsys_suspend_noirq, + .resume_noirq = acpi_subsys_resume_noirq, .resume_early = acpi_subsys_resume_early, .freeze = acpi_subsys_freeze, + .freeze_late = acpi_subsys_freeze_late, + .freeze_noirq = acpi_subsys_freeze_noirq, + .thaw_noirq = acpi_subsys_thaw_noirq, .poweroff = acpi_subsys_suspend, .poweroff_late = acpi_subsys_suspend_late, + .poweroff_noirq = acpi_subsys_suspend_noirq, + .restore_noirq = acpi_subsys_resume_noirq, .restore_early = acpi_subsys_resume_early, #endif }, diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 0ada2a948b44..dc1ebfeeb5ec 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -885,17 +885,27 @@ int acpi_dev_suspend_late(struct device *dev); int acpi_subsys_prepare(struct device *dev); void acpi_subsys_complete(struct device *dev); int acpi_subsys_suspend_late(struct device *dev); +int acpi_subsys_suspend_noirq(struct device *dev); +int acpi_subsys_resume_noirq(struct device *dev); int acpi_subsys_resume_early(struct device *dev); int acpi_subsys_suspend(struct device *dev); int acpi_subsys_freeze(struct device *dev); +int acpi_subsys_freeze_late(struct device *dev); +int acpi_subsys_freeze_noirq(struct device *dev); +int acpi_subsys_thaw_noirq(struct device *dev); #else static inline int acpi_dev_resume_early(struct device *dev) { return 0; } static inline int acpi_subsys_prepare(struct device *dev) { return 0; } static inline void acpi_subsys_complete(struct device *dev) {} static inline int acpi_subsys_suspend_late(struct device 
*dev) { return 0; } +static inline int acpi_subsys_suspend_noirq(struct device *dev) { return 0; } +static inline int acpi_subsys_resume_noirq(struct device *dev) { return 0; } static inline int acpi_subsys_resume_early(struct device *dev) { return 0; } static inline int acpi_subsys_suspend(struct device *dev) { return 0; } static inline int acpi_subsys_freeze(struct device *dev) { return 0; } +static inline int acpi_subsys_freeze_late(struct device *dev) { return 0; } +static inline int acpi_subsys_freeze_noirq(struct device *dev) { return 0; } +static inline int acpi_subsys_thaw_noirq(struct device *dev) { return 0; } #endif #ifdef CONFIG_ACPI From 95a20ef6f7e54c6a982715a7d0da2fd81790db28 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 7 Nov 2017 13:48:11 +0100 Subject: [PATCH 68/88] PM / Domains: Allow genpd users to specify default active wakeup behavior It is quite common for PM Domains to require slave devices to be kept active during system suspend if they are to be used as wakeup sources. To enable this, currently each PM Domain or driver has to provide its own gpd_dev_ops.active_wakeup() callback. Introduce a new flag GENPD_FLAG_ACTIVE_WAKEUP to consolidate this. If specified, all slave devices configured as wakeup sources will be kept active during system suspend. PM Domains that need more fine-grained controls, based on the slave device, can still provide their own callbacks, as before. Signed-off-by: Geert Uytterhoeven Acked-by: Ulf Hansson Reviewed-by: Kevin Hilman Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/domain.c | 3 +++ include/linux/pm_domain.h | 7 ++++--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 7e01ae364d78..e343844357c8 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -124,6 +124,7 @@ static const struct genpd_lock_ops genpd_spin_ops = { #define genpd_status_on(genpd) (genpd->status == GPD_STATE_ACTIVE) #define genpd_is_irq_safe(genpd) (genpd->flags & GENPD_FLAG_IRQ_SAFE) #define genpd_is_always_on(genpd) (genpd->flags & GENPD_FLAG_ALWAYS_ON) +#define genpd_is_active_wakeup(genpd) (genpd->flags & GENPD_FLAG_ACTIVE_WAKEUP) static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev, const struct generic_pm_domain *genpd) @@ -868,6 +869,8 @@ static bool genpd_present(const struct generic_pm_domain *genpd) static bool genpd_dev_active_wakeup(const struct generic_pm_domain *genpd, struct device *dev) { + if (genpd_is_active_wakeup(genpd)) + return true; return GENPD_DEV_CALLBACK(genpd, bool, active_wakeup, dev); } diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 9af0356bd69c..28c24c58d479 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -18,9 +18,10 @@ #include /* Defines used for the flags field in the struct generic_pm_domain */ -#define GENPD_FLAG_PM_CLK (1U << 0) /* PM domain uses PM clk */ -#define GENPD_FLAG_IRQ_SAFE (1U << 1) /* PM domain operates in atomic */ -#define GENPD_FLAG_ALWAYS_ON (1U << 2) /* PM domain is always powered on */ +#define GENPD_FLAG_PM_CLK (1U << 0) /* PM domain uses PM clk */ +#define GENPD_FLAG_IRQ_SAFE (1U << 1) /* PM domain operates in atomic */ +#define GENPD_FLAG_ALWAYS_ON (1U << 2) /* PM domain is always powered on */ +#define GENPD_FLAG_ACTIVE_WAKEUP (1U << 3) /* Keep devices active if wakeup */ enum gpd_status { GPD_STATE_ACTIVE = 0, /* PM domain is active */ From eb0ddf9dd22be098301ab8a09e9be5a13ae8c804 Mon Sep 17 00:00:00 
2001 From: Geert Uytterhoeven Date: Tue, 7 Nov 2017 13:48:12 +0100 Subject: [PATCH 69/88] ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP Set the newly introduced GENPD_FLAG_ACTIVE_WAKEUP, which allows the driver's own "always true" callback to be removed. Signed-off-by: Geert Uytterhoeven Acked-by: Ulf Hansson Acked-by: Simon Horman Signed-off-by: Rafael J. Wysocki --- arch/arm/mach-shmobile/pm-rmobile.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/arm/mach-shmobile/pm-rmobile.c b/arch/arm/mach-shmobile/pm-rmobile.c index 3a4ed4c33a68..e348bcfe389d 100644 --- a/arch/arm/mach-shmobile/pm-rmobile.c +++ b/arch/arm/mach-shmobile/pm-rmobile.c @@ -120,18 +120,12 @@ static int rmobile_pd_power_up(struct generic_pm_domain *genpd) return __rmobile_pd_power_up(to_rmobile_pd(genpd), true); } -static bool rmobile_pd_active_wakeup(struct device *dev) -{ - return true; -} - static void rmobile_init_pm_domain(struct rmobile_pm_domain *rmobile_pd) { struct generic_pm_domain *genpd = &rmobile_pd->genpd; struct dev_power_governor *gov = rmobile_pd->gov; - genpd->flags |= GENPD_FLAG_PM_CLK; - genpd->dev_ops.active_wakeup = rmobile_pd_active_wakeup; + genpd->flags |= GENPD_FLAG_PM_CLK | GENPD_FLAG_ACTIVE_WAKEUP; genpd->power_off = rmobile_pd_power_down; genpd->power_on = rmobile_pd_power_up; genpd->attach_dev = cpg_mstp_attach_dev; From 7534d181a8e60dff0c2a8e12aa6515a87a25b47d Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 7 Nov 2017 13:48:13 +0100 Subject: [PATCH 70/88] soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP Set the newly introduced GENPD_FLAG_ACTIVE_WAKEUP, which allows the driver's own flag-based callback to be removed. Signed-off-by: Geert Uytterhoeven Acked-by: Ulf Hansson Acked-by: Matthias Brugger Signed-off-by: Rafael J.
Wysocki --- drivers/soc/mediatek/mtk-scpsys.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/soc/mediatek/mtk-scpsys.c b/drivers/soc/mediatek/mtk-scpsys.c index e1ce8b1b5090..e570b6af2e6f 100644 --- a/drivers/soc/mediatek/mtk-scpsys.c +++ b/drivers/soc/mediatek/mtk-scpsys.c @@ -361,17 +361,6 @@ out: return ret; } -static bool scpsys_active_wakeup(struct device *dev) -{ - struct generic_pm_domain *genpd; - struct scp_domain *scpd; - - genpd = pd_to_genpd(dev->pm_domain); - scpd = container_of(genpd, struct scp_domain, genpd); - - return scpd->data->active_wakeup; -} - static void init_clks(struct platform_device *pdev, struct clk **clk) { int i; @@ -466,7 +455,8 @@ static struct scp *init_scp(struct platform_device *pdev, genpd->name = data->name; genpd->power_off = scpsys_power_off; genpd->power_on = scpsys_power_on; - genpd->dev_ops.active_wakeup = scpsys_active_wakeup; + if (scpd->data->active_wakeup) + genpd->flags |= GENPD_FLAG_ACTIVE_WAKEUP; } return scp; From 89c7aea915c0c9820191a533e1f304e234074b2d Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 7 Nov 2017 13:48:14 +0100 Subject: [PATCH 71/88] soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP Set the newly introduced GENPD_FLAG_ACTIVE_WAKEUP, which allows the driver's own flag-based callback to be removed. Signed-off-by: Geert Uytterhoeven Acked-by: Ulf Hansson Acked-by: Heiko Stuebner Signed-off-by: Rafael J.
Wysocki --- drivers/soc/rockchip/pm_domains.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/soc/rockchip/pm_domains.c b/drivers/soc/rockchip/pm_domains.c index 40b75748835f..5c342167b9db 100644 --- a/drivers/soc/rockchip/pm_domains.c +++ b/drivers/soc/rockchip/pm_domains.c @@ -358,17 +358,6 @@ static void rockchip_pd_detach_dev(struct generic_pm_domain *genpd, pm_clk_destroy(dev); } -static bool rockchip_active_wakeup(struct device *dev) -{ - struct generic_pm_domain *genpd; - struct rockchip_pm_domain *pd; - - genpd = pd_to_genpd(dev->pm_domain); - pd = container_of(genpd, struct rockchip_pm_domain, genpd); - - return pd->info->active_wakeup; -} - static int rockchip_pm_add_one_domain(struct rockchip_pmu *pmu, struct device_node *node) { @@ -489,8 +478,9 @@ static int rockchip_pm_add_one_domain(struct rockchip_pmu *pmu, pd->genpd.power_on = rockchip_pd_power_on; pd->genpd.attach_dev = rockchip_pd_attach_dev; pd->genpd.detach_dev = rockchip_pd_detach_dev; - pd->genpd.dev_ops.active_wakeup = rockchip_active_wakeup; pd->genpd.flags = GENPD_FLAG_PM_CLK; + if (pd_info->active_wakeup) + pd->genpd.flags |= GENPD_FLAG_ACTIVE_WAKEUP; pm_genpd_init(&pd->genpd, NULL, false); pmu->genpd_data.domains[id] = &pd->genpd; From d0af45f1f6528949e05385976eb61c5ebd31854e Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 7 Nov 2017 13:48:15 +0100 Subject: [PATCH 72/88] PM / Domains: Remove gpd_dev_ops.active_wakeup() callback There are no more users left of the gpd_dev_ops.active_wakeup() callback. All have been converted to GENPD_FLAG_ACTIVE_WAKEUP. Hence remove the callback. Signed-off-by: Geert Uytterhoeven Acked-by: Ulf Hansson Reviewed-by: Kevin Hilman Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/domain.c | 14 +++----------- include/linux/pm_domain.h | 1 - 2 files changed, 3 insertions(+), 12 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index e343844357c8..65bb40c240fb 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -866,14 +866,6 @@ static bool genpd_present(const struct generic_pm_domain *genpd) #ifdef CONFIG_PM_SLEEP -static bool genpd_dev_active_wakeup(const struct generic_pm_domain *genpd, - struct device *dev) -{ - if (genpd_is_active_wakeup(genpd)) - return true; - return GENPD_DEV_CALLBACK(genpd, bool, active_wakeup, dev); -} - /** * genpd_sync_power_off - Synchronously power off a PM domain and its masters. * @genpd: PM domain to power off, if possible. @@ -978,7 +970,7 @@ static bool resume_needed(struct device *dev, if (!device_can_wakeup(dev)) return false; - active_wakeup = genpd_dev_active_wakeup(genpd, dev); + active_wakeup = genpd_is_active_wakeup(genpd); return device_may_wakeup(dev) ? 
active_wakeup : !active_wakeup; } @@ -1047,7 +1039,7 @@ static int genpd_finish_suspend(struct device *dev, bool poweroff) if (IS_ERR(genpd)) return -EINVAL; - if (dev->power.wakeup_path && genpd_dev_active_wakeup(genpd, dev)) + if (dev->power.wakeup_path && genpd_is_active_wakeup(genpd)) return 0; if (poweroff) @@ -1102,7 +1094,7 @@ static int genpd_resume_noirq(struct device *dev) if (IS_ERR(genpd)) return -EINVAL; - if (dev->power.wakeup_path && genpd_dev_active_wakeup(genpd, dev)) + if (dev->power.wakeup_path && genpd_is_active_wakeup(genpd)) return 0; genpd_lock(genpd); diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 28c24c58d479..04dbef9847d3 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -36,7 +36,6 @@ struct dev_power_governor { struct gpd_dev_ops { int (*start)(struct device *dev); int (*stop)(struct device *dev); - bool (*active_wakeup)(struct device *dev); }; struct genpd_power_state { From 704d2ce6603f7e40bb607ae9452ff18a4cec701f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 7 Nov 2017 02:23:18 +0100 Subject: [PATCH 73/88] PM / domains: Rework governor code to be more consistent The genpd governor currently uses negative PM QoS values to indicate the "no suspend" condition and 0 as "no restriction", but it doesn't use them consistently. Moreover, it tries to refresh QoS values for already suspended devices in a quite questionable way. For the above reasons, rework it to be a bit more consistent. First off, note that dev_pm_qos_read_value() in dev_update_qos_constraint() and __default_power_down_ok() is evaluated for devices in suspend. Moreover, that only happens if the effective_constraint_ns value for them is negative (meaning "no suspend"). It is not evaluated in any other cases, so effectively the QoS values are only updated for devices in suspend that should not have been suspended in the first place. 
In all of the other cases, the QoS values taken into account are the effective ones from the time before the device has been suspended, so generally devices need to be resumed and suspended again for new QoS values to take effect anyway. Thus evaluating dev_update_qos_constraint() in those two places doesn't make sense at all, so drop it. Second, initialize effective_constraint_ns to 0 ("no constraint") rather than to (-1) ("no suspend"), which makes more sense in general and in case effective_constraint_ns is never updated (the device is in suspend all the time or it is never suspended) it doesn't affect the device's parent and so on. Finally, rework default_suspend_ok() to explicitly handle the "no restriction" and "no suspend" special cases. Signed-off-by: Rafael J. Wysocki Tested-by: Geert Uytterhoeven Tested-by: Tero Kristo Reviewed-by: Ramesh Thomas --- drivers/base/power/domain.c | 2 +- drivers/base/power/domain_governor.c | 71 +++++++++++++++++++--------- 2 files changed, 50 insertions(+), 23 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 65bb40c240fb..b914e373a478 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1328,7 +1328,7 @@ static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, gpd_data->base.dev = dev; gpd_data->td.constraint_changed = true; - gpd_data->td.effective_constraint_ns = -1; + gpd_data->td.effective_constraint_ns = 0; gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; spin_lock_irq(&dev->power.lock); diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c index 281f949c5ffe..e4cca8adab32 100644 --- a/drivers/base/power/domain_governor.c +++ b/drivers/base/power/domain_governor.c @@ -14,22 +14,33 @@ static int dev_update_qos_constraint(struct device *dev, void *data) { s64 *constraint_ns_p = data; - s32 constraint_ns = -1; + s64 constraint_ns; - if (dev->power.subsys_data && 
dev->power.subsys_data->domain_data) + if (dev->power.subsys_data && dev->power.subsys_data->domain_data) { + /* + * Only take suspend-time QoS constraints of devices into + * account, because constraints updated after the device has + * been suspended are not guaranteed to be taken into account + * anyway. In order for them to take effect, the device has to + * be resumed and suspended again. + */ constraint_ns = dev_gpd_data(dev)->td.effective_constraint_ns; - - if (constraint_ns < 0) { + } else { + /* + * The child is not in a domain and there's no info on its + * suspend/resume latencies, so assume them to be negligible and + * take its current PM QoS constraint (that's the only thing + * known at this point anyway). + */ constraint_ns = dev_pm_qos_read_value(dev); - constraint_ns *= NSEC_PER_USEC; + if (constraint_ns > 0) + constraint_ns *= NSEC_PER_USEC; } + + /* 0 means "no constraint" */ if (constraint_ns == 0) return 0; - /* - * constraint_ns cannot be negative here, because the device has been - * suspended. - */ if (constraint_ns < *constraint_ns_p || *constraint_ns_p == 0) *constraint_ns_p = constraint_ns; @@ -76,14 +87,32 @@ static bool default_suspend_ok(struct device *dev) device_for_each_child(dev, &constraint_ns, dev_update_qos_constraint); - if (constraint_ns > 0) { + if (constraint_ns == 0) { + /* "No restriction", so the device is allowed to suspend. */ + td->effective_constraint_ns = 0; + td->cached_suspend_ok = true; + } else if (constraint_ns < 0) { + /* + * This triggers if one of the children that don't belong to a + * domain has a negative PM QoS constraint and it's better not + * to suspend then. effective_constraint_ns is negative already + * and cached_suspend_ok is false, so bail out. 
+ */ + return false; + } else { constraint_ns -= td->suspend_latency_ns + td->resume_latency_ns; - if (constraint_ns == 0) + /* + * effective_constraint_ns is negative already and + * cached_suspend_ok is false, so if the computed value is not + * positive, return right away. + */ + if (constraint_ns <= 0) return false; + + td->effective_constraint_ns = constraint_ns; + td->cached_suspend_ok = true; } - td->effective_constraint_ns = constraint_ns; - td->cached_suspend_ok = constraint_ns >= 0; /* * The children have been suspended already, so we don't need to take @@ -144,18 +173,16 @@ static bool __default_power_down_ok(struct dev_pm_domain *pd, */ td = &to_gpd_data(pdd)->td; constraint_ns = td->effective_constraint_ns; - /* default_suspend_ok() need not be called before us. */ - if (constraint_ns < 0) { - constraint_ns = dev_pm_qos_read_value(pdd->dev); - constraint_ns *= NSEC_PER_USEC; - } + /* + * Negative values mean "no suspend at all" and this runs only + * when all devices in the domain are suspended, so it must be + * 0 at least. + * + * 0 means "no constraint" + */ if (constraint_ns == 0) continue; - /* - * constraint_ns cannot be negative here, because the device has - * been suspended. - */ if (constraint_ns <= off_on_time_ns) return false; From 0759e80b84e34a84e7e46e2b1adb528c83d84a47 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 7 Nov 2017 11:33:49 +0100 Subject: [PATCH 74/88] PM / QoS: Fix device resume latency framework The special value of 0 for device resume latency PM QoS means "no restriction", but there are two problems with that. First, device resume latency PM QoS requests with 0 as the value are always put in front of requests with positive values in the priority lists used internally by the PM QoS framework, causing 0 to be chosen as an effective constraint value. However, that 0 is then interpreted as "no restriction" effectively overriding the other requests with specific restrictions which is incorrect. 
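As a rough illustration of this first problem, the sketch below models a "minimum wins" aggregation of resume latency requests. It is a simplified model only: effective_min() and the plain arrays are hypothetical stand-ins for the priority lists used by the PM QoS framework, not kernel code, and INT32_MAX is assumed here to play the role of S32_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a "minimum wins" aggregation: the smallest
 * request value becomes the effective constraint.  Not kernel code.
 */
static int32_t effective_min(const int32_t *reqs, size_t n)
{
	int32_t eff = INT32_MAX;	/* nothing requested yet */
	size_t i;

	for (i = 0; i < n; i++)
		if (reqs[i] < eff)
			eff = reqs[i];

	return eff;
}

/* Old encoding: 0 means "no restriction", 100 means at most 100 us. */
static const int32_t old_reqs[] = { 0, 100 };

/* New encoding: INT32_MAX means "no constraint", 100 means at most 100 us. */
static const int32_t new_reqs[] = { INT32_MAX, 100 };
```

In this model effective_min(old_reqs, 2) yields 0, which would then be read as "no restriction", so the 100 us request is lost. With the "no constraint" value moved to the top of the range, effective_min(new_reqs, 2) yields 100 and the specific restriction survives.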
Second, the users of device resume latency PM QoS have no way to specify that *any* resume latency at all should be avoided, which is an artificial limitation in general. To address these issues, modify device resume latency PM QoS to use S32_MAX as the "no constraint" value and 0 as the "no latency at all" one and rework its users (the cpuidle menu governor, the genpd QoS governor and the runtime PM framework) to follow these changes. Also add a special "n/a" value to the corresponding user space I/F to allow user space to indicate that it cannot accept any resume latencies at all for the given device. Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints) Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323 Reported-by: Reinette Chatre Signed-off-by: Rafael J. Wysocki Tested-by: Reinette Chatre Tested-by: Geert Uytterhoeven Tested-by: Tero Kristo Reviewed-by: Ramesh Thomas --- Documentation/ABI/testing/sysfs-devices-power | 4 +- drivers/base/cpu.c | 3 +- drivers/base/power/domain.c | 2 +- drivers/base/power/domain_governor.c | 40 ++++++++----------- drivers/base/power/qos.c | 5 ++- drivers/base/power/runtime.c | 2 +- drivers/base/power/sysfs.c | 25 ++++++++++-- drivers/cpuidle/governors/menu.c | 4 +- include/linux/pm_qos.h | 26 ++++++++---- 9 files changed, 68 insertions(+), 43 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power index f4b24c327665..80a00f7b6667 100644 --- a/Documentation/ABI/testing/sysfs-devices-power +++ b/Documentation/ABI/testing/sysfs-devices-power @@ -211,7 +211,9 @@ Description: device, after it has been suspended at run time, from a resume request to the moment the device will be ready to process I/O, in microseconds. If it is equal to 0, however, this means that - the PM QoS resume latency may be arbitrary. 
+ the PM QoS resume latency may be arbitrary and the special value + "n/a" means that user space cannot accept any resume latency at + all for the given device. Not all drivers support this attribute. If it isn't supported, it is not present. diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 321cd7b4d817..227bac5f1191 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -377,7 +377,8 @@ int register_cpu(struct cpu *cpu, int num) per_cpu(cpu_sys_devices, num) = &cpu->dev; register_cpu_under_node(num, cpu_to_node(num)); - dev_pm_qos_expose_latency_limit(&cpu->dev, 0); + dev_pm_qos_expose_latency_limit(&cpu->dev, + PM_QOS_RESUME_LATENCY_NO_CONSTRAINT); return 0; } diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 679c79545e42..24e39ce27bd8 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1326,7 +1326,7 @@ static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, gpd_data->base.dev = dev; gpd_data->td.constraint_changed = true; - gpd_data->td.effective_constraint_ns = 0; + gpd_data->td.effective_constraint_ns = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS; gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; spin_lock_irq(&dev->power.lock); diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c index e4cca8adab32..99896fbf18e4 100644 --- a/drivers/base/power/domain_governor.c +++ b/drivers/base/power/domain_governor.c @@ -33,15 +33,10 @@ static int dev_update_qos_constraint(struct device *dev, void *data) * known at this point anyway). 
*/ constraint_ns = dev_pm_qos_read_value(dev); - if (constraint_ns > 0) - constraint_ns *= NSEC_PER_USEC; + constraint_ns *= NSEC_PER_USEC; } - /* 0 means "no constraint" */ - if (constraint_ns == 0) - return 0; - - if (constraint_ns < *constraint_ns_p || *constraint_ns_p == 0) + if (constraint_ns < *constraint_ns_p) *constraint_ns_p = constraint_ns; return 0; @@ -69,12 +64,12 @@ static bool default_suspend_ok(struct device *dev) } td->constraint_changed = false; td->cached_suspend_ok = false; - td->effective_constraint_ns = -1; + td->effective_constraint_ns = 0; constraint_ns = __dev_pm_qos_read_value(dev); spin_unlock_irqrestore(&dev->power.lock, flags); - if (constraint_ns < 0) + if (constraint_ns == 0) return false; constraint_ns *= NSEC_PER_USEC; @@ -87,25 +82,25 @@ static bool default_suspend_ok(struct device *dev) device_for_each_child(dev, &constraint_ns, dev_update_qos_constraint); - if (constraint_ns == 0) { + if (constraint_ns == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS) { /* "No restriction", so the device is allowed to suspend. */ - td->effective_constraint_ns = 0; + td->effective_constraint_ns = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS; td->cached_suspend_ok = true; - } else if (constraint_ns < 0) { + } else if (constraint_ns == 0) { /* * This triggers if one of the children that don't belong to a - * domain has a negative PM QoS constraint and it's better not - * to suspend then. effective_constraint_ns is negative already - * and cached_suspend_ok is false, so bail out. + * domain has a zero PM QoS constraint and it's better not to + * suspend then. effective_constraint_ns is zero already and + * cached_suspend_ok is false, so bail out. */ return false; } else { constraint_ns -= td->suspend_latency_ns + td->resume_latency_ns; /* - * effective_constraint_ns is negative already and - * cached_suspend_ok is false, so if the computed value is not - * positive, return right away. 
+ * effective_constraint_ns is zero already and cached_suspend_ok + * is false, so if the computed value is not positive, return + * right away. */ if (constraint_ns <= 0) return false; @@ -174,13 +169,10 @@ static bool __default_power_down_ok(struct dev_pm_domain *pd, td = &to_gpd_data(pdd)->td; constraint_ns = td->effective_constraint_ns; /* - * Negative values mean "no suspend at all" and this runs only - * when all devices in the domain are suspended, so it must be - * 0 at least. - * - * 0 means "no constraint" + * Zero means "no suspend at all" and this runs only when all + * devices in the domain are suspended, so it must be positive. */ - if (constraint_ns == 0) + if (constraint_ns == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS) continue; if (constraint_ns <= off_on_time_ns) diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c index 277d43a83f53..3382542b39b7 100644 --- a/drivers/base/power/qos.c +++ b/drivers/base/power/qos.c @@ -139,6 +139,9 @@ static int apply_constraint(struct dev_pm_qos_request *req, switch(req->type) { case DEV_PM_QOS_RESUME_LATENCY: + if (WARN_ON(action != PM_QOS_REMOVE_REQ && value < 0)) + value = 0; + ret = pm_qos_update_target(&qos->resume_latency, &req->data.pnode, action, value); break; @@ -189,7 +192,7 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) plist_head_init(&c->list); c->target_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; c->default_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; - c->no_constraint_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; + c->no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; c->type = PM_QOS_MIN; c->notifiers = n; diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 7bcf80fa9ada..13e015905543 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -253,7 +253,7 @@ static int rpm_check_suspend_allowed(struct device *dev) || (dev->power.request_pending && dev->power.request == RPM_REQ_RESUME)) retval = -EAGAIN; - else if 
(__dev_pm_qos_read_value(dev) < 0) + else if (__dev_pm_qos_read_value(dev) == 0) retval = -EPERM; else if (dev->power.runtime_status == RPM_SUSPENDED) retval = 1; diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c index 29bf28fef136..e153e28b1857 100644 --- a/drivers/base/power/sysfs.c +++ b/drivers/base/power/sysfs.c @@ -218,7 +218,14 @@ static ssize_t pm_qos_resume_latency_show(struct device *dev, struct device_attribute *attr, char *buf) { - return sprintf(buf, "%d\n", dev_pm_qos_requested_resume_latency(dev)); + s32 value = dev_pm_qos_requested_resume_latency(dev); + + if (value == 0) + return sprintf(buf, "n/a\n"); + else if (value == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) + value = 0; + + return sprintf(buf, "%d\n", value); } static ssize_t pm_qos_resume_latency_store(struct device *dev, @@ -228,11 +235,21 @@ static ssize_t pm_qos_resume_latency_store(struct device *dev, s32 value; int ret; - if (kstrtos32(buf, 0, &value)) - return -EINVAL; + if (!kstrtos32(buf, 0, &value)) { + /* + * Prevent users from writing negative or "no constraint" values + * directly. 
+ */ + if (value < 0 || value == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) + return -EINVAL; - if (value < 0) + if (value == 0) + value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; + } else if (!strcmp(buf, "n/a") || !strcmp(buf, "n/a\n")) { + value = 0; + } else { return -EINVAL; + } ret = dev_pm_qos_update_request(dev->power.qos->resume_latency_req, value); diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 48eaf2879228..aa390404e85f 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -298,8 +298,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) data->needs_update = 0; } - /* resume_latency is 0 means no restriction */ - if (resume_latency && resume_latency < latency_req) + if (resume_latency < latency_req && + resume_latency != PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) latency_req = resume_latency; /* Special case when user has set very strict latency requirement */ diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h index 51f0d7e0b15f..2a3b36da61b1 100644 --- a/include/linux/pm_qos.h +++ b/include/linux/pm_qos.h @@ -27,16 +27,19 @@ enum pm_qos_flags_status { PM_QOS_FLAGS_ALL, }; -#define PM_QOS_DEFAULT_VALUE -1 +#define PM_QOS_DEFAULT_VALUE (-1) +#define PM_QOS_LATENCY_ANY S32_MAX +#define PM_QOS_LATENCY_ANY_NS ((s64)PM_QOS_LATENCY_ANY * NSEC_PER_USEC) #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC) #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC) #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0 #define PM_QOS_MEMORY_BANDWIDTH_DEFAULT_VALUE 0 -#define PM_QOS_RESUME_LATENCY_DEFAULT_VALUE 0 +#define PM_QOS_RESUME_LATENCY_DEFAULT_VALUE PM_QOS_LATENCY_ANY +#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT PM_QOS_LATENCY_ANY +#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS PM_QOS_LATENCY_ANY_NS #define PM_QOS_LATENCY_TOLERANCE_DEFAULT_VALUE 0 #define PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT (-1) -#define PM_QOS_LATENCY_ANY ((s32)(~(__u32)0 >> 1)) 
#define PM_QOS_FLAG_NO_POWER_OFF (1 << 0) @@ -173,7 +176,8 @@ static inline s32 dev_pm_qos_requested_flags(struct device *dev) static inline s32 dev_pm_qos_raw_read_value(struct device *dev) { return IS_ERR_OR_NULL(dev->power.qos) ? - 0 : pm_qos_read_value(&dev->power.qos->resume_latency); + PM_QOS_RESUME_LATENCY_NO_CONSTRAINT : + pm_qos_read_value(&dev->power.qos->resume_latency); } #else static inline enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, @@ -183,9 +187,9 @@ static inline enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask) { return PM_QOS_FLAGS_UNDEFINED; } static inline s32 __dev_pm_qos_read_value(struct device *dev) - { return 0; } + { return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; } static inline s32 dev_pm_qos_read_value(struct device *dev) - { return 0; } + { return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; } static inline int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req, enum dev_pm_qos_req_type type, @@ -231,9 +235,15 @@ static inline int dev_pm_qos_expose_latency_tolerance(struct device *dev) { return 0; } static inline void dev_pm_qos_hide_latency_tolerance(struct device *dev) {} -static inline s32 dev_pm_qos_requested_resume_latency(struct device *dev) { return 0; } +static inline s32 dev_pm_qos_requested_resume_latency(struct device *dev) +{ + return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; +} static inline s32 dev_pm_qos_requested_flags(struct device *dev) { return 0; } -static inline s32 dev_pm_qos_raw_read_value(struct device *dev) { return 0; } +static inline s32 dev_pm_qos_raw_read_value(struct device *dev) +{ + return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; +} #endif #endif From c523c68da2117a3f9f777110839b1cf7ed7221be Mon Sep 17 00:00:00 2001 From: Ramesh Thomas Date: Thu, 26 Oct 2017 19:01:34 -0700 Subject: [PATCH 75/88] cpuidle: ladder: Add per CPU PM QoS resume latency support Individual CPUs may have special requirements to not enter deep idle states. 
For example, a CPU running real-time applications would not want to enter deep idle states to avoid latency impacts. At the same time other CPUs that do not have such a requirement could allow deep idle states to save power. This was already implemented in the menu governor. Implement similar changes in the ladder governor, which gets selected when CONFIG_NO_HZ and CONFIG_NO_HZ_IDLE are not set. Refer to the following commits for the menu governor changes. Signed-off-by: Ramesh Thomas Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/ladder.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c index ce1a2ffffb2a..1ad8745fd6d6 100644 --- a/drivers/cpuidle/governors/ladder.c +++ b/drivers/cpuidle/governors/ladder.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -67,10 +68,16 @@ static int ladder_select_state(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct ladder_device *ldev = this_cpu_ptr(&ladder_devices); + struct device *device = get_cpu_device(dev->cpu); struct ladder_device_state *last_state; int last_residency, last_idx = ldev->last_state_idx; int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0; int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); + int resume_latency = dev_pm_qos_raw_read_value(device); + + if (resume_latency < latency_req && + resume_latency != PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) + latency_req = resume_latency; /* Special case when user has set very strict latency requirement */ if (unlikely(latency_req == 0)) { From 5241ab40f6e742f8a1631f8826faf6dc6412b3b5 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 8 Nov 2017 10:11:02 +0100 Subject: [PATCH 76/88] PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare() During system-wide PM, genpd relies on its PM callbacks being invoked for all its attached devices, in order to deal with powering off/on the PM domain.
In other words, genpd is not compatible with the direct_complete path, if executed by the PM core for any of its attached devices. However, when genpd's ->prepare() callback invokes pm_generic_prepare(), it does not take into account that it may return 1. Instead it treats that as an error internally and expects the PM core to abort the prepare phase and roll back. This leads to genpd not properly powering on/off the PM domain, because its internal counters get wrongly balanced. To fix the behaviour, allow drivers to return 1 from their ->prepare() callbacks, but let's return 0 from genpd's ->prepare() callback in such a case, as that prevents the PM core from running the direct_complete path for the device. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index b914e373a478..47fb71a8066a 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1010,7 +1010,7 @@ static int genpd_prepare(struct device *dev) genpd_unlock(genpd); ret = pm_generic_prepare(dev); - if (ret) { + if (ret < 0) { genpd_lock(genpd); genpd->prepared_count--; @@ -1018,7 +1018,8 @@ static int genpd_prepare(struct device *dev) genpd_unlock(genpd); } - return ret; + /* Never return 1, as genpd don't cope with the direct_complete path. */ + return ret >= 0 ? 0 : ret; } /** From ff1656790b3a4caca94505c52fd0250f981ea187 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Tue, 7 Nov 2017 23:08:10 +0200 Subject: [PATCH 77/88] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit acpi_remove_pm_notifier() ends up calling flush_workqueue() while holding acpi_pm_notifier_lock, and that same lock is taken by the work via acpi_pm_notify_handler(). This can deadlock.
To fix the problem let's split the single lock into two: one to protect the dev->wakeup between the work vs. add/remove, and another one to handle notifier installation vs. removal. After commit a1d14934ea4b "workqueue/lockdep: 'Fix' flush_work() annotation" I was able to kill the machine (Intel Braswell) very easily with 'powertop --auto-tune', runtime suspending i915, and trying to wake it up via the USB keyboard. The cases when it didn't die are presumably explained by lockdep getting disabled by something else (cpu hotplug locking issues usually). Fortunately I still got a lockdep report over netconsole (trickling in very slowly), even though the machine was otherwise practically dead: [ 112.179806] ====================================================== [ 114.670858] WARNING: possible circular locking dependency detected [ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted [ 119.658101] ------------------------------------------------------ [ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. [ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead [ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up [ 121.313485] usb 1-6: USB disconnect, device number 3 [ 121.313501] usb 1-6.2: USB disconnect, device number 4 [ 134.747383] kworker/0:2/47 is trying to acquire lock: [ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [] acpi_pm_notify_handler+0x2f/0x80 [ 139.721524] [ 139.721524] but task is already holding lock: [ 144.672922] ((&dpc->work)){+.+.}, at: [] process_one_work+0x160/0x720 [ 147.184450] [ 147.184450] which lock already depends on the new lock. 
[ 147.184450] [ 154.604711] [ 154.604711] the existing dependency chain (in reverse order) is: [ 159.447888] [ 159.447888] -> #2 ((&dpc->work)){+.+.}: [ 164.183486] __lock_acquire+0x1255/0x13f0 [ 166.504313] lock_acquire+0xb5/0x210 [ 168.778973] process_one_work+0x1b9/0x720 [ 171.030316] worker_thread+0x4c/0x440 [ 173.257184] kthread+0x154/0x190 [ 175.456143] ret_from_fork+0x27/0x40 [ 177.624348] [ 177.624348] -> #1 ("kacpi_notify"){+.+.}: [ 181.850351] __lock_acquire+0x1255/0x13f0 [ 183.941695] lock_acquire+0xb5/0x210 [ 186.046115] flush_workqueue+0xdd/0x510 [ 190.408153] acpi_os_wait_events_complete+0x31/0x40 [ 192.625303] acpi_remove_notify_handler+0x133/0x188 [ 194.820829] acpi_remove_pm_notifier+0x56/0x90 [ 196.989068] acpi_dev_pm_detach+0x5f/0xa0 [ 199.145866] dev_pm_domain_detach+0x27/0x30 [ 201.285614] i2c_device_probe+0x100/0x210 [ 203.411118] driver_probe_device+0x23e/0x310 [ 205.522425] __driver_attach+0xa3/0xb0 [ 207.634268] bus_for_each_dev+0x69/0xa0 [ 209.714797] driver_attach+0x1e/0x20 [ 211.778258] bus_add_driver+0x1bc/0x230 [ 213.837162] driver_register+0x60/0xe0 [ 215.868162] i2c_register_driver+0x42/0x70 [ 217.869551] 0xffffffffa0172017 [ 219.863009] do_one_initcall+0x45/0x170 [ 221.843863] do_init_module+0x5f/0x204 [ 223.817915] load_module+0x225b/0x29b0 [ 225.757234] SyS_finit_module+0xc6/0xd0 [ 227.661851] do_syscall_64+0x5c/0x120 [ 229.536819] return_from_SYSCALL_64+0x0/0x7a [ 231.392444] [ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}: [ 235.124914] check_prev_add+0x44e/0x8a0 [ 237.024795] __lock_acquire+0x1255/0x13f0 [ 238.937351] lock_acquire+0xb5/0x210 [ 240.840799] __mutex_lock+0x75/0x940 [ 242.709517] mutex_lock_nested+0x1c/0x20 [ 244.551478] acpi_pm_notify_handler+0x2f/0x80 [ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c [ 248.194412] acpi_os_execute_deferred+0x14/0x30 [ 250.003925] process_one_work+0x1ec/0x720 [ 251.803191] worker_thread+0x4c/0x440 [ 253.605307] kthread+0x154/0x190 [ 255.387498] ret_from_fork+0x27/0x40 [ 257.153175] 
[ 257.153175] other info that might help us debug this: [ 257.153175] [ 262.324392] Chain exists of: [ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work) [ 262.324392] [ 267.391997] Possible unsafe locking scenario: [ 267.391997] [ 270.758262] CPU0 CPU1 [ 272.431713] ---- ---- [ 274.060756] lock((&dpc->work)); [ 275.646532] lock("kacpi_notify"); [ 277.260772] lock((&dpc->work)); [ 278.839146] lock(acpi_pm_notifier_lock); [ 280.391902] [ 280.391902] *** DEADLOCK *** [ 280.391902] [ 284.986385] 2 locks held by kworker/0:2/47: [ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [] process_one_work+0x160/0x720 [ 288.112927] #1: ((&dpc->work)){+.+.}, at: [] process_one_work+0x160/0x720 [ 289.727725] Fixes: c072530f391e (ACPI / PM: Revork the handling of ACPI device wakeup notifications) Signed-off-by: Ville Syrjälä Cc: 3.17+ # 3.17+ Signed-off-by: Rafael J. Wysocki --- drivers/acpi/device_pm.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 17e8eb93a76c..69ffd1dc1de7 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable); #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +static DEFINE_MUTEX(acpi_pm_notifier_install_lock); void acpi_pm_wakeup_event(struct device *dev) { @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, if (!dev && !func) return AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out; - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); - adev->wakeup.context.dev = dev; - adev->wakeup.context.func = func; - status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); + adev->wakeup.ws = 
wakeup_source_register(dev_name(&adev->dev)); + adev->wakeup.context.dev = dev; + adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } @@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) { acpi_status status = AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (!adev->wakeup.flags.notifier_present) goto out; @@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); - adev->wakeup.flags.notifier_present = false; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } From e7b06a09e7d87ec0d6d8b17eec50fbb93667eee1 Mon Sep 17 00:00:00 2001 From: Gaurav Jindal Date: Fri, 1 Sep 2017 20:37:26 +0530 Subject: [PATCH 78/88] cpuidle: Clean up cpuidle_enable_device() error handling a bit Do not fetch per CPU drv if cpuidle_curr_governor is NULL to avoid useless per CPU processing. Signed-off-by: Gaurav Jindal [ rjw: Subject & changelog ] Signed-off-by: Rafael J. 
Wysocki --- drivers/cpuidle/cpuidle.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index ed4df58a855e..27f9648b61c2 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -388,9 +388,12 @@ int cpuidle_enable_device(struct cpuidle_device *dev) if (dev->enabled) return 0; + if (!cpuidle_curr_governor) + return -EIO; + drv = cpuidle_get_cpu_driver(dev); - if (!drv || !cpuidle_curr_governor) + if (!drv) return -EIO; if (!dev->registered) From 3fc74bd8a723c91a5b4627079c511fcaf3c75017 Mon Sep 17 00:00:00 2001 From: Gaurav Jindal Date: Sat, 2 Sep 2017 00:56:38 +0530 Subject: [PATCH 79/88] cpuidle: Avoid assignment in if () argument Clean up cpuidle_enable_device() to avoid doing an assignment in an expression evaluated as an argument of if (), which also makes the code in question more readable. Signed-off-by: Gaurav Jindal [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/cpuidle.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 27f9648b61c2..68a16827f45f 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -403,9 +403,11 @@ int cpuidle_enable_device(struct cpuidle_device *dev) if (ret) return ret; - if (cpuidle_curr_governor->enable && - (ret = cpuidle_curr_governor->enable(drv, dev))) - goto fail_sysfs; + if (cpuidle_curr_governor->enable) { + ret = cpuidle_curr_governor->enable(drv, dev); + if (ret) + goto fail_sysfs; + } smp_wmb(); From cd6ce860eb1920568361cf270fe4f89674cf411b Mon Sep 17 00:00:00 2001 From: Bhumika Goyal Date: Thu, 19 Oct 2017 12:59:14 +0200 Subject: [PATCH 80/88] cpufreq: arm_big_little: make function arguments and structure pointer const Make the arguments of functions bL_cpufreq_{register/unregister} as const as the ops pointer does not modify the fields of the cpufreq_arm_bL_ops structure it points to. 
The pointer arm_bL_ops is also getting initialized with ops but the pointer does not modify the fields. So, make the function argument and the structure pointer const. Add const to function prototypes too. Signed-off-by: Bhumika Goyal Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/arm_big_little.c | 6 +++--- drivers/cpufreq/arm_big_little.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c index 0c41ab3b16eb..65ec5f01aa8d 100644 --- a/drivers/cpufreq/arm_big_little.c +++ b/drivers/cpufreq/arm_big_little.c @@ -57,7 +57,7 @@ static bool bL_switching_enabled; #define VIRT_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq >> 1 : freq) static struct thermal_cooling_device *cdev[MAX_CLUSTERS]; -static struct cpufreq_arm_bL_ops *arm_bL_ops; +static const struct cpufreq_arm_bL_ops *arm_bL_ops; static struct clk *clk[MAX_CLUSTERS]; static struct cpufreq_frequency_table *freq_table[MAX_CLUSTERS + 1]; static atomic_t cluster_usage[MAX_CLUSTERS + 1]; @@ -617,7 +617,7 @@ static int __bLs_register_notifier(void) { return 0; } static int __bLs_unregister_notifier(void) { return 0; } #endif -int bL_cpufreq_register(struct cpufreq_arm_bL_ops *ops) +int bL_cpufreq_register(const struct cpufreq_arm_bL_ops *ops) { int ret, i; @@ -661,7 +661,7 @@ int bL_cpufreq_register(struct cpufreq_arm_bL_ops *ops) } EXPORT_SYMBOL_GPL(bL_cpufreq_register); -void bL_cpufreq_unregister(struct cpufreq_arm_bL_ops *ops) +void bL_cpufreq_unregister(const struct cpufreq_arm_bL_ops *ops) { if (arm_bL_ops != ops) { pr_err("%s: Registered with: %s, can't unregister, exiting\n", diff --git a/drivers/cpufreq/arm_big_little.h b/drivers/cpufreq/arm_big_little.h index 184d7c3a112a..88a176e466c8 100644 --- a/drivers/cpufreq/arm_big_little.h +++ b/drivers/cpufreq/arm_big_little.h @@ -37,7 +37,7 @@ struct cpufreq_arm_bL_ops { void (*free_opp_table)(const struct cpumask *cpumask); }; -int 
bL_cpufreq_register(struct cpufreq_arm_bL_ops *ops); -void bL_cpufreq_unregister(struct cpufreq_arm_bL_ops *ops); +int bL_cpufreq_register(const struct cpufreq_arm_bL_ops *ops); +void bL_cpufreq_unregister(const struct cpufreq_arm_bL_ops *ops); #endif /* CPUFREQ_ARM_BIG_LITTLE_H */ From 0011c6da99ddc428a35456d5819d6e476005f6f2 Mon Sep 17 00:00:00 2001 From: Bhumika Goyal Date: Thu, 19 Oct 2017 12:59:15 +0200 Subject: [PATCH 81/88] cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const Make these const as they are only getting passed to the functions bL_cpufreq_{register/unregister} having the arguments as const. Signed-off-by: Bhumika Goyal Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/arm_big_little_dt.c | 2 +- drivers/cpufreq/scpi-cpufreq.c | 2 +- drivers/cpufreq/vexpress-spc-cpufreq.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/arm_big_little_dt.c b/drivers/cpufreq/arm_big_little_dt.c index 39b3f51d9a30..b944f290c8a4 100644 --- a/drivers/cpufreq/arm_big_little_dt.c +++ b/drivers/cpufreq/arm_big_little_dt.c @@ -61,7 +61,7 @@ static int dt_get_transition_latency(struct device *cpu_dev) return transition_latency; } -static struct cpufreq_arm_bL_ops dt_bL_ops = { +static const struct cpufreq_arm_bL_ops dt_bL_ops = { .name = "dt-bl", .get_transition_latency = dt_get_transition_latency, .init_opp_table = dev_pm_opp_of_cpumask_add_table, diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c index 8de2364b5995..05d299052c5c 100644 --- a/drivers/cpufreq/scpi-cpufreq.c +++ b/drivers/cpufreq/scpi-cpufreq.c @@ -53,7 +53,7 @@ static int scpi_init_opp_table(const struct cpumask *cpumask) return ret; } -static struct cpufreq_arm_bL_ops scpi_cpufreq_ops = { +static const struct cpufreq_arm_bL_ops scpi_cpufreq_ops = { .name = "scpi", .get_transition_latency = scpi_get_transition_latency, .init_opp_table = scpi_init_opp_table, diff --git 
a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 87e5bdc5ec74..53237289e606 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -42,7 +42,7 @@ static int ve_spc_get_transition_latency(struct device *cpu_dev) return 1000000; /* 1 ms */ } -static struct cpufreq_arm_bL_ops ve_spc_cpufreq_ops = { +static const struct cpufreq_arm_bL_ops ve_spc_cpufreq_ops = { .name = "vexpress-spc", .get_transition_latency = ve_spc_get_transition_latency, .init_opp_table = ve_spc_init_opp_table, From f7bc9b209e27c0b617378400136cc663a6314d0c Mon Sep 17 00:00:00 2001 From: "Gautham R. Shenoy" Date: Tue, 7 Nov 2017 13:39:29 +0530 Subject: [PATCH 82/88] cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE On platforms with a large number of Pstates, the transition table, which is an NxN matrix, can overflow beyond the PAGE_SIZE boundary. This can be seen on POWER9, which has 100+ Pstates. As a result, each time the trans_table is read for any of the CPUs, we will get the following error: --------------------------------------------------- fill_read_buffer: show+0x0/0xa0 returned bad count --------------------------------------------------- This patch ensures that in case of an overflow, we print a warning once in dmesg and return the "file too large" error (-EFBIG) for this and all subsequent accesses of trans_table. Signed-off-by: Gautham R. Shenoy Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- Documentation/cpu-freq/cpufreq-stats.txt | 3 +++ drivers/cpufreq/cpufreq_stats.c | 7 +++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt index 2bbe207354ed..a873855c811d 100644 --- a/Documentation/cpu-freq/cpufreq-stats.txt +++ b/Documentation/cpu-freq/cpufreq-stats.txt @@ -90,6 +90,9 @@ Freq_i to Freq_j.
Freq_i is in descending order with increasing rows and Freq_j is in descending order with increasing columns. The output here also contains the actual freq values for each row and column for better readability. +If the transition table is bigger than PAGE_SIZE, reading this will +return an -EFBIG error. + -------------------------------------------------------------------------------- :/sys/devices/system/cpu/cpu0/cpufreq/stats # cat trans_table From : To diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e75880eb037d..1e55b5790853 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -118,8 +118,11 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf) break; len += snprintf(buf + len, PAGE_SIZE - len, "\n"); } - if (len >= PAGE_SIZE) - return PAGE_SIZE; + + if (len >= PAGE_SIZE) { + pr_warn_once("cpufreq transition table exceeds PAGE_SIZE. Disabling\n"); + return -EFBIG; + } return len; } cpufreq_freq_attr_ro(trans_table); From 95b982b45122c57da2ee0b46cce70775e1d987af Mon Sep 17 00:00:00 2001 From: Rajat Jain Date: Tue, 31 Oct 2017 14:44:24 -0700 Subject: [PATCH 83/88] PM / s2idle: Clear the events_check_enabled flag Problem: This flag does not get cleared currently in the suspend or resume path in the following cases: * In case some driver's suspend routine returns an error. * Successful s2idle case * etc? Why is this a problem: What happens is that the next suspend attempt could fail even though the user did not enable the flag by writing to /sys/power/wakeup_count. Here is one use case in which the issue can be seen (a similar use case with a driver suspend failure can also be thought of): 1. Read /sys/power/wakeup_count 2. echo count > /sys/power/wakeup_count 3. echo freeze > /sys/power/state 4. Let the system suspend, and wake up the system using some wake source that calls pm_wakeup_event() e.g. power button or something. 5.
Note that the combined wakeup count would be incremented due to the pm_wakeup_event() in the resume path. 6. After resuming the events_check_enabled flag is still set. At this point if the user attempts to freeze again (without writing to /sys/power/wakeup_count), the suspend would fail even though there has been no wake event since the past resume. Address that by clearing the flag just before a resume is completed, so that it is always cleared for the corner cases mentioned above. Signed-off-by: Rajat Jain Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- kernel/power/suspend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index ccd2d20e6b06..0685c4499431 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -437,7 +437,6 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) error = suspend_ops->enter(state); trace_suspend_resume(TPS("machine_suspend"), state, false); - events_check_enabled = false; } else if (*wakeup) { error = -EBUSY; } @@ -582,6 +581,7 @@ static int enter_state(suspend_state_t state) pm_restore_gfp_mask(); Finish: + events_check_enabled = false; pm_pr_dbg("Finishing wakeup.\n"); suspend_finish(); Unlock: From 2dd9789c76ffde05d5f4c56f45c3cb71b3936694 Mon Sep 17 00:00:00 2001 From: Himanshu Jha Date: Sun, 5 Nov 2017 03:27:32 +0530 Subject: [PATCH 84/88] freezer: Fix typo in freezable_schedule_timeout() comment Signed-off-by: Himanshu Jha Acked-by: Luis R. Rodriguez Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- include/linux/freezer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/freezer.h b/include/linux/freezer.h index dd03e837ebb7..5b2cf48b2a7c 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -181,7 +181,7 @@ static inline void freezable_schedule_unsafe(void) } /* - * Like freezable_schedule_timeout(), but should not block the freezer. 
Do not + * Like schedule_timeout(), but should not block the freezer. Do not * call this with locks held. */ static inline long freezable_schedule_timeout(long timeout) From 07458f6a5171d97511dfbdf6ce549ed2ca0280c7 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 8 Nov 2017 20:23:55 +0530 Subject: [PATCH 85/88] cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq 'cached_raw_freq' is used to get the next frequency quickly but should always be in sync with sg_policy->next_freq. There is a case where it is not and in such cases it should be reset to avoid switching to incorrect frequencies. Consider this case for example: - policy->cur is 1.2 GHz (Max) - New request comes for 780 MHz and we store that in cached_raw_freq. - Based on 780 MHz, we calculate the effective frequency as 800 MHz. - We then see the CPU wasn't idle recently and choose to keep the next freq as 1.2 GHz. - Now cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz. - Now if the utilization doesn't change in the next request, then the next target frequency will still be 780 MHz and it will match with cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here. Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) Signed-off-by: Viresh Kumar Cc: 4.12+ # 4.12+ Signed-off-by: Rafael J. Wysocki --- kernel/sched/cpufreq_schedutil.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index ba0da243fdd8..2f52ec0f1539 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -282,8 +282,12 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, * Do not reduce the frequency if the CPU has not been idle * recently, as the reduction is likely to be premature then.
*/ - if (busy && next_f < sg_policy->next_freq) + if (busy && next_f < sg_policy->next_freq) { next_f = sg_policy->next_freq; + + /* Reset cached freq as next_freq has changed */ + sg_policy->cached_raw_freq = 0; + } } sugov_update_commit(sg_policy, time, next_f); } From a4c447533a18ee86e07232d6344ba12b1f9c5077 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Thu, 9 Nov 2017 02:19:39 -0500 Subject: [PATCH 86/88] intel_idle: Graceful probe failure when MWAIT is disabled When MWAIT is disabled, intel_idle refuses to probe. But it may mislead the user by blaming this on the model number: intel_idle: does not run on family 6 model 79 So defer the check for MWAIT until after the model# white-list check succeeds, and if the MWAIT check fails, tell the user how to fix it: intel_idle: Please enable MWAIT in BIOS SETUP Signed-off-by: Len Brown Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 5db5e3176f6a..9c93abdf635f 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -1066,7 +1066,7 @@ static const struct idle_cpu idle_cpu_dnv = { }; #define ICPU(model, cpu) \ - { X86_VENDOR_INTEL, 6, model, X86_FEATURE_MWAIT, (unsigned long)&cpu } + { X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&cpu } static const struct x86_cpu_id intel_idle_ids[] __initconst = { ICPU(INTEL_FAM6_NEHALEM_EP, idle_cpu_nehalem), @@ -1130,6 +1130,11 @@ static int __init intel_idle_probe(void) return -ENODEV; } + if (!boot_cpu_has(X86_FEATURE_MWAIT)) { + pr_debug("Please enable MWAIT in BIOS SETUP\n"); + return -ENODEV; + } + if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF) return -ENODEV; From d4dbfa4bb4c624636eeef9efc6d87c4b7bf2c611 Mon Sep 17 00:00:00 2001 From: Prarit Bhargava Date: Wed, 1 Nov 2017 20:48:17 -0400 Subject: [PATCH 87/88] tools/power/cpupower: Add 64 bit library detection The kernel-tools-lib rpm is
installing the library to /usr/lib64, and not /usr/lib as the cpupower Makefile is doing in the kernel tree. This resulted in a conflict between the two libraries. After looking at how other tools installed libraries, and looking at the perf code in tools/perf it looks like installing to /usr/lib64 for 64-bit arches is the correct thing to do. Checks with 'ldd cpupower' on SLES, RHEL, Fedora, and Ubuntu result in the correct binary AFAICT: [root@testsystem cpupower]# ldd cpupower | grep cpupower libcpupower.so.0 => /lib64/libcpupower.so.0 (0x00007f1dab447000) Commit ac5a181d065d ("cpupower: Add cpuidle parts into library") added a new cpupower library version. On Fedora, executing the cpupower binary then resulted in this error [root@testsystem cpupower]# ./cpupower monitor ./cpupower: symbol lookup error: ./cpupower: undefined symbol: get_cpu_topology 64-bit libraries should be installed to /usr/lib64, and other libraries should be installed to /usr/lib. This code was taken from the perf Makefile.config which supports /usr/lib and /usr/lib64. 
Signed-off-by: Prarit Bhargava Cc: Shuah Khan Signed-off-by: Shuah Khan --- tools/power/cpupower/Makefile | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/tools/power/cpupower/Makefile b/tools/power/cpupower/Makefile index d6e1c02ddcfe..da205d1fa03c 100644 --- a/tools/power/cpupower/Makefile +++ b/tools/power/cpupower/Makefile @@ -30,6 +30,8 @@ OUTDIR := $(shell cd $(OUTPUT) && /bin/pwd) $(if $(OUTDIR),, $(error output directory "$(OUTPUT)" does not exist)) endif +include ../../scripts/Makefile.arch + # --- CONFIGURATION BEGIN --- # Set the following to `true' to make a unstripped, unoptimized @@ -79,7 +81,11 @@ bindir ?= /usr/bin sbindir ?= /usr/sbin mandir ?= /usr/man includedir ?= /usr/include +ifeq ($(IS_64_BIT), 1) +libdir ?= /usr/lib64 +else libdir ?= /usr/lib +endif localedir ?= /usr/share/locale docdir ?= /usr/share/doc/packages/cpupower confdir ?= /etc/ From 69b6f8a9b7961efd7dcc11ab9b1d5be55ed8a15e Mon Sep 17 00:00:00 2001 From: Prarit Bhargava Date: Wed, 1 Nov 2017 20:48:32 -0400 Subject: [PATCH 88/88] tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore Commit ac5a181d065d ("cpupower: Add cpuidle parts into library") added libcpupower.so.0.0.1 which should be hidden from git commands. This patch changes the ignore pattern to cover all libcpupower.so.* files. Signed-off-by: Prarit Bhargava Cc: Shuah Khan Signed-off-by: Shuah Khan --- tools/power/cpupower/.gitignore | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/power/cpupower/.gitignore b/tools/power/cpupower/.gitignore index d42073f12609..1f9977cc609c 100644 --- a/tools/power/cpupower/.gitignore +++ b/tools/power/cpupower/.gitignore @@ -1,7 +1,6 @@ .libs libcpupower.so -libcpupower.so.0 -libcpupower.so.0.0.0 +libcpupower.so.* build/ccdv cpufreq-info cpufreq-set