OpenCloudOS-Kernel/drivers/cpuidle/cpuidle-powernv.c

418 lines
10 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
// SPDX-License-Identifier: GPL-2.0
/*
* cpuidle-powernv - idle state cpuidle driver.
* Adapted from drivers/cpuidle/cpuidle-pseries
*
*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/cpuidle.h>
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/clockchips.h>
#include <linux/of.h>
#include <linux/slab.h>
#include <asm/machdep.h>
#include <asm/firmware.h>
#include <asm/opal.h>
#include <asm/runlatch.h>
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
#include <asm/cpuidle.h>
/*
* Expose only those Hardware idle states via the cpuidle framework
* that have latency value below POWERNV_THRESHOLD_LATENCY_NS.
*/
#define POWERNV_THRESHOLD_LATENCY_NS 200000
static struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
.owner = THIS_MODULE,
};
static int max_idle_state __read_mostly;
static struct cpuidle_state *cpuidle_state_table __read_mostly;
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
struct stop_psscr_table {
u64 val;
u64 mask;
};
static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
cpuidle: powernv: Fix promotion from snooze if next state disabled The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-31 20:15:09 +08:00
static u64 default_snooze_timeout __read_mostly;
static bool snooze_timeout_en __read_mostly;
cpuidle: powernv: Fix promotion from snooze if next state disabled The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-31 20:15:09 +08:00
static u64 get_snooze_timeout(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
int i;
if (unlikely(!snooze_timeout_en))
return default_snooze_timeout;
for (i = index + 1; i < drv->state_count; i++) {
if (dev->states_usage[i].disable)
cpuidle: powernv: Fix promotion from snooze if next state disabled The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-31 20:15:09 +08:00
continue;
return drv->states[i].target_residency * tb_ticks_per_usec;
cpuidle: powernv: Fix promotion from snooze if next state disabled The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-31 20:15:09 +08:00
}
return default_snooze_timeout;
}
static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
u64 snooze_exit_time;
set_thread_flag(TIF_POLLING_NRFLAG);
local_irq_enable();
cpuidle: powernv: Fix promotion from snooze if next state disabled The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-31 20:15:09 +08:00
snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
/*
* Task has not woken up but we are exiting the polling
* loop anyway. Require a barrier after polling is
* cleared to order subsequent test of need_resched().
*/
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb();
break;
}
}
HMT_medium();
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
local_irq_disable();
return index;
}
static int nap_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
power7_idle_type(PNV_THREAD_NAP);
return index;
}
tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode On some archs, the local clockevent device stops in deep cpuidle states. The broadcast framework is used to wakeup cpus in these idle states, in which either an external clockevent device is used to send wakeup ipis or the hrtimer broadcast framework kicks in in the absence of such a device. One cpu is nominated as the broadcast cpu and this cpu sends wakeup ipis to sleeping cpus at the appropriate time. This is the implementation in the oneshot mode of broadcast. In periodic mode of broadcast however, the presence of such cpuidle states results in the cpuidle driver calling tick_broadcast_enable() which shuts down the local clockevent devices of all the cpus and appoints the tick broadcast device as the clockevent device for each of them. This works on those archs where the tick broadcast device is a real clockevent device. But on archs which depend on the hrtimer mode of broadcast, the tick broadcast device hapens to be a pseudo device. The consequence is that the local clockevent devices of all cpus are shutdown and the kernel hangs at boot time in periodic mode. Let us thus not register the cpuidle states which have CPUIDLE_FLAG_TIMER_STOP flag set, on archs which depend on the hrtimer mode of broadcast in periodic mode. This patch takes care of doing this on powerpc. The cpus would not have entered into such deep cpuidle states in periodic mode on powerpc anyway. So there is no loss here. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: 3.19+ <stable@vger.kernel.org> # 3.19+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-24 14:48:01 +08:00
/* Register for fastsleep only in oneshot mode of broadcast */
#ifdef CONFIG_TICK_ONESHOT
static int fastsleep_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
unsigned long old_lpcr = mfspr(SPRN_LPCR);
unsigned long new_lpcr;
if (unlikely(system_state < SYSTEM_RUNNING))
return index;
new_lpcr = old_lpcr;
/* Do not exit powersave upon decrementer as we've setup the timer
* offload.
*/
new_lpcr &= ~LPCR_PECE1;
mtspr(SPRN_LPCR, new_lpcr);
power7_idle_type(PNV_THREAD_SLEEP);
mtspr(SPRN_LPCR, old_lpcr);
return index;
}
tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode On some archs, the local clockevent device stops in deep cpuidle states. The broadcast framework is used to wakeup cpus in these idle states, in which either an external clockevent device is used to send wakeup ipis or the hrtimer broadcast framework kicks in in the absence of such a device. One cpu is nominated as the broadcast cpu and this cpu sends wakeup ipis to sleeping cpus at the appropriate time. This is the implementation in the oneshot mode of broadcast. In periodic mode of broadcast however, the presence of such cpuidle states results in the cpuidle driver calling tick_broadcast_enable() which shuts down the local clockevent devices of all the cpus and appoints the tick broadcast device as the clockevent device for each of them. This works on those archs where the tick broadcast device is a real clockevent device. But on archs which depend on the hrtimer mode of broadcast, the tick broadcast device hapens to be a pseudo device. The consequence is that the local clockevent devices of all cpus are shutdown and the kernel hangs at boot time in periodic mode. Let us thus not register the cpuidle states which have CPUIDLE_FLAG_TIMER_STOP flag set, on archs which depend on the hrtimer mode of broadcast in periodic mode. This patch takes care of doing this on powerpc. The cpus would not have entered into such deep cpuidle states in periodic mode on powerpc anyway. So there is no loss here. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: 3.19+ <stable@vger.kernel.org> # 3.19+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-24 14:48:01 +08:00
#endif
static int stop_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
power9_idle_type(stop_psscr_table[index].val,
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
stop_psscr_table[index].mask);
return index;
}
/*
* States for dedicated partition case.
*/
static struct cpuidle_state powernv_states[CPUIDLE_STATE_MAX] = {
{ /* Snooze */
.name = "snooze",
.desc = "snooze",
.exit_latency = 0,
.target_residency = 0,
.enter = snooze_loop },
};
static int powernv_cpuidle_cpu_online(unsigned int cpu)
{
struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
if (dev && cpuidle_get_driver()) {
cpuidle_pause_and_lock();
cpuidle_enable_device(dev);
cpuidle_resume_and_unlock();
}
return 0;
}
static int powernv_cpuidle_cpu_dead(unsigned int cpu)
{
struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
if (dev && cpuidle_get_driver()) {
cpuidle_pause_and_lock();
cpuidle_disable_device(dev);
cpuidle_resume_and_unlock();
}
return 0;
}
/*
* powernv_cpuidle_driver_init()
*/
static int powernv_cpuidle_driver_init(void)
{
int idle_state;
struct cpuidle_driver *drv = &powernv_idle_driver;
drv->state_count = 0;
for (idle_state = 0; idle_state < max_idle_state; ++idle_state) {
/* Is the state not enabled? */
if (cpuidle_state_table[idle_state].enter == NULL)
continue;
drv->states[drv->state_count] = /* structure copy */
cpuidle_state_table[idle_state];
drv->state_count += 1;
}
cpuidle: powernv: Pass correct drv->cpumask for registration drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init(). On PowerNV platform cpu_present could be less than cpu_possible in cases where firmware detects the cpu, but it is not available to the OS. When CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence we skip creating cpu_device. This breaks cpuidle on powernv where register_cpu() is not called for cpus in cpu_possible_mask that cannot be hot-added at runtime. Trying cpuidle_register_device() on cpu without cpu_device will cause crash like this: cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490] pc: c00000000022c8bc: string+0x34/0x60 lr: c00000000022ed78: vsnprintf+0x284/0x42c sp: c000000ff1503710 msr: 9000000000009033 dar: 6000000060000000 current = 0xc000000ff1480000 paca = 0xc00000000fe82d00 softe: 0 irq_happened: 0x01 pid = 1, comm = swapper/8 Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4 (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017 enter ? for help [link register ] c00000000022ed78 vsnprintf+0x284/0x42c [c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable) [c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44 [c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc [c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74 [c000000ff15038c0] c000000000619694 printk+0x38/0x4c [c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60 [c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4 [c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78 [c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0 [c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c [c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc [c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0 [c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c [c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c [c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c [c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78 This patch fixes the bug by passing correct cpumask from powernv-cpuidle driver. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [ rjw: Comment massage ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-23 23:22:46 +08:00
/*
* On the PowerNV platform cpu_present may be less than cpu_possible in
* cases when firmware detects the CPU, but it is not available to the
* OS. If CONFIG_HOTPLUG_CPU=n, then such CPUs are not hotplugable at
* run time and hence cpu_devices are not created for those CPUs by the
* generic topology_init().
*
* drv->cpumask defaults to cpu_possible_mask in
* __cpuidle_driver_init(). This breaks cpuidle on PowerNV where
* cpu_devices are not created for CPUs in cpu_possible_mask that
* cannot be hot-added later at run time.
*
* Trying cpuidle_register_device() on a CPU without a cpu_device is
* incorrect, so pass a correct CPU mask to the generic cpuidle driver.
*/
drv->cpumask = (struct cpumask *)cpu_present_mask;
return 0;
}
static inline void add_powernv_state(int index, const char *name,
unsigned int flags,
int (*idle_fn)(struct cpuidle_device *,
struct cpuidle_driver *,
int),
unsigned int target_residency,
unsigned int exit_latency,
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
u64 psscr_val, u64 psscr_mask)
{
strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
strlcpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
powernv_states[index].flags = flags;
powernv_states[index].target_residency = target_residency;
powernv_states[index].exit_latency = exit_latency;
powernv_states[index].enter = idle_fn;
/* For power8 and below psscr_* will be 0 */
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
stop_psscr_table[index].val = psscr_val;
stop_psscr_table[index].mask = psscr_mask;
}
/*
* Returns 0 if prop1_len == prop2_len. Else returns -1
*/
static inline int validate_dt_prop_sizes(const char *prop1, int prop1_len,
const char *prop2, int prop2_len)
{
if (prop1_len == prop2_len)
return 0;
pr_warn("cpuidle-powernv: array sizes don't match for %s and %s\n",
prop1, prop2);
return -1;
}
extern u32 pnv_get_supported_cpuidle_states(void);
static int powernv_add_idle_states(void)
{
int nr_idle_states = 1; /* Snooze */
int dt_idle_states;
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
u32 has_stop_states = 0;
int i;
u32 supported_flags = pnv_get_supported_cpuidle_states();
/* Currently we have snooze statically defined */
if (nr_pnv_idle_states <= 0) {
pr_warn("cpuidle-powernv : Only Snooze is available\n");
goto out;
}
/* TODO: Count only states which are eligible for cpuidle */
dt_idle_states = nr_pnv_idle_states;
/*
* Since snooze is used as first idle state, max idle states allowed is
* CPUIDLE_STATE_MAX -1
*/
if (nr_pnv_idle_states > CPUIDLE_STATE_MAX - 1) {
pr_warn("cpuidle-powernv: discovered idle states more than allowed");
dt_idle_states = CPUIDLE_STATE_MAX - 1;
}
/*
* If the idle states use stop instruction, probe for psscr values
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
* and psscr mask which are necessary to specify required stop level.
*/
has_stop_states = (pnv_idle_states[0].flags &
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
(OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP));
for (i = 0; i < dt_idle_states; i++) {
unsigned int exit_latency, target_residency;
bool stops_timebase = false;
struct pnv_idle_states_t *state = &pnv_idle_states[i];
/*
* Skip the platform idle state whose flag isn't in
* the supported_cpuidle_states flag mask.
*/
if ((state->flags & supported_flags) != state->flags)
continue;
/*
* If an idle state has exit latency beyond
* POWERNV_THRESHOLD_LATENCY_NS then don't use it
* in cpu-idle.
*/
if (state->latency_ns > POWERNV_THRESHOLD_LATENCY_NS)
continue;
/*
* Firmware passes residency and latency values in ns.
* cpuidle expects it in us.
*/
exit_latency = DIV_ROUND_UP(state->latency_ns, 1000);
target_residency = DIV_ROUND_UP(state->residency_ns, 1000);
if (has_stop_states && !(state->valid))
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
continue;
if (state->flags & OPAL_PM_TIMEBASE_STOP)
stops_timebase = true;
if (state->flags & OPAL_PM_NAP_ENABLED) {
/* Add NAP state */
add_powernv_state(nr_idle_states, "Nap",
CPUIDLE_FLAG_NONE, nap_loop,
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
target_residency, exit_latency, 0, 0);
} else if (has_stop_states && !stops_timebase) {
add_powernv_state(nr_idle_states, state->name,
CPUIDLE_FLAG_NONE, stop_loop,
target_residency, exit_latency,
state->psscr_val,
state->psscr_mask);
tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode On some archs, the local clockevent device stops in deep cpuidle states. The broadcast framework is used to wakeup cpus in these idle states, in which either an external clockevent device is used to send wakeup ipis or the hrtimer broadcast framework kicks in in the absence of such a device. One cpu is nominated as the broadcast cpu and this cpu sends wakeup ipis to sleeping cpus at the appropriate time. This is the implementation in the oneshot mode of broadcast. In periodic mode of broadcast however, the presence of such cpuidle states results in the cpuidle driver calling tick_broadcast_enable() which shuts down the local clockevent devices of all the cpus and appoints the tick broadcast device as the clockevent device for each of them. This works on those archs where the tick broadcast device is a real clockevent device. But on archs which depend on the hrtimer mode of broadcast, the tick broadcast device hapens to be a pseudo device. The consequence is that the local clockevent devices of all cpus are shutdown and the kernel hangs at boot time in periodic mode. Let us thus not register the cpuidle states which have CPUIDLE_FLAG_TIMER_STOP flag set, on archs which depend on the hrtimer mode of broadcast in periodic mode. This patch takes care of doing this on powerpc. The cpus would not have entered into such deep cpuidle states in periodic mode on powerpc anyway. So there is no loss here. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: 3.19+ <stable@vger.kernel.org> # 3.19+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-24 14:48:01 +08:00
}
/*
* All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set must come
* within this config dependency check.
*/
#ifdef CONFIG_TICK_ONESHOT
else if (state->flags & OPAL_PM_SLEEP_ENABLED ||
state->flags & OPAL_PM_SLEEP_ENABLED_ER1) {
/* Add FASTSLEEP state */
add_powernv_state(nr_idle_states, "FastSleep",
CPUIDLE_FLAG_TIMER_STOP,
fastsleep_loop,
powernv: Pass PSSCR value and mask to power9_idle_stop The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-25 16:36:28 +08:00
target_residency, exit_latency, 0, 0);
} else if (has_stop_states && stops_timebase) {
add_powernv_state(nr_idle_states, state->name,
CPUIDLE_FLAG_TIMER_STOP, stop_loop,
target_residency, exit_latency,
state->psscr_val,
state->psscr_mask);
}
tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode On some archs, the local clockevent device stops in deep cpuidle states. The broadcast framework is used to wakeup cpus in these idle states, in which either an external clockevent device is used to send wakeup ipis or the hrtimer broadcast framework kicks in in the absence of such a device. One cpu is nominated as the broadcast cpu and this cpu sends wakeup ipis to sleeping cpus at the appropriate time. This is the implementation in the oneshot mode of broadcast. In periodic mode of broadcast however, the presence of such cpuidle states results in the cpuidle driver calling tick_broadcast_enable() which shuts down the local clockevent devices of all the cpus and appoints the tick broadcast device as the clockevent device for each of them. This works on those archs where the tick broadcast device is a real clockevent device. But on archs which depend on the hrtimer mode of broadcast, the tick broadcast device hapens to be a pseudo device. The consequence is that the local clockevent devices of all cpus are shutdown and the kernel hangs at boot time in periodic mode. Let us thus not register the cpuidle states which have CPUIDLE_FLAG_TIMER_STOP flag set, on archs which depend on the hrtimer mode of broadcast in periodic mode. This patch takes care of doing this on powerpc. The cpus would not have entered into such deep cpuidle states in periodic mode on powerpc anyway. So there is no loss here. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: 3.19+ <stable@vger.kernel.org> # 3.19+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-24 14:48:01 +08:00
#endif
else
continue;
nr_idle_states++;
}
out:
return nr_idle_states;
}
/*
* powernv_idle_probe()
* Choose state table for shared versus dedicated partition
*/
static int powernv_idle_probe(void)
{
if (cpuidle_disable != IDLE_NO_OVERRIDE)
return -ENODEV;
if (firmware_has_feature(FW_FEATURE_OPAL)) {
cpuidle_state_table = powernv_states;
/* Device tree can indicate more idle states */
max_idle_state = powernv_add_idle_states();
cpuidle: powernv: Fix promotion from snooze if next state disabled The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-31 20:15:09 +08:00
default_snooze_timeout = TICK_USEC * tb_ticks_per_usec;
if (max_idle_state > 1)
snooze_timeout_en = true;
} else
return -ENODEV;
return 0;
}
static int __init powernv_processor_idle_init(void)
{
int retval;
retval = powernv_idle_probe();
if (retval)
return retval;
powernv_cpuidle_driver_init();
retval = cpuidle_register(&powernv_idle_driver, NULL);
if (retval) {
printk(KERN_DEBUG "Registration of powernv driver failed.\n");
return retval;
}
retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"cpuidle/powernv:online",
powernv_cpuidle_cpu_online, NULL);
WARN_ON(retval < 0);
retval = cpuhp_setup_state_nocalls(CPUHP_CPUIDLE_DEAD,
"cpuidle/powernv:dead", NULL,
powernv_cpuidle_cpu_dead);
WARN_ON(retval < 0);
printk(KERN_DEBUG "powernv_idle_driver registered\n");
return 0;
}
device_initcall(powernv_processor_idle_init);