Commit Graph

2243 Commits

Author SHA1 Message Date
Linus Torvalds 4d25ec1966 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:

 - Improve thermal cpu_cooling interaction with cpufreq core.

   The cpu_cooling driver is designed to use CPU frequency scaling to
   avoid high thermal states for a platform. But it wasn't glued really
   well with cpufreq core.

   For example clipped-cpus is copied from the policy structure and its
   much better to use the policy->cpus (or related_cpus) fields directly
   as they may have got updated. Not that things were broken before this
   series, but they can be optimized a bit more.

   This series tries to improve interactions between cpufreq core and
   cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
   driver. (Viresh Kumar)

 - A couple of fixes and cleanups in thermal core and imx, hisilicon,
   bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter,
   Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
  thermal: bcm2835: fix an error code in probe()
  thermal: hisilicon: Handle return value of clk_prepare_enable
  thermal: imx: Handle return value of clk_prepare_enable
  thermal: int340x: check for sensor when PTYP is missing
  Thermal/int340x: Fix few typos and kernel-doc style
  thermal: fix source code documentation for parameters
  thermal: cpu_cooling: Replace kmalloc with kmalloc_array
  thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
  thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
  thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
  thermal: cpu_cooling: get_level() can't fail
  thermal: cpu_cooling: create structure for idle time stats
  thermal: cpu_cooling: merge frequency and power tables
  thermal: cpu_cooling: get rid of 'allowed_cpus'
  thermal: cpu_cooling: OPPs are registered for all CPUs
  thermal: cpu_cooling: store cpufreq policy
  cpufreq: create cpufreq_table_count_valid_entries()
  thermal: cpu_cooling: use cpufreq_policy to register cooling device
  thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
  thermal: cpu_cooling: remove cpufreq_cooling_get_level()
  ...
2017-07-14 13:12:32 -07:00
Rafael J. Wysocki a252c258dd Merge branches 'pm-cpufreq-sched' and 'intel_pstate'
* pm-cpufreq-sched:
  cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race

* intel_pstate:
  cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
2017-07-14 13:16:16 +02:00
Srinivas Pandruvada 6e34e1f23d cpufreq: intel_pstate: Correct the busy calculation for KNL
The busy percent calculated for the Knights Landing (KNL) platform
is 1024 times smaller than the correct busy value.  This causes
performance to get stuck at the lowest ratio.

The scaling algorithm used for KNL is performance-based, but it still
looks at the CPU load to set the scaled busy factor to 0 when the
load is less than 1 percent.  In this case, since the computed load
is 1024x smaller than it should be, the scaled busy factor will
always be 0, irrespective of CPU business.

This needs a fix similar to the turbostat one in commit b2b34dfe4d
(tools/power turbostat: KNL workaround for %Busy and Avg_MHz).

For this reason, add one more callback to processor-specific
callbacks to specify an MPERF multiplier represented by a number of
bit positions to shift the value of that register to the left to
copmensate for its rate difference with respect to the TSC.  This
shift value is used during CPU busy calculations.

Fixes: ffb810563c (intel_pstate: Avoid getting stuck in high P-states when idle)
Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-14 03:01:28 +02:00
Srinivas Pandruvada d4436c0dba cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
When the minimum performance limit percentage is set to the power-up
default, it is possible that minimum performance ratio is off by one.

In the set_policy() callback the minimum ratio is calculated by
applying global.min_perf_pct to turbo_ratio and rounding up, but the
power-up default global.min_perf_pct is already rounded up to the
next percent in min_perf_pct_min().  That results in two round up
operations, so for the default min_perf_pct one of them is not
required.

It is better to remove rounding up in min_perf_pct_min() as this
matches the displayed min_perf_pct prior to commit c5a2ee7dde
(cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.

For example on a platform with max turbo ratio of 37 and minimum
ratio of 10, the min_perf_pct resulted in 28 with the above commit.
Before this commit it was 27 and it will be the same after this
change.

Fixes: 1a4fe38add (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-12 14:39:10 +02:00
Linus Torvalds 1633b39610 More power management updates for v4.13-rc1
- Revert a recent change in the generic power domains (genpd)
    framework that led to regressions and turned out the be misguided
    (Rafael Wysocki).
 
  - Fix a recently introduced build issue in the generic power domains
    (genpd) framework (Arnd Bergmann).
 
  - Constify attribute_group structures in the PM core, the cpufreq
    stats code and in intel_pstate (Arvind Yadav).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZY/L6AAoJEILEb/54YlRxMBUP/0lCziyBNAUPm8+gpC7RA5gv
 tZwtGPnDr2toWg3+te0Aqc3/LOKt4CFtQEui+IGGPA5ghoZZ53lPuxZH8MCEcv/I
 LhoHmNK+2D088JViiaXlkanLgjcbtkgWKEgRQXOm75XlbaReW3wKmiPkc8iLTRde
 tytdf82GUN0AKKWCsUMiiEWDCYs9mpM3MPX2GOS+ZPBZfX2cyubKZ9STPzzFDwFf
 NfP+NuzJmYfEonBXproTa6ZqAq2UVGPeolyYe1lwV8QVCU8Z4W6GRAKbSzk0Hq6N
 wcdpNaNmkQytjDQ1hZ0NNFecTH4qjStOkc9OwNZJwoSbC31sQGHyKnxWP8Re1+hU
 UmpIAuNBc6eKJmkyoOE9GfIB08AvvuB4s7B3X8ffpWGqQmASYAY9DhEKDlPmmkhD
 NV+HTUkebw+gZoJp6VGL072KGARrNEmodKrcmXA/T4T8ZwoHFbnQbzDaODzW7rzx
 1UxwCtUa/Jl5hOPngo0XuLnbeM7AAG1MjaJSKDqoUl4WbjdYG3f7yRxs6T+JS+dk
 1+NpVJiIKBM1bqP7Jf+v9xrbYG31w5blikxhCpjA601ztV0vgtiiojssKpNwjkpv
 Myh1BavAaLcnMCkCfHppLlXv2bnLHFANMyMcPU+EuLzPDTsxxxfhmbBHBur4r7BA
 DXmpRQvWCoAqlQzzvZoZ
 =B+R7
 -----END PGP SIGNATURE-----

Merge tag 'pm-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These revert one recent change in the generic power domains
  framework, fix a recently introduced build issue in there and
  constify attribute_group structures in some places.

  Specifics:

   - Revert a recent change in the generic power domains (genpd)
     framework that led to regressions and turned out the be misguided
     (Rafael Wysocki).

   - Fix a recently introduced build issue in the generic power domains
     (genpd) framework (Arnd Bergmann).

   - Constify attribute_group structures in the PM core, the cpufreq
     stats code and in intel_pstate (Arvind Yadav)"

* tag 'pm-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: constify attribute_group structures
  cpufreq: cpufreq_stats: constify attribute_group structures
  PM / sleep: constify attribute_group structures
  PM / Domains: provide pm_genpd_poweroff_noirq() stub
  Revert "PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device"
2017-07-10 15:16:21 -07:00
Rafael J. Wysocki 15d56b3921 Merge branches 'pm-domains', 'pm-sleep' and 'pm-cpufreq'
* pm-domains:
  PM / Domains: provide pm_genpd_poweroff_noirq() stub
  Revert "PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device"

* pm-sleep:
  PM / sleep: constify attribute_group structures

* pm-cpufreq:
  cpufreq: intel_pstate: constify attribute_group structures
  cpufreq: cpufreq_stats: constify attribute_group structures
2017-07-10 22:45:16 +02:00
Zhang Rui 467aebee87 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal into thermal-soc 2017-07-05 14:51:32 +08:00
Linus Torvalds e854711291 ARM: SoC driver updates
- New SoC specific drivers
   - NVIDIA Tegra PM Domain support for newer SoCs (Tegra186 and later)
     based on the "BPMP" firmware
   - Clocksource and system controller drivers for the newly added
     Action Semi platforms (both arm and arm64).
 
 - Reset subsystem, merged through arm-soc by tradition:
   - New drivers for Altera Stratix10, TI Keystone and Cortina Gemini SoCs
   - Various subsystem-wide cleanups
 
 - Updates for existing SoC-specific drivers
   - TI GPMC (General Purpose Memory Controller)
   - Mediatek "scpsys" system controller support for MT6797
   - Broadcom "brcmstb_gisb" bus arbitrer
   - ARM SCPI firmware
   - Renesas "SYSC" system controller
 
 One more driver update was submitted for the Freescale/NXP DPAA
 data path acceleration that has previously been used on PowerPC
 chips. I ended up postponing the merge until some API questions
 for its unusual MMIO access are resolved.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAWVpDFmCrR//JCVInAQKoihAAosgC3+3IHppOhHid+HepkOcp2teyKknw
 42fSXVpTdfWa8Oe7q80Kzmz2CPNfFq2SzHz6oXb9WCcDFqSGr0b9aSE7NnksRjTf
 2euHVJ4MnJpkRewvorRmcpK8dPXDcHwEw/8hU3yZtJUGI0IKtlrqXis+evgkz9cn
 YDynuVdAZgZiEfiNeSeadyNLcxaQCc3x9ovvsBXxBa1/x1pfeRnTbp+6hiHilCJu
 Szts/yAzZzZE9Jkf9dLKfNlsT6m2SgtjukqqOR+zHAhi7/BdTFSVUP6L8u7QjrR+
 +ijTICg8FMJGiWLAOe6ED2qZVByN92EJ2AGwriYlSles9ouoGfRkJ2rwxyjbete7
 avy0HP+PSBFXWdwbOcq8HX8CrbuBltagl5fkMokct6biWLLMshNZ33WWdQ6/DsM2
 b9mAAZuhbs0g1qWUBD3+q6qBytSuGme6Px6fMoVEc4GQ2YVFUQOoEfZOGKRv1U1o
 aMWGt/6qeF8SG288rYAnQ/TuYWpOLtksV6yhotA00HUHhkTCy0xVCdyWGZtNyKhG
 o/x4YnhWFzHsXmqKcR1sM7LzfZY/WNmbrOLvK6i83Z0P7QptqrdaLAylL3iSPEyX
 ZDUgExf6PYXkWIewc3KwC5sJjuD05z3ZCgIR+mCezwbuD+3Z2fOdjodY/VBZ74hq
 URcM/BqtuWE=
 =5L6n
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "New SoC specific drivers:

   - NVIDIA Tegra PM Domain support for newer SoCs (Tegra186 and later)
     based on the "BPMP" firmware

   - Clocksource and system controller drivers for the newly added
     Action Semi platforms (both arm and arm64).

  Reset subsystem, merged through arm-soc by tradition:

   - New drivers for Altera Stratix10, TI Keystone and Cortina Gemini
     SoCs

   - Various subsystem-wide cleanups

  Updates for existing SoC-specific drivers

   - TI GPMC (General Purpose Memory Controller)

   - Mediatek "scpsys" system controller support for MT6797

   - Broadcom "brcmstb_gisb" bus arbitrer

   - ARM SCPI firmware

   - Renesas "SYSC" system controller

  One more driver update was submitted for the Freescale/NXP DPAA data
  path acceleration that has previously been used on PowerPC chips. I
  ended up postponing the merge until some API questions for its unusual
  MMIO access are resolved"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (35 commits)
  clocksource: owl: Add S900 support
  clocksource: Add Owl timer
  soc: renesas: rcar-sysc: Use GENPD_FLAG_ALWAYS_ON
  firmware: tegra: Fix locking bugs in BPMP
  soc/tegra: flowctrl: Fix error handling
  soc/tegra: bpmp: Implement generic PM domains
  soc/tegra: bpmp: Update ABI header
  PM / Domains: Allow overriding the ->xlate() callback
  soc: brcmstb: enable drivers for ARM64 and BMIPS
  soc: renesas: Rework Kconfig and Makefile logic
  reset: Add the TI SCI reset driver
  dt-bindings: reset: Add TI SCI reset binding
  reset: use kref for reference counting
  soc: qcom: smsm: Improve error handling, quiesce probe deferral
  cpufreq: scpi: use new scpi_ops functions to remove duplicate code
  firmware: arm_scpi: add support to populate OPPs and get transition latency
  dt-bindings: reset: Add reset manager offsets for Stratix10
  memory: omap-gpmc: add error message if bank-width property is absent
  memory: omap-gpmc: make dts snippet include semicolon
  reset: Add a Gemini reset controller
  ...
2017-07-04 14:47:47 -07:00
Linus Torvalds 408c9861c6 Power management updates for v4.13-rc1
- Rework suspend-to-idle to allow it to take wakeup events signaled
    by the EC into account on ACPI-based platforms in order to properly
    support power button wakeup from suspend-to-idle on recent Dell
    laptops (Rafael Wysocki).
 
    That includes the core suspend-to-idle code rework, support for
    the Low Power S0 _DSM interface, and support for the ACPI INT0002
    Virtual GPIO device from Hans de Goede (required for USB keyboard
    wakeup from suspend-to-idle to work on some machines).
 
  - Stop trying to export the current CPU frequency via /proc/cpuinfo
    on x86 as that is inaccurate and confusing (Len Brown).
 
  - Rework the way in which the current CPU frequency is exported by
    the kernel (over the cpufreq sysfs interface) on x86 systems with
    the APERF and MPERF registers by always using values read from
    these registers, when available, to compute the current frequency
    regardless of which cpufreq driver is in use (Len Brown).
 
  - Rework the PCI/ACPI device wakeup infrastructure to remove the
    questionable and artificial distinction between "devices that
    can wake up the system from sleep states" and "devices that can
    generate wakeup signals in the working state" from it, which
    allows the code to be simplified quite a bit (Rafael Wysocki).
 
  - Fix the wakeup IRQ framework by making it use SRCU instead of
    RCU which doesn't allow sleeping in the read-side critical
    sections, but which in turn is expected to be allowed by the
    IRQ bus locking infrastructure (Thomas Gleixner).
 
  - Modify some computations in the intel_pstate driver to avoid
    rounding errors resulting from them (Srinivas Pandruvada).
 
  - Reduce the overhead of the intel_pstate driver in the HWP
    (hardware-managed P-states) mode and when the "performance"
    P-state selection algorithm is in use by making it avoid
    registering scheduler callbacks in those cases (Len Brown).
 
  - Rework the energy_performance_preference sysfs knob in
    intel_pstate by changing the values that correspond to
    different symbolic hint names used by it (Len Brown).
 
  - Make it possible to use more than one cpuidle driver at the same
    time on ARM (Daniel Lezcano).
 
  - Make it possible to prevent the cpuidle menu governor from using
    the 0 state by disabling it via sysfs (Nicholas Piggin).
 
  - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1
    on AMD systems (Yazen Ghannam).
 
  - Make the CPPC cpufreq driver take the lowest nonlinear performance
    information into account (Prashanth Prakash).
 
  - Add support for hi3660 to the cpufreq-dt driver, fix the
    imx6q driver and clean up the sfi, exynos5440 and intel_pstate
    drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila,
    Rafael Wysocki, Tao Wang).
 
  - Fix a few minor issues in the generic power domains (genpd)
    framework and clean it up somewhat (Krzysztof Kozlowski,
    Mikko Perttunen, Viresh Kumar).
 
  - Fix a couple of minor issues in the operating performance points
    (OPP) framework and clean it up somewhat (Viresh Kumar).
 
  - Fix a CONFIG dependency in the hibernation core and clean it up
    slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
 
  - Add rk3228 support to the rockchip-io adaptive voltage scaling
    (AVS) driver (David Wu).
 
  - Fix an incorrect bit shift operation in the RAPL power capping
    driver (Adam Lessnau).
 
  - Add support for the EPP field in the HWP (hardware managed
    P-states) control register, HWP.EPP, to the x86_energy_perf_policy
    tool and update msr-index.h with HWP.EPP values (Len Brown).
 
  - Fix some minor issues in the turbostat tool (Len Brown).
 
  - Add support for AMD family 0x17 CPUs to the cpupower tool and fix
    a minor issue in it (Sherry Hurwitz).
 
  - Assorted cleanups, mostly related to the constification of some
    data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
    Kozlowski).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZWrICAAoJEILEb/54YlRxZYMQAIRhfbyDxKq+ByvSilUS8kTA
 AItwJ8FFzykhiwN75Cqabg4rAGyWma7IRs1vzU7zeC1aEQIn+bTQtvk+utZNI+g2
 ANFlDha20q/sXsP/CDMMTIAdW9tSOC0TOvFI9s2V2Y8dJZhoekO4ctx34FAfUS5d
 Ao6rwSAWCMsCXcGaTAlqTA+TEJmBG7u6Iq6hq6ngltoFwOv3mWWBVn52VVaJ7SMp
 9/IPbbLGMFAedrgEBRGCR+MME1xZZpvcZIJaTt1Mgn7Cx3cJaysIUAvqY/SsvFGq
 5FcUTcF2qpK3+AGawiAxZIjvOBsGRtIwqKinNIzYWs/NjiIdzmgVAmTeuPtTqp+5
 HFehUdtkFcnuDnLqSNzAaZUa7tw84cJkwnbVMnesx0MkG6rZ1SeL22E2Sabpcdsh
 3Yo1ThzJSxi59DhiiE92EQnNCEjmCldRy+8q5Ag035muxl6EJYvuNBMnZv/BMCUn
 ltSNOrmps1DlN+Col8ORIeNzQ1YjYzWMqKAYzSbyccm4ug/iSHx0/DuESmQ4GTlF
 YCwkmqyWiHrBwpl51jc+4a7SGlMmKRqU+MJes0CjagaaqoUAb8qeBOpzEJ0yNwjZ
 wtI41l6blE6kbMD3yqGdCfiB2S7GlPVoxa15eX1wRyLH3fLjwwrzJirEaiBS86tI
 1PzHZEOmBlh3DYC6DBKA
 =Wsph
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The big ticket items here are the rework of suspend-to-idle in order
  to add proper support for power button wakeup from it on recent Dell
  laptops and the rework of interfaces exporting the current CPU
  frequency on x86.

  In addition to that, support for a few new pieces of hardware is
  added, the PCI/ACPI device wakeup infrastructure is simplified
  significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
  bus locking infrastructure.

  Also, there are some functional improvements for intel_pstate, tools
  updates and small fixes and cleanups all over.

  Specifics:

   - Rework suspend-to-idle to allow it to take wakeup events signaled
     by the EC into account on ACPI-based platforms in order to properly
     support power button wakeup from suspend-to-idle on recent Dell
     laptops (Rafael Wysocki).

     That includes the core suspend-to-idle code rework, support for the
     Low Power S0 _DSM interface, and support for the ACPI INT0002
     Virtual GPIO device from Hans de Goede (required for USB keyboard
     wakeup from suspend-to-idle to work on some machines).

   - Stop trying to export the current CPU frequency via /proc/cpuinfo
     on x86 as that is inaccurate and confusing (Len Brown).

   - Rework the way in which the current CPU frequency is exported by
     the kernel (over the cpufreq sysfs interface) on x86 systems with
     the APERF and MPERF registers by always using values read from
     these registers, when available, to compute the current frequency
     regardless of which cpufreq driver is in use (Len Brown).

   - Rework the PCI/ACPI device wakeup infrastructure to remove the
     questionable and artificial distinction between "devices that can
     wake up the system from sleep states" and "devices that can
     generate wakeup signals in the working state" from it, which allows
     the code to be simplified quite a bit (Rafael Wysocki).

   - Fix the wakeup IRQ framework by making it use SRCU instead of RCU
     which doesn't allow sleeping in the read-side critical sections,
     but which in turn is expected to be allowed by the IRQ bus locking
     infrastructure (Thomas Gleixner).

   - Modify some computations in the intel_pstate driver to avoid
     rounding errors resulting from them (Srinivas Pandruvada).

   - Reduce the overhead of the intel_pstate driver in the HWP
     (hardware-managed P-states) mode and when the "performance" P-state
     selection algorithm is in use by making it avoid registering
     scheduler callbacks in those cases (Len Brown).

   - Rework the energy_performance_preference sysfs knob in intel_pstate
     by changing the values that correspond to different symbolic hint
     names used by it (Len Brown).

   - Make it possible to use more than one cpuidle driver at the same
     time on ARM (Daniel Lezcano).

   - Make it possible to prevent the cpuidle menu governor from using
     the 0 state by disabling it via sysfs (Nicholas Piggin).

   - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
     AMD systems (Yazen Ghannam).

   - Make the CPPC cpufreq driver take the lowest nonlinear performance
     information into account (Prashanth Prakash).

   - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
     driver and clean up the sfi, exynos5440 and intel_pstate drivers
     (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
     Wysocki, Tao Wang).

   - Fix a few minor issues in the generic power domains (genpd)
     framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
     Perttunen, Viresh Kumar).

   - Fix a couple of minor issues in the operating performance points
     (OPP) framework and clean it up somewhat (Viresh Kumar).

   - Fix a CONFIG dependency in the hibernation core and clean it up
     slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).

   - Add rk3228 support to the rockchip-io adaptive voltage scaling
     (AVS) driver (David Wu).

   - Fix an incorrect bit shift operation in the RAPL power capping
     driver (Adam Lessnau).

   - Add support for the EPP field in the HWP (hardware managed
     P-states) control register, HWP.EPP, to the x86_energy_perf_policy
     tool and update msr-index.h with HWP.EPP values (Len Brown).

   - Fix some minor issues in the turbostat tool (Len Brown).

   - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
     minor issue in it (Sherry Hurwitz).

   - Assorted cleanups, mostly related to the constification of some
     data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
     Kozlowski)"

* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  PM: hibernate: constify attribute_group structures.
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  PM / Domains: Fix missing default_power_down_ok comment
  PM / Domains: Fix unsafe iteration over modified list of domains
  PM / Domains: Fix unsafe iteration over modified list of domain providers
  PM / Domains: Fix unsafe iteration over modified list of device links
  PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
  PM / Domains: Call driver's noirq callbacks
  PM / core: Drop run_wake flag from struct dev_pm_info
  PCI / PM: Simplify device wakeup settings code
  PCI / PM: Drop pme_interrupt flag from struct pci_dev
  ACPI / PM: Consolidate device wakeup settings code
  ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
  PM / QoS: constify *_attribute_group.
  PM / AVS: rockchip-io: add io selectors and supplies for rk3228
  powercap/RAPL: prevent overridding bits outside of the mask
  PM / sysfs: Constify attribute groups
  ...
2017-07-04 13:39:41 -07:00
Arvind Yadav 106c9c77ed cpufreq: intel_pstate: constify attribute_group structures
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
  15197	   2552	     40	  17789	   457d	drivers/cpufreq/intel_pstate.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
  15261	   2488	     40	  17789	   457d	drivers/cpufreq/intel_pstate.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-04 22:03:13 +02:00
Arvind Yadav 402202e8de cpufreq: cpufreq_stats: constify attribute_group structures
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   1655	    256	      4	   1915	    77b	drivers/cpufreq/cpufreq_stats.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   1695	    192	      4	   1891	    763	drivers/cpufreq/cpufreq_stats.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-04 22:03:13 +02:00
Linus Torvalds 9a9594efe5 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug updates from Thomas Gleixner:
 "This update is primarily a cleanup of the CPU hotplug locking code.

  The hotplug locking mechanism is an open coded RWSEM, which allows
  recursive locking. The main problem with that is the recursive nature
  as it evades the full lockdep coverage and hides potential deadlocks.

  The rework replaces the open coded RWSEM with a percpu RWSEM and
  establishes full lockdep coverage that way.

  The bulk of the changes fix up recursive locking issues and address
  the now fully reported potential deadlocks all over the place. Some of
  these deadlocks have been observed in the RT tree, but on mainline the
  probability was low enough to hide them away."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  cpu/hotplug: Constify attribute_group structures
  powerpc: Only obtain cpu_hotplug_lock if called by rtasd
  ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
  cpu/hotplug: Remove unused check_for_tasks() function
  perf/core: Don't release cred_guard_mutex if not taken
  cpuhotplug: Link lock stacks for hotplug callbacks
  acpi/processor: Prevent cpu hotplug deadlock
  sched: Provide is_percpu_thread() helper
  cpu/hotplug: Convert hotplug locking to percpu rwsem
  s390: Prevent hotplug rwsem recursion
  arm: Prevent hotplug rwsem recursion
  arm64: Prevent cpu hotplug rwsem recursion
  kprobes: Cure hotplug lock ordering issues
  jump_label: Reorder hotplug lock and jump_label_lock
  perf/tracing/cpuhotplug: Fix locking order
  ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
  PCI: Replace the racy recursion prevention
  PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
  perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
  x86/perf: Drop EXPORT of perf_check_microcode
  ...
2017-07-03 18:08:06 -07:00
Rafael J. Wysocki f1c7842e5f Merge branches 'pm-cpufreq', 'intel_pstate' and 'pm-cpuidle'
* pm-cpufreq:
  cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance
  cpufreq: sfi: make freq_table static
  cpufreq: exynos5440: Fix inconsistent indenting
  cpufreq: imx6q: imx6ull should use the same flow as imx6ul
  cpufreq: dt: Add support for hi3660

* intel_pstate:
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  intel_pstate: skip scheduler hook when in "performance" mode
  intel_pstate: delete scheduler hook in HWP mode
  x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
  cpufreq: intel_pstate: Remove max/min fractions to limit performance
  x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"

* pm-cpuidle:
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems
  ARM: cpuidle: Support asymmetric idle definition
2017-07-03 14:21:18 +02:00
Rafael J. Wysocki 16b5b09240 Merge branch 'pm-tools'
* pm-tools:
  cpupower: Add support for new AMD family 0x17
  cpupower: Fix bug where return value was not used
  tools/power turbostat: update version number
  tools/power turbostat: decode MSR_IA32_MISC_ENABLE only on Intel
  tools/power turbostat: stop migrating, unless '-m'
  tools/power turbostat: if  --debug, print sampling overhead
  tools/power turbostat: hide SKL counters, when not requested
  intel_pstate: use updated msr-index.h HWP.EPP values
  tools/power x86_energy_perf_policy: support HWP.EPP
  x86: msr-index.h: fix shifts to ULL results in HWP macros.
  x86: msr-index.h: define HWP.EPP values
  x86: msr-index.h: define EPB mid-points
2017-07-03 14:17:16 +02:00
Rafael J. Wysocki fab24dcc39 cpufreq: intel_pstate: Clean up after performance governor changes
After commit 82b4e03e01 (intel_pstate: skip scheduler hook when in
"performance" mode) get_target_pstate_use_performance() and
get_target_pstate_use_cpu_load() are never called if scaling_governor
is "performance", so drop the CPUFREQ_POLICY_PERFORMANCE checks from
them as they will never trigger anyway.

Moreover, the documentation needs to be updated to reflect the change
made by the above commit, so do that too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2017-06-29 23:25:15 +02:00
Prakash, Prashanth 73808d0fd2 cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance
Description of Lowest Perfomance in ACPI 6.1 specification states:
"Lowest Performance is the absolute lowest performance level of
the platform. Selecting a performance level lower than the lowest
nonlinear performance level may actually cause an efficiency penalty,
but should reduce the instantaneous power consumption of the processor.
In traditional terms, this represents the T-state range of performance
levels."

Set the default value of policy->min to Lowest Nonlinear Performance
to avoid any potential efficiency penalty.

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-27 02:19:39 +02:00
Len Brown 82b4e03e01 intel_pstate: skip scheduler hook when in "performance" mode
When the governor is set to "performance", intel_pstate does not
need the scheduler hook for doing any calculations.  Under these
conditions, its only purpose is to continue to maintain
cpufreq/scaling_cur_freq.

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

So in "performance" governor mode, the scheduler hook can be skipped.
This applies to both in Software and Hardware P-state control modes.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-27 01:47:34 +02:00
Len Brown 62611cb912 intel_pstate: delete scheduler hook in HWP mode
The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

In HWP mode, maintaining that value was the sole purpose of
the scheduler hook, intel_pstate_update_util_hwp(),
so it can now be removed.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-27 01:47:33 +02:00
Len Brown f8475cef90 x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval.  The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".

This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter.  However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-27 01:47:32 +02:00
Rafael J. Wysocki 5422583bfa Merge back PM tools material for v4.13. 2017-06-27 01:42:51 +02:00
Srinivas Pandruvada 1a4fe38add cpufreq: intel_pstate: Remove max/min fractions to limit performance
In the current model the max/min perf limits are a fraction of current
user space limits to the allowed max_freq or 100% for global limits.
This results in wrong ratio limits calculation because of rounding
issues for some user space limits.

Initially we tried to solve this issue by issue by having more shift
bits to increase precision. Still there are isolated cases where we still
have error.

This can be avoided by using ratios all together. Since the way we get
cpuinfo.max_freq is by multiplying scaling factor to max ratio, we can
easily keep the max/min ratios in terms of ratios and not fractions.

For example:
if the max ratio = 36
cpuinfo.max_freq = 36 * 100000 = 3600000

Suppose user space sets a limit of 1200000, then we can calculate
max ratio limit as
= 36 * 1200000 / 3600000
= 12
This will be correct for any user limits.

The other advantage is that, we don't need to do any calculation in the
fast path as ratio limit is already calculated via set_policy() callback.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 01:48:12 +02:00
Colin Ian King 0370f0f975 cpufreq: sfi: make freq_table static
pointer freq_table can be made static as it does not need to be in
global scope.

Cleans up sparse warning:
"symbol 'freq_table' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 01:43:21 +02:00
Krzysztof Kozlowski e773f5c7e8 cpufreq: exynos5440: Fix inconsistent indenting
Fix inconsistent indenting and unneeded white space in assignment.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 01:38:00 +02:00
Octavian Purdila 3fafb4e772 cpufreq: imx6q: imx6ull should use the same flow as imx6ul
This fixes an issue with imx6ull where setting the frequency to 528Mhz
would actually set the ARM clock to 324Mhz.

Signed-off-by: Octavian Purdila <octavian.purdila@nxp.com>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 01:35:13 +02:00
Tao Wang a0df77348a cpufreq: dt: Add support for hi3660
Add the compatible string for supporting the generic device tree cpufreq-dt
driver on Hisilicon's 3660 SoC.

Signed-off-by: Tao Wang <kevin.wangtao@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 01:33:56 +02:00
Ingo Molnar 902b319413 Merge branch 'WIP.sched/core' into sched/core
Conflicts:
	kernel/sched/Makefile

Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:28:21 +02:00
Olof Johansson 93c452f5d3 SCPI update for v4.13
Adds support to get DVFS transition latency and OPP for any device whose
 DVFS are managed by SCPI. This avoids code duplication in both cpufreq
 and devfreq SCPI drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZQmv9AAoJEABBurwxfuKYerEP/jaBOlgvAomhBreaERMX5Oud
 He+7VIj2qjuapptE3Su2+t+IKCyebc2hTPSi+PhFNn4Upqk/ZchE5X+dbvXNG3nf
 eZzBCSsvGPJoIXVeU3ACB5rT+Apv0qm0PSYUPnu9ZyCl46/qGZFECVufDJN1TdQc
 FEdYZDi7mQotYL27Mx7q4i7vay8Z6MkJs9EH86BnB9M6dZa3C5kFY3jkm2tu1Rca
 1ythsWMoy/Wrc7kHex8Dk90hbHBJ7RhJIYYxxv2IJ+67SkIdTd/+HpBcdLDpqVNt
 e9ElLAw9t4ym1PdgGcQjDcHlpqRkxucgOJhSticrxTNdGThTH6hx7HZhz+CqCJ7k
 e9my66sfKFH+wzz0C8agd1H9sMbaqoVoHUucj2kR6PR/rO/VQ9ugTDnaZfkRoPvh
 +87XEv+d7Gf/B4jfEUwxsfcLSDe1vpghSt02FM4BrYtdoTZFse9koPMt0jKGNnjO
 5jzk7VQVoKLpB5Xg8X9ePv808GWlUWTaxuPYfZMqThPdNHPsnAJrWe47s42Yogqw
 ZswExEDY/MdixtgzZssi96r5m69QVycSCvs+QiBNDPFqRgsuwTC1CDtvF4/3vZks
 mLtIxQ9vPXgHY7SfPGVZH3RuE07YC3QA+0m22xBCytogeriuMtxsyzHAkB3z2YFf
 xSNL5FIu5jOGQRVZ+PKQ
 =8QF6
 -----END PGP SIGNATURE-----

Merge tag 'scpi-updates-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into next/drivers

SCPI update for v4.13

Adds support to get DVFS transition latency and OPP for any device whose
DVFS are managed by SCPI. This avoids code duplication in both cpufreq
and devfreq SCPI drivers.

* tag 'scpi-updates-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  cpufreq: scpi: use new scpi_ops functions to remove duplicate code
  firmware: arm_scpi: add support to populate OPPs and get transition latency

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 20:55:07 -07:00
Rafael J. Wysocki f63e4f7d41 Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'
* pm-cpufreq:
  cpufreq: conservative: Allow down_threshold to take values from 1 to 10
  Revert "cpufreq: schedutil: Reduce frequencies slower"

* pm-cpuidle:
  cpuidle: dt: Add missing 'of_node_put()'

* pm-devfreq:
  PM / devfreq: exynos-ppmu: Staticize event list
  PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
  PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
2017-06-15 01:51:33 +02:00
Tomasz Wilczyński b8e11f7d27 cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Commit 27ed3cd2eb (cpufreq: conservative: Fix the logic in frequency
decrease checking) removed the 10 point substraction when comparing the
load against down_threshold but did not remove the related limit for the
down_threshold value.  As a result, down_threshold lower than 11 is not
allowed even though values from 1 to 10 do work correctly too. The
comment ("cannot be lower than 11 otherwise freq will not fall") also
does not apply after removing the substraction.

For this reason, allow down_threshold to take any value from 1 to 99
and fix the related comment.

Fixes: 27ed3cd2eb (cpufreq: conservative: Fix the logic in frequency decrease checking)
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-12 14:28:07 +02:00
Rafael J. Wysocki fbd78afe34 Merge branches 'intel_pstate' and 'pm-sleep'
* intel_pstate:
  cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min()

* pm-sleep:
  Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle"
2017-06-09 01:25:16 +02:00
Rafael J. Wysocki 57caf4ec2b cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min()
Commit c5a2ee7dde (cpufreq: intel_pstate: Active mode P-state
limits rework) incorrectly assumed that pstate.turbo_pstate would
always be nonzero for CPU0 in min_perf_pct_min() if
cpufreq_register_driver() had succeeded which may not be the case
in virtualized environments.

If that assumption doesn't hold, it leads to an early crash on boot
in intel_pstate_register_driver(), so add a sanity check to
min_perf_pct_min() to prevent the crash from happening.

Fixes: c5a2ee7dde (cpufreq: intel_pstate: Active mode P-state limits rework)
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-05 14:51:18 +02:00
Sudeep Holla c0f2e21953 cpufreq: scpi: use new scpi_ops functions to remove duplicate code
scpi_ops now provide APIs to get the transition_latency and to add
OPPs to the devices making those logic redundant here.

This patch makes use of those APIs and removes the redundant code in
this driver.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2017-06-05 11:14:35 +01:00
Rafael J. Wysocki bb5710e72c Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
  cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
2017-06-03 00:01:45 +02:00
Arvind Yadav 7575f82572 cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
clk_prepare_enable() can fail here and we must check its return value.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-30 00:09:41 +02:00
David Arcari 6c77003677 cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
For a driver that does not set the CPUFREQ_STICKY flag, if all of the
->init() calls fail, cpufreq_register_driver() should return an error.
This will prevent the driver from loading.

Fixes: ce1bcfe94d (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-30 00:07:20 +02:00
Viresh Kumar 55d8529313 cpufreq: create cpufreq_table_count_valid_entries()
We need such a routine at two places already, lets create one.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:28 -07:00
Viresh Kumar 4d753aa7b6 thermal: cpu_cooling: use cpufreq_policy to register cooling device
The CPU cooling driver uses the cpufreq policy, to get clip_cpus, the
frequency table, etc. Most of the callers of CPU cooling driver's
registration routines have the cpufreq policy with them, but they only
pass the policy->related_cpus cpumask. The __cpufreq_cooling_register()
routine then gets the policy by itself and uses it.

It would be much better if the callers can pass the policy instead
directly. This also fixes a basic design flaw, where the policy can be
freed while the CPU cooling driver is still active.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:24 -07:00
Sebastian Andrzej Siewior a92551e41d cpufreq: Use cpuhp_setup_state_nocalls_cpuslocked()
cpufreq holds get_online_cpus() while invoking cpuhp_setup_state_nocalls()
to make subsys_interface_register() and the registration of hotplug calls
atomic versus cpu hotplug.

cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is
correct, but prevents the conversion of the hotplug locking to a percpu
rwsem.

Use cpuhp_setup/remove_state_nocalls_cpuslocked() to avoid the nested
call. Convert *_online_cpus() to the new interfaces while at it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.731628408@linutronix.de
2017-05-26 10:10:38 +02:00
Thomas Gleixner d04e31a23c cpufreq/pasemi: Adjust system_state check
To enable smp_processor_id() and might_sleep() debug checks earlier, it's
required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

Adjust the system_state check in pas_cpufreq_cpu_exit() to handle the extra
states.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170516184735.620023128@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23 10:01:36 +02:00
Rafael J. Wysocki 079c1812a2 Merge branches 'intel_pstate', 'pm-cpufreq' and 'pm-cpufreq-sched'
* intel_pstate:
  cpufreq: intel_pstate: Document the current behavior and user interface

* pm-cpufreq:
  cpufreq: dbx500: add a Kconfig symbol

* pm-cpufreq-sched:
  cpufreq: schedutil: use now as reference when aggregating shared policy requests
2017-05-22 20:28:22 +02:00
Rafael J. Wysocki a32f80b30d Merge branch 'utilities' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull power management utilities updates from Len Brown.

* 'utilities' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  intel_pstate: use updated msr-index.h HWP.EPP values
  tools/power x86_energy_perf_policy: support HWP.EPP
  x86: msr-index.h: fix shifts to ULL results in HWP macros.
  x86: msr-index.h: define HWP.EPP values
  x86: msr-index.h: define EPB mid-points
2017-05-16 03:15:27 +02:00
Arnd Bergmann be0408d74d cpufreq: dbx500: add a Kconfig symbol
Moving the cooling code into the cpufreq driver caused a possible build failure
when the cpu_thermal helper code is a loadable module or disabled:

drivers/cpufreq/dbx500-cpufreq.o: In function `dbx500_cpufreq_ready':
dbx500-cpufreq.c:(.text.dbx500_cpufreq_ready+0x4): undefined reference to `cpufreq_cooling_register'

This adds the same dependency that we have in other cpufreq drivers,
forcing the driver to be disabled when we can't possibly link it.

Fixes: 19678ffb9f (cpufreq: dbx500: Manage cooling device from cpufreq driver)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-14 13:40:16 +02:00
Linus Torvalds ac3c4aa248 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from James Hogan:
 "math-emu:
   - Add missing clearing of BLTZALL and BGEZALL emulation counters
   - Fix BC1EQZ and BC1NEZ condition handling
   - Fix BLEZL and BGTZL identification

  BPF:
   - Add JIT support for SKF_AD_HATYPE
   - Use unsigned access for unsigned SKB fields
   - Quit clobbering callee saved registers in JIT code
   - Fix multiple problems in JIT skb access helpers

  Loongson 3:
   - Select MIPS_L1_CACHE_SHIFT_6

  Octeon:
   - Remove vestiges of CONFIG_CAVIUM_OCTEON_2ND_KERNEL
   - Remove unused L2C types and macros.
   - Remove unused SLI types and macros.
   - Fix compile error when USB is not enabled.
   - Octeon: Remove unused PCIERCX types and macros.
   - Octeon: Clean up platform code.

  SNI:
   - Remove recursive include of cpu-feature-overrides.h

  Sibyte:
   - Export symbol periph_rev to sb1250-mac network driver.
   - Fix Kconfig warning.

  Generic platform:
   - Enable Root FS on NFS in generic_defconfig

  SMP-MT:
   - Use CPU interrupt controller IPI IRQ domain support

  UASM:
   - Add support for LHU for uasm.
   - Remove needless ISA abstraction

  mm:
   - Add 48-bit VA space and 4-level page tables for 4K pages.

  PCI:
   - Add controllers before the specified head

  irqchip driver for MIPS CPU:
   - Replace magic 0x100 with IE_SW0
   - Prepare for non-legacy IRQ domains
   - Introduce IPI IRQ domain support

  MAINTAINERS:
   - Update email-id of Rahul Bedarkar

  NET:
   - sb1250-mac: Add missing MODULE_LICENSE()

  CPUFREQ:
   - Loongson2: drop set_cpus_allowed_ptr()

  Misc:
   - Disable Werror when W= is set
   - Opt into HAVE_COPY_THREAD_TLS
   - Enable GENERIC_CPU_AUTOPROBE
   - Use common outgoing-CPU-notification code
   - Remove dead define of ST_OFF
   - Remove CONFIG_ARCH_HAS_ILOG2_U{32,64}
   - Stengthen IPI IRQ domain sanity check
   - Remove confusing else statement in __do_page_fault()
   - Don't unnecessarily include kmalloc.h into <asm/cache.h>.
   - Delete unused definition of SMP_CACHE_SHIFT.
   - Delete redundant definition of SMP_CACHE_BYTES"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (39 commits)
  MIPS: Sibyte: Fix Kconfig warning.
  MIPS: Sibyte: Export symbol periph_rev to sb1250-mac network driver.
  NET: sb1250-mac: Add missing MODULE_LICENSE()
  MAINTAINERS: Update email-id of Rahul Bedarkar
  MIPS: Remove confusing else statement in __do_page_fault()
  MIPS: Stengthen IPI IRQ domain sanity check
  MIPS: smp-mt: Use CPU interrupt controller IPI IRQ domain support
  irqchip: mips-cpu: Introduce IPI IRQ domain support
  irqchip: mips-cpu: Prepare for non-legacy IRQ domains
  irqchip: mips-cpu: Replace magic 0x100 with IE_SW0
  MIPS: Remove CONFIG_ARCH_HAS_ILOG2_U{32,64}
  MIPS: generic: Enable Root FS on NFS in generic_defconfig
  MIPS: mach-rm: Remove recursive include of cpu-feature-overrides.h
  MIPS: Opt into HAVE_COPY_THREAD_TLS
  CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()
  MIPS: uasm: Remove needless ISA abstraction
  MIPS: Remove dead define of ST_OFF
  MIPS: Use common outgoing-CPU-notification code
  MIPS: math-emu: Fix BC1EQZ and BC1NEZ condition handling
  MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
  ...
2017-05-12 09:56:30 -07:00
Len Brown 3cedbc5a6d intel_pstate: use updated msr-index.h HWP.EPP values
intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
These attributes use strings to describe 4 operating states, and
inside the driver, these strings are mapped to numerical register
values.

The authorative mapping between the strings and numerical HWP.EPP values
are now globally defined in msr-index.h, replacing the out-dated
mapping that were open-coded into intel_pstate.c

new old string
--- --- ------
  0   0 performance
128  64 balance_performance
192 128 balance_power
255 192 power

Note that the HW and BIOS default value on most system is 128,
which intel_pstate will now call "balance_performance"
while it used to call it "balance_power".

Signed-off-by: Len Brown <len.brown@intel.com>
2017-05-11 21:27:53 -04:00
Linus Torvalds 291b38a756 Annotation of module parameters that specify device settings
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWPiW6vSw1s6N8H32AQLOrw/+NTqGf7bjq+64YKS6NfR0XDgE+wNJltGO
 ck7zJW3NHIg76RNu8s0I9xg5aVmwizz3Z5DGROZquaolnezux4tQihZ3AFyxIzLc
 +Y3WHYagcML7yFfjl/WznCLRD5EW3yPln4lCvQO0nW/xICRYeRI057JaIbi2Dtek
 BhcXt3c4AjXDLdYJkgtHV3p2R2mt8hcdFdWqqx6s7JaIThZNRGNzxAgtbcB9k5IW
 HVG9ZEIL73VBYWHrYivzjHYF5rBnNCPt87eOwDQeTOSkhv8te+u9k+bH8vxZw1T0
 XUtDrLBndKiuVo2GUfLkkF8LItx3Q9eLCJYy0joaIliyPqTEsPx9KjQ+Af0cxS9s
 ZPCZ5SYf96stKmDeL5xaMfrAmeyVHJ4lc4JTOqdzbIT8blsOSfYO/03p0ALShSDv
 /RQLaKGlf8Bjoy8PwKFcXb4sIDufcd/U1Av/EMFXxOfgN/u2JUkGKq6EaIM5B68L
 fHPje+aR9VNELPmPjwNOWtmN4I79EH3EItQf7zv0KG+UeKhcHLx/EAcSJ3ZRKEkH
 Lathg7pPOEJGArPiVO79TZzBG01ADn1aiwv65XObMzNZ+54xI/mN/Y1DNF/kL5jU
 XzvNzEjFt8mwMIZGVNdAt4+pDyMfIZGZSyUkSRKFnaQZMIvQrfQIU9RLBYLX5eOx
 +/p0VkIwDpg=
 =lbS7
 -----END PGP SIGNATURE-----

Merge tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull hw lockdown support from David Howells:
 "Annotation of module parameters that configure hardware resources
  including ioports, iomem addresses, irq lines and dma channels.

  This allows a future patch to prohibit the use of such module
  parameters to prevent that hardware from being abused to gain access
  to the running kernel image as part of locking the kernel down under
  UEFI secure boot conditions.

  Annotations are made by changing:

        module_param(n, t, p)
        module_param_named(n, v, t, p)
        module_param_array(n, t, m, p)

  to:

        module_param_hw(n, t, hwtype, p)
        module_param_hw_named(n, v, t, hwtype, p)
        module_param_hw_array(n, t, hwtype, m, p)

  where the module parameter refers to a hardware setting

  hwtype specifies the type of the resource being configured. This can
  be one of:

        ioport          Module parameter configures an I/O port
        iomem           Module parameter configures an I/O mem address
        ioport_or_iomem Module parameter could be either (runtime set)
        irq             Module parameter configures an I/O port
        dma             Module parameter configures a DMA channel
        dma_addr        Module parameter configures a DMA buffer address
        other           Module parameter configures some other value

  Note that the hwtype is compile checked, but not currently stored (the
  lockdown code probably won't require it). It is, however, there for
  future use.

  A bonus is that the hwtype can also be used for grepping.

  The intention is for the kernel to ignore or reject attempts to set
  annotated module parameters if lockdown is enabled. This applies to
  options passed on the boot command line, passed to insmod/modprobe or
  direct twiddling in /sys/module/ parameter files.

  The module initialisation then needs to handle the parameter not being
  set, by (1) giving an error, (2) probing for a value or (3) using a
  reasonable default.

  What I can't do is just reject a module out of hand because it may
  take a hardware setting in the module parameters. Some important
  modules, some ipmi stuff for instance, both probe for hardware and
  allow hardware to be manually specified; if the driver is aborts with
  any error, you don't get any ipmi hardware.

  Further, trying to do this entirely in the module initialisation code
  doesn't protect against sysfs twiddling.

  [!] Note that in and of itself, this series of patches should have no
      effect on the the size of the kernel or code execution - that is
      left to a patch in the next series to effect. It does mark
      annotated kernel parameters with a KERNEL_PARAM_FL_HWPARAM flag in
      an already existing field"

* tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (38 commits)
  Annotate hardware config module parameters in sound/pci/
  Annotate hardware config module parameters in sound/oss/
  Annotate hardware config module parameters in sound/isa/
  Annotate hardware config module parameters in sound/drivers/
  Annotate hardware config module parameters in fs/pstore/
  Annotate hardware config module parameters in drivers/watchdog/
  Annotate hardware config module parameters in drivers/video/
  Annotate hardware config module parameters in drivers/tty/
  Annotate hardware config module parameters in drivers/staging/vme/
  Annotate hardware config module parameters in drivers/staging/speakup/
  Annotate hardware config module parameters in drivers/staging/media/
  Annotate hardware config module parameters in drivers/scsi/
  Annotate hardware config module parameters in drivers/pcmcia/
  Annotate hardware config module parameters in drivers/pci/hotplug/
  Annotate hardware config module parameters in drivers/parport/
  Annotate hardware config module parameters in drivers/net/wireless/
  Annotate hardware config module parameters in drivers/net/wan/
  Annotate hardware config module parameters in drivers/net/irda/
  Annotate hardware config module parameters in drivers/net/hamradio/
  Annotate hardware config module parameters in drivers/net/ethernet/
  ...
2017-05-10 19:13:03 -07:00
Kees Cook 063246641d format-security: move static strings to const
While examining output from trial builds with -Wformat-security enabled,
many strings were found that should be defined as "const", or as a char
array instead of char pointer.  This makes some static analysis easier,
by producing fewer false positives.

As these are all trivial changes, it seemed best to put them all in a
single patch rather than chopping them up per maintainer.

Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jes Sorensen <jes@trained-monkey.org>	[runner.c]
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kejian Yan <yankejian@huawei.com>
Cc: Daode Huang <huangdaode@hisilicon.com>
Cc: Qianqian Xie <xieqianqian@huawei.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Christian Gromm <christian.gromm@microchip.com>
Cc: Andrey Shvetsov <andrey.shvetsov@k2l.de>
Cc: Jason Litzinger <jlitzingerdev@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Stephen Boyd ad61dd303a scripts/spelling.txt: add regsiter -> register spelling mistake
This typo is quite common.  Fix it and add it to the spelling file so
that checkpatch catches it earlier.

Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Linus Torvalds 3527d3e951 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - another round of rq-clock handling debugging, robustization and
     fixes

   - PELT accounting improvements

   - CPU hotplug related ->cpus_allowed affinity handling fixes all
     around the tree

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  sched/x86: Update reschedule warning text
  crypto: N2 - Replace racy task affinity logic
  cpufreq/sparc-us2e: Replace racy task affinity logic
  cpufreq/sparc-us3: Replace racy task affinity logic
  cpufreq/sh: Replace racy task affinity logic
  cpufreq/ia64: Replace racy task affinity logic
  ACPI/processor: Replace racy task affinity logic
  ACPI/processor: Fix error handling in __acpi_processor_start()
  sparc/sysfs: Replace racy task affinity logic
  powerpc/smp: Replace open coded task affinity logic
  ia64/sn/hwperf: Replace racy task affinity logic
  ia64/salinfo: Replace racy task affinity logic
  workqueue: Provide work_on_cpu_safe()
  ia64/topology: Remove cpus_allowed manipulation
  sched/fair: Move the PELT constants into a generated header
  sched/fair: Increase PELT accuracy for small tasks
  sched/fair: Fix comments
  sched/Documentation: Add 'sched-pelt' tool
  sched/fair: Fix corner case in __accumulate_sum()
  sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
  ...
2017-05-01 19:12:53 -07:00
Rafael J. Wysocki 2addac72af Merge schedutil governor updates for v4.12. 2017-04-28 23:13:33 +02:00
Rafael J. Wysocki 2dee4b0e0b Merge intel_pstate driver updates for v4.12. 2017-04-28 23:13:04 +02:00
David Howells 40059ec670 Annotate hardware config module parameters in drivers/cpufreq/
When the kernel is running in secure boot mode, we lock down the kernel to
prevent userspace from modifying the running kernel image.  Whilst this
includes prohibiting access to things like /dev/mem, it must also prevent
access by means of configuring driver modules in such a way as to cause a
device to access or modify the kernel image.

To this end, annotate module_param* statements that refer to hardware
configuration and indicate for future reference what type of parameter they
specify.  The parameter parser in the core sees this information and can
skip such parameters with an error message if the kernel is locked down.
The module initialisation then runs as normal, but just sees whatever the
default values for those parameters is.

Note that we do still need to do the module initialisation because some
drivers have viable defaults set in case parameters aren't specified and
some drivers support automatic configuration (e.g. PNP or PCI) in addition
to manually coded parameters.

This patch annotates drivers in drivers/cpufreq/.

Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
cc: linux-pm@vger.kernel.org
2017-04-20 12:02:32 +01:00
Mikko Perttunen 939dc6f51e cpufreq: Add Tegra186 cpufreq driver
Add a new cpufreq driver for Tegra186 (and likely later).
The CPUs are organized into two clusters, Denver and A57,
with two and four cores respectively. CPU frequency can be
adjusted by writing the desired rate divisor and a voltage
hint to a special per-core register.

The frequency of each core can be set individually; however,
this is just a hint as all CPUs in a cluster will run at
the maximum rate of non-idle CPUs in the cluster.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-19 23:23:08 +02:00
Christophe Jaillet eafca85163 cpufreq: imx6q: Fix error handling code
According to the previous error handling code, it is likely that
'goto out_free_opp' is expected here in order to avoid a memory leak in
error handling path.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-19 23:22:01 +02:00
Leonard Crestez 5aa1599ff0 cpufreq: imx6q: Set max suspend_freq to avoid changes during suspend
If the cpufreq driver tries to modify voltage/freq during suspend/resume
it might need to control an external PMIC via I2C or SPI but those
devices might be already suspended. This issue is likely to happen
whenever the LDOs have their vin-supply set.

To avoid this scenario we just increase cpufreq to the maximum before
suspend.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-19 23:22:01 +02:00
Irina Tirdea 54cad2fce7 cpufreq: imx6q: Fix handling EPROBE_DEFER from regulator
If there are any errors in getting the cpu0 regulators, the driver returns
-ENOENT. In case the regulators are not yet available, the devm_regulator_get
calls will return -EPROBE_DEFER, so that the driver can be probed later.
If we return -ENOENT, the driver will fail its initialization and will
not try to probe again (when the regulators become available).

Return the actual error received from regulator_get in probe. Print a
differentiated message in case we need to probe the device later and
in case we actually failed. Also add a message to inform when the
driver has been successfully registered.

Signed-off-by: Irina Tirdea <irina.tirdea@nxp.com>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-19 23:22:00 +02:00
Rafael J. Wysocki 1b72e7fd30 cpufreq: schedutil: Use policy-dependent transition delays
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-04-17 18:37:27 +02:00
Rafael J. Wysocki 5ed8a1c19d Merge branch 'intel_pstate' into pm-cpufreq-sched 2017-04-17 01:13:02 +02:00
Thomas Gleixner 12699ac53a cpufreq/sparc-us2e: Replace racy task affinity logic
The access to the HBIRD_ESTAR_MODE register in the cpu frequency control
functions must happen on the target CPU. This is achieved by temporarily
setting the affinity of the calling user space thread to the requested CPU
and reset it to the original affinity afterwards.

That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.

Replace it by a straight forward smp function call. 

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704131020280.2408@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15 12:20:56 +02:00
Thomas Gleixner 9fe24c4e92 cpufreq/sparc-us3: Replace racy task affinity logic
The access to the safari config register in the CPU frequency functions
must be executed on the target CPU. This is achieved by temporarily setting
the affinity of the calling user space thread to the requested CPU and
reset it to the original affinity afterwards.

That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.

Replace it by a straight forward smp function call. 

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201043.047558840@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15 12:20:55 +02:00
Thomas Gleixner 205dcc1ecb cpufreq/sh: Replace racy task affinity logic
The target() callback must run on the affected cpu. This is achieved by
temporarily setting the affinity of the calling thread to the requested CPU
and reset it to the original affinity afterwards.

That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU.

Replace it by work_on_cpu(). All call pathes which invoke the callbacks are
already protected against CPU hotplug.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.958216363@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15 12:20:55 +02:00
Thomas Gleixner 38f05ed04b cpufreq/ia64: Replace racy task affinity logic
The get() and target() callbacks must run on the affected cpu. This is
achieved by temporarily setting the affinity of the calling thread to the
requested CPU and reset it to the original affinity afterwards.

That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU and overwriting the new affinity setting.

Replace it by work_on_cpu(). All call pathes which invoke the callbacks are
already protected against CPU hotplug.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704122231100.2548@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15 12:20:55 +02:00
Rafael J. Wysocki c97ad0fc4f Merge back cpufreq core changes for v4.12. 2017-04-15 00:23:36 +02:00
Chen Yu c4a3fa261b cpufreq: Bring CPUs up even if cpufreq_online() failed
There is a report that after commit 27622b061e ("cpufreq: Convert
to hotplug state machine"), the normal CPU offline/online cycle
fails on some platforms.

According to the ftrace result, this problem was triggered on
platforms using acpi-cpufreq as the default cpufreq driver,
and due to the lack of some ACPI freq method (eg. _PCT),
cpufreq_online() failed and returned a negative value, so the CPU
hotplug state machine rolled back the CPU online process.  Actually,
from the user's perspective, the failure of cpufreq_online() should
not prevent that CPU from being brought up, although cpufreq might
not work on that CPU.

BTW, during system startup cpufreq_online() is not invoked via CPU
online but by the cpufreq device creation process, so the APs can be
brought up even though cpufreq_online() fails in that stage.

This patch ignores the return value of cpufreq_online/offline() and
lets the cpufreq framework deal with the failure.  cpufreq_online()
itself will do a proper rollback in that case and if _PCT is missing,
the ACPI cpufreq driver will print a warning if the corresponding
debug options have been enabled.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
Fixes: 27622b061e ("cpufreq: Convert to hotplug state machine")
Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-13 03:38:44 +02:00
Sebastian Andrzej Siewior 759f534e93 CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()
It is pure mystery to me why we need to be on a specific CPU while
looking up a value in an array.
My best shot at this is that before commit d4019f0a92 ("cpufreq: move
freq change notifications to cpufreq core") it was required to invoke
cpufreq_notify_transition() on a special CPU.

Since it looks like a waste, remove it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: tglx@linutronix.de
Cc: linux-pm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15888/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-04-12 13:52:21 +02:00
Rafael J. Wysocki 69a07f1803 Merge back cpufreq changes for v4.12. 2017-04-06 01:27:21 +02:00
Rafael J. Wysocki 46e1d5e972 Merge branches 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes'
* pm-cpufreq-fixes:
  cpufreq: Fix creation of symbolic links to policy directories

* pm-cpuidle-fixes:
  cpuidle: powernv: Pass correct drv->cpumask for registration
2017-03-31 23:00:53 +02:00
Box, David E 630e57573e cpufreq: intel_pstate: Add support for Gemini Lake
Use same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable
Gemini Lake.

Signed-off-by: Box, David E <david.e.box@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29 22:45:32 +02:00
Rafael J. Wysocki b02aabe8ab cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max()
Some computations in intel_pstate_get_min_max() are not necessary
and one of its two callers doesn't even use the full result.

First off, the fixed-point value of cpu->max_perf represents a
non-negative number between 0 and 1 inclusive and cpu->min_perf
cannot be greater than cpu->max_perf.  It is not necessary to check
those conditions every time the numbers in question are used.

Moreover, since intel_pstate_max_within_limits() only needs the
upper boundary, it doesn't make sense to compute the lower one in
there and returning min and max from intel_pstate_get_min_max()
via pointers doesn't look particularly nice.

For the above reasons, drop intel_pstate_get_min_max(), add a helper
to get the base P-state for min/max computations and carry out them
directly in the previous callers of intel_pstate_get_min_max().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:17 +02:00
Rafael J. Wysocki 2bfc4cbb5f cpufreq: intel_pstate: Do not walk policy->cpus
intel_pstate_hwp_set() is the only function walking policy->cpus
in intel_pstate.  The rest of the code simply assumes one CPU per
policy, including the initialization code.

Therefore it doesn't make sense for intel_pstate_hwp_set() to
walk policy->cpus as it is guaranteed to have only one bit set
for policy->cpu.

For this reason, rearrange intel_pstate_hwp_set() to take the CPU
number as the argument and drop the loop over policy->cpus from it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:16 +02:00
Rafael J. Wysocki 8ca6ce3701 cpufreq: intel_pstate: Introduce pid_in_use()
Add a new function pid_in_use() to return the information on whether
or not the PID-based P-state selection algorithm is in use.

That allows a couple of complicated conditions in the code to be
reduced to simple checks against the new function's return value.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:16 +02:00
Rafael J. Wysocki 2f49afc2a6 cpufreq: intel_pstate: Drop struct cpu_defaults
The cpu_defaults structure is redundant, because it only contains
one member of type struct pstate_funcs which can be used directly
instead of struct cpu_defaults.

For this reason, drop struct cpu_defaults, use struct pstate_funcs
directly instead of it where applicable and rename all of the
variables of that type accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:16 +02:00
Rafael J. Wysocki de4a76cb58 cpufreq: intel_pstate: Move cpu_defaults definitions
Move the definitions of the cpu_defaults structures after the
definitions of utilization update callback routines to avoid
extra declarations of the latter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:16 +02:00
Rafael J. Wysocki 67dd9bf441 cpufreq: intel_pstate: Add update_util callback to pstate_funcs
Avoid using extra function pointers during P-state selection by
dropping the get_target_pstate member from struct pstate_funcs,
adding a new update_util callback to it (to be registered with
the CPU scheduler as the utilization update callback in the active
mode) and reworking the utilization update callback routines to
invoke specific P-state selection functions directly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:16 +02:00
Rafael J. Wysocki eabd22c657 cpufreq: intel_pstate: Use different utilization update callbacks
Notice that some overhead in the utilization update callbacks
registered by intel_pstate in the active mode can be avoided if
those callbacks are tailored to specific configurations of the
driver.  For example, the utilization update callback for the HWP
enabled case only needs to update the average CPU performance
periodically whereas the utilization update callback for the
PID-based algorithm does not need to take IO-wait boosting into
account and so on.

With that in mind, define three utilization update callbacks for
three different use cases: HWP enabled, the CPU load "powersave"
P-state selection algorithm and the PID-based "powersave" P-state
selection algorithm and modify the driver initialization to
choose the callback matching its current configuration.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:15 +02:00
Rafael J. Wysocki 0042b2c069 cpufreq: intel_pstate: Modify check in intel_pstate_update_status()
One of the checks in intel_pstate_update_status() implicitly relies
on the information that there are only two struct cpufreq_driver
objects available, but it is better to do it directly against the
value it really is about (to make the code easier to follow if
nothing else).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:15 +02:00
Rafael J. Wysocki ee8df89a68 cpufreq: intel_pstate: Drop driver_registered variable
The driver_registered variable in intel_pstate is used for checking
whether or not the driver has been registered, but intel_pstate_driver
can be used for that too (with the rule that the driver is not
registered as long as it is NULL).

That is a bit more straightforward and the code may be simplified
a bit this way, so modify the driver accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:15 +02:00
Rafael J. Wysocki 694cb17347 cpufreq: intel_pstate: Skip unnecessary PID resets on init
PID controller parameters only need to be initialized if the
get_target_pstate_use_performance() P-state selection routine
is going to be used.  It is not necessary to initialize them
otherwise, so don't do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:14 +02:00
Rafael J. Wysocki 7aec5b50e9 cpufreq: intel_pstate: Set HWP sampling interval once
In the HWP enabled case pid_params.sample_rate_ns only needs to be
updated once, because it is global, so do that when setting hwp_active
instead of doing it during the initialization of every CPU.

Moreover, pid_params.sample_rate_ms is never used if HWP is enabled,
so do not update it at all then.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:14 +02:00
Rafael J. Wysocki ff35f02ea1 cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset()
intel_pstate_busy_pid_reset() is the only caller of pid_reset(),
pid_p_gain_set(), pid_i_gain_set(), and pid_d_gain_set().  Moreover,
it passes constants as two parameters of pid_reset() and all of
the other routines above essentially contain the same code, so
fold all of them into the caller and drop unnecessary computations.

Introduce percent_fp() for converting integer values in percent
to fixed-point fractions and use it in the above code cleanup.

Finally, rename intel_pstate_busy_pid_reset() to
intel_pstate_pid_reset() as it also is used for the
initialization of PID parameters for every CPU and the
meaning of the "busy" part of the name is not particularly
clear.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:11 +02:00
Rafael J. Wysocki 4ddd0146c7 cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller
There is only one caller of intel_pstate_reset_all_pid(), which is
pid_param_set() used in the debugfs interface only, and having that
code split does not make it particularly convenient to follow.

For this reason, move the body of intel_pstate_reset_all_pid() into
its caller and drop that function.

Also change the loop from for_each_online_cpu() (which is obviously
racy with respect to CPU offline/online) to for_each_possible_cpu(),
so that all PID parameters are reset for all CPUs regardless of their
online/offline status (to prevent, for example, a previously offline
CPU from going online with a stale set of PID parameters).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:09 +02:00
Rafael J. Wysocki 5c43905369 cpufreq: intel_pstate: Initialize pid_params statically
Notice that both the existing struct cpu_defaults instances in which
PID parameters are actually initialized use the same values of those
parameters, so it is not really necessary to copy them over to
pid_params dynamically.

Instead, initialize pid_params statically with those values and
drop the unused pid_policy member from struct cpu_defaults along
with copy_pid_params() used for initializing it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:08 +02:00
Rafael J. Wysocki 6404367862 cpufreq: intel_pstate: Drop pointless initialization of PID parameters
The P-state selection algorithm used by intel_pstate for Atom
processors is not based on the PID controller and the initialization
of PID parametrs for those processors is pointless and confusing, so
drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:07 +02:00
Rafael J. Wysocki e14cf8857e cpufreq: intel_pstate: Eliminate struct perf_limits
After recent changes the purpose of struct perf_limits is not
particularly clear any more and the code may be made somewhat
easier to follow by eliminating it, so go for that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:07 +02:00
Rafael J. Wysocki 2f0ba790df cpufreq: Fix creation of symbolic links to policy directories
The cpufreq core only tries to create symbolic links from CPU
directories in sysfs to policy directories in cpufreq_add_dev(),
either when a given CPU is registered or when the cpufreq driver
is registered, whichever happens first.  That is not sufficient,
however, because cpufreq_add_dev() may be called for an offline CPU
whose policy object has not been created yet and, quite obviously,
the symbolic cannot be added in that case.

Fix that by making cpufreq_online() attempt to add symbolic links to
policy objects for the CPUs in the related_cpus mask of every new
policy object created by it.

The cpufreq_driver_lock locking around the for_each_cpu() loop
in cpufreq_online() is dropped, because it is not necessary and the
code is somewhat simpler without it.  Moreover, failures to create
a symbolic link will not be regarded as hard errors any more and
the CPUs without those links will not be taken offline automatically,
but that should not be problematic in practice.

Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
2017-03-27 19:33:09 +02:00
Rafael J. Wysocki 80b120ca1a cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq
Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
set policy->cpuinfo.max_freq depending on the turbo status, but the
updates made by them are discarded by the core, because the policy
object passed to them by the core is temporary and cpuinfo.max_freq
from that object is not copied to the final policy object in
cpufreq_set_policy().

However, cpufreq_set_policy() passes the temporary policy object
to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
actually sees the policy->cpuinfo.max_freq value updated by
intel_pstate_verify_policy() and not the final one.  It also
updates policy->max sometimes which basically has no effect after
it returns, because the core discards that update.

To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
entirely and check the maximum frequency explicitly in
intel_pstate_update_perf_limits() instead of relying on the
transiently updated policy->cpuinfo.max_freq value.

Moreover, move the max->policy adjustment carried out in
intel_pstate_set_policy() to a separate function and call that
function from the ->verify driver callbacks to ensure that it will
actually be effective.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:32 +01:00
Rafael J. Wysocki c5a2ee7dde cpufreq: intel_pstate: Active mode P-state limits rework
The coordination of P-state limits used by intel_pstate in the active
mode (ie. by default) is problematic, because it synchronizes all of
the limits (ie. the global ones and the per-policy ones) so as to use
one common pair of P-state limits (min and max) across all CPUs in
the system.  The drawbacks of that are as follows:

 - If P-states are coordinated in hardware, it is not necessary
   to coordinate them in software on top of that, so in that case
   all of the above activity is in vain.

 - If P-states are not coordinated in hardware, then the processor
   is actually capable of setting different P-states for different
   CPUs and coordinating them at the software level simply doesn't
   allow that capability to be utilized.

 - The coordination works in such a way that setting a per-policy
   limit (eg. scaling_max_freq) for one CPU causes the common
   effective limit to change (and it will affect all of the other
   CPUs too), but subsequent reads from the corresponding sysfs
   attributes for the other CPUs will return stale values (which
   is confusing).

 - Reads from the global P-state limit attributes, min_perf_pct and
   max_perf_pct, return the effective common values and not the last
   values set through these attributes.  However, the last values
   set through these attributes become hard limits that cannot be
   exceeded by writes to scaling_min_freq and scaling_max_freq,
   respectively, and they are not exposed, so essentially users
   have to remember what they are.

All of that is painful enough to warrant a change of the management
of P-state limits in the active mode.

To that end, redesign the active mode P-state limits management in
intel_pstate in accordance with the following rules:

 (1) All CPUs are affected by the global limits (that is, none of
     them can be requested to run faster than the global max and
     none of them can be requested to run slower than the global
     min).

 (2) Each individual CPU is affected by its own per-policy limits
     (that is, it cannot be requested to run faster than its own
     per-policy max and it cannot be requested to run slower than
     its own per-policy min).

 (3) The global and per-policy limits can be set independently.

Also, the global maximum and minimum P-state limits will be always
expressed as percentages of the maximum supported turbo P-state.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:31 +01:00
Rafael J. Wysocki 553953453b cpufreq: intel_pstate: Use load-based P-state selection more widely
Extend the set of systems for which intel_pstate will use the
"powersave" P-state selection algorithm based on CPU load in the
active mode by systems with ACPI preferred profile set to "tablet",
"appliance PC", "desktop", or "workstation" (ie. everything with a
specified preferred profile that is not a "server").

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:31 +01:00
Rafael J. Wysocki eb5139d1a2 cpufreq: intel_pstate: Support HWP processors in all operation modes
Currently, some processors supporting HWP are only supported by
intel_pstate if HWP is actually going to be used and not supported
otherwise which is confusing.

Specifically, they are not supported if "intel_pstate=no_hwp" is
passed to the kernel in the command line or if the driver is started
in the passive mode ("intel_pstate=passive").

There is no real reason for that, because everything about those
processor is known anyway and the driver can work with them in all
modes, so make that happen, but use the load-based P-state selection
algorithm for the active mode "powersave" policy with them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:31 +01:00
Rafael J. Wysocki f1a91645b7 Merge back intel_pstate updates for 4.12. 2017-03-24 03:04:10 +01:00
Rafael J. Wysocki 6488294e4a Merge branches 'pm-cpufreq-fixes', 'pm-cpufreq-sched-fixes' and 'intel_pstate-fixes'
* pm-cpufreq-fixes:
  cpufreq: Restore policy min/max limits on CPU online

* pm-cpufreq-sched-fixes:
  cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()

* intel_pstate-fixes:
  cpufreq: intel_pstate: Fix policy data management in passive mode
  cpufreq: intel_pstate: One set of global limits in active mode
2017-03-24 00:43:26 +01:00
Viresh Kumar ff010472fb cpufreq: Restore policy min/max limits on CPU online
On CPU online the cpufreq core restores the previous governor (or
the previous "policy" setting for ->setpolicy drivers), but it does
not restore the min/max limits at the same time, which is confusing,
inconsistent and real pain for users who set the limits and then
suspend/resume the system (using full suspend), in which case the
limits are reset on all CPUs except for the boot one.

Fix this by making cpufreq_online() restore the limits when an inactive
policy is brought online.

The commit log and patch are inspired from Rafael's earlier work.

Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-22 02:38:27 +01:00
Rafael J. Wysocki 64897b20ee cpufreq: intel_pstate: Fix policy data management in passive mode
The policy->cpuinfo.max_freq and policy->max updates in
intel_cpufreq_turbo_update() are excessive as they are done for no
good reason and may lead to problems in principle, so they should be
dropped.  However, after dropping them intel_cpufreq_turbo_update()
becomes almost entirely pointless, because the check made by it is
made again down the road in intel_pstate_prepare_request().  The
only thing in it that still needs to be done is the call to
update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
and make its callers invoke update_turbo_state() directly instead of
it.

In addition to that, fix intel_cpufreq_verify_policy() so that it
checks global.no_turbo in addition to global.turbo_disabled when
updating policy->cpuinfo.max_freq to make it consistent with
intel_pstate_verify_policy().

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-21 22:19:07 +01:00
Rafael J. Wysocki 7de32556df cpufreq: intel_pstate: One set of global limits in active mode
In the active mode intel_pstate currently uses two sets of global
limits, each associated with one of the possible scaling_governor
settings in that mode: "powersave" or "performance".

The driver switches over from one of those sets to the other
depending on the scaling_governor setting for the last CPU whose
per-policy cpufreq interface in sysfs was last used to change
parameters exposed in there.  That obviously leads to no end of
issues when the scaling_governor settings differ between CPUs.

The most recent issue was introduced by commit a240c4aa5d (cpufreq:
intel_pstate: Do not reinit performance limits in ->setpolicy)
that eliminated the reinitialization of "performance" limits in
intel_pstate_set_policy() preventing the max limit from being set
to anything below 100, among other things.

Namely, an undesirable side effect of commit a240c4aa5d is that
now, after setting scaling_governor to "performance" in the active
mode, the per-policy limits for the CPU in question go to the highest
level and stay there even when it is switched back to "powersave"
later.

As it turns out, some distributions set scaling_governor to
"performance" temporarily for all CPUs to speed-up system
initialization, so that change causes them to misbehave later.

To fix that, get rid of the performance/powersave global limits
split and use just one set of global limits for everything.

From the user's persepctive, after this modification, when
scaling_governor is switched from "performance" to "powersave"
or the other way around on one CPU, the limits settings (ie. the
global max/min_perf_pct and per-policy scaling_max/min_freq for
any CPUs) will not change.  Still, switching from "performance"
to "powersave" or the other way around changes the way in which
P-states are selected and in particular "performance" causes the
driver to always request the highest P-state it is allowed to ask
for for the given CPU.

Fixes: a240c4aa5d (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-18 00:57:39 +01:00
Rafael J. Wysocki 8b766e05d8 Merge branches 'pm-cpufreq-fixes' and 'intel_pstate-fixes'
* pm-cpufreq-fixes:
  cpufreq: Fix and clean up show_cpuinfo_cur_freq()

* intel_pstate-fixes:
  cpufreq: intel_pstate: Avoid percentages in limits-related computations
  cpufreq: intel_pstate: Correct frequency setting in the HWP mode
  cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()
2017-03-18 00:45:09 +01:00
Viresh Kumar 19678ffb9f cpufreq: dbx500: Manage cooling device from cpufreq driver
The best place to register the CPU cooling device is from the cpufreq
driver as we would know if all the resources are already available or
not. That's what is done for the cpufreq-dt.c driver as well.

The cpu-cooling driver for dbx500 platform was just (un)registering
with the thermal framework and that can be handled easily by the cpufreq
driver as well and in proper sequence as well.

Get rid of the cooling driver and its its users and manage everything
from the cpufreq driver instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-16 00:14:31 +01:00
Rafael J. Wysocki 9b4f603e7a cpufreq: Fix and clean up show_cpuinfo_cur_freq()
There is a missing newline in show_cpuinfo_cur_freq(), so add it,
but while at it clean that function up somewhat too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
2017-03-16 00:12:40 +01:00
Rafael J. Wysocki e4c204ced0 cpufreq: intel_pstate: Avoid percentages in limits-related computations
Currently, intel_pstate_update_perf_limits() first converts the
policy minimum and maximum limits into percentages of the maximum
turbo frequency (rounding up to an integer) and then converts these
percentages to fractions (by using fixed-point arithmetic to divide
them by 100).

That introduces a rounding error unnecessarily, because the fractions
can be obtained by carrying out fixed-point divisions directly on the
input numbers.

Rework the computations in intel_pstate_hwp_set() to use fractions
instead of percentages (and drop redundant local variables from
there) and modify intel_pstate_update_perf_limits() to compute the
fractions directly and percentages out of them.

While at it, introduce percent_ext_fp() for converting percentages
to fractions (with extended number of fraction bits) and use it in
the computations.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-15 16:52:29 +01:00
Srinivas Pandruvada 3f8ed54aee cpufreq: intel_pstate: Correct frequency setting in the HWP mode
In the functions intel_pstate_hwp_set(), min/max range from HWP capability
MSR along with max_perf_pct and min_perf_pct, is used to set the HWP
request MSR. In some cases this doesn't result in the correct HWP max/min
in HWP request.

For example: In the following case:

HWP capabilities from MSR 0x771
0x70a1220

Here cpufreq min/max frequencies from above MSR dump are 700MHz and 3.2GHz
respectively.

This will result in
hwp_min = 0x07
hwp_max = 0x20

To limit max frequency to 2GHz:

perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up)

With the current calculation:
adj_range = max_perf_pct * range / 100;
adj_range = 63 * (32 - 7) / 100
adj_range = 15

max = hw_min + adj_range;
max = 7 + 15 = 22

This will result in HWP request of 0x160f, which will result in a
frequency cap of 2.2GHz not 2GHz.

The problem with the above calculation is that hwp_min of 7 is treated
as 0% in the range. But max_perf_pct is calculated with respect to minimum
as 0 and max as 3.2GHz or hwp_max, so adding hwp_min to it will result in
more than the desired.

Since the min_perf_pct and max_perf_pct is already a percent of max
frequency or hwp_max, this min/max HWP request value can be calculated
directly applying these percentage to hwp_max.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-14 03:56:39 +01:00
Rafael J. Wysocki 6e7408acd0 cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()
Fix the debugfs interface for PID tuning to actually update
pid_params.sample_rate_ns on PID parameters updates, as changing
pid_params.sample_rate_ms via debugfs has no effect now.

Fixes: a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-03-13 23:55:12 +01:00
YuanTian Tang b51d3388e2 cpufreq: qoriq: enhance bus frequency calculation
On some platforms, property device-type may be missed in soc node
in dts which caused the bus-frequency can not be obtained correctly.

This patch enhanced the bus-frequency calculation. When property
device-type is missed in dts, bus-frequency will be obtained by
looking up clock table to get platform clock and hence get its
frequency.

Signed-off-by: Tang Yuantian <andy.tang@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-12 23:10:53 +01:00