HWP previously was only enabled at driver load time, on the boot
CPU, however, HWP must be enabled per package. Move the code to
enable HWP to the cpufreq driver init path so that it will be
called per CPU.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Tested-by: David Zhuang <david.zhuang@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- ACPICA update to upstream revision 20150515 including basic
support for ACPI 6 features: new ACPI tables introduced by
ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the
other tables (DTRM, FADT, LPIT, MADT), new predefined names
(_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN),
fixes and cleanups (Bob Moore, Lv Zheng).
- ACPI device power management core code update to follow ACPI 6
which reflects the ACPI device power management implementation
in Windows (Rafael J Wysocki).
- Rework of the backlight interface selection logic to reduce the
number of kernel command line options and improve the handling
of DMI quirks that may be involved in that and to make the
code generally more straightforward (Hans de Goede).
- Fixes for the ACPI Embedded Controller (EC) driver related to
the handling of EC transactions (Lv Zheng).
- Fix for a regression related to the ACPI resources management
and resulting from a recent change of ACPI initialization code
ordering (Rafael J Wysocki).
- Fix for a system initialization regression related to ACPI
introduced during the 3.14 cycle and caused by running the
code that switches the platform over to the ACPI mode too
early in the initialization sequence (Rafael J Wysocki).
- Support for the ACPI _CCA device configuration object related
to DMA cache coherence (Suravee Suthikulpanit).
- ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).
- ACPI battery driver cleanups (Luis Henriques, Mathias Krause).
- ACPI processor driver cleanups (Hanjun Guo).
- Cleanups and documentation update related to the ACPI device
properties interface based on _DSD (Rafael J Wysocki).
- ACPI device power management fixes (Rafael J Wysocki).
- Assorted cleanups related to ACPI (Dominik Brodowski. Fabian
Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).
- Fix for a long-standing issue causing General Protection Faults
to be generated occasionally on return to user space after resume
from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).
- Fix to make the suspend core code return -EBUSY consistently in
all cases when system suspend is aborted due to wakeup detection
(Ruchi Kandoi).
- Support for automated device wakeup IRQ handling allowing drivers
to make their PM support more starightforward (Tony Lindgren).
- New tracepoints for suspend-to-idle tracing and rework of the
prepare/complete callbacks tracing in the PM core (Todd E Brandt,
Rafael J Wysocki).
- Wakeup sources framework enhancements (Jin Qian).
- New macro for noirq system PM callbacks (Grygorii Strashko).
- Assorted cleanups related to system suspend (Rafael J Wysocki).
- cpuidle core cleanups to make the code more efficient (Rafael J
Wysocki).
- powernv/pseries cpuidle driver update (Shilpasri G Bhat).
- cpufreq core fixes related to CPU online/offline that should
reduce the overhead of these operations quite a bit, unless the
CPU in question is physically going away (Viresh Kumar, Saravana
Kannan).
- Serialization of cpufreq governor callbacks to avoid race
conditions in some cases (Viresh Kumar).
- intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
Bhargava, Joe Konno).
- cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
Holla, Felipe Balbi, Tang Yuantian).
- Assorted cleanups in cpufreq drivers and core (Shailendra Verma,
Fabian Frederick, Wang Long).
- New Device Tree bindings for representing Operating Performance
Points (Viresh Kumar).
- Updates for the common clock operations support code in the PM
core (Rajendra Nayak, Geert Uytterhoeven).
- PM domains core code update (Geert Uytterhoeven).
- Intel Knights Landing support for the RAPL (Running Average Power
Limit) power capping driver (Dasaratharaman Chandramouli).
- Fixes related to the floor frequency setting on Atom SoCs in the
RAPL power capping driver (Ajay Thomas).
- Runtime PM framework documentation update (Ben Dooks).
- cpupower tool fix (Herton R Krzesinski).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJViJdWAAoJEILEb/54YlRx/9gP/3gHoFevNRycvn0VpKqdufCI
Mxy2LBBLlfyW2uD3+NvqvA2WWSo0Cs/LgXa04eAVxPdU7k48s8w+54U23wSouzjW
gfwAmuHxzDR8v0h8X3h6BxNzmkIQHtmDcQlA/cZdHejY/UUw01yxRGNUUZDNbxlm
WXn2nmlBLmGqXTYq0fpBV+3jicUghJqHHsBCqa3VR2yQioHMJG01F4UZMqYTZunN
OIvDUghxByKz6alzdCqlLl1Y0exV6vwWUAzBsl1qHqmHu/bWFSZn3ujNNVrjqHhw
Kl7/8dC2pQkv3Zo3gEVvfQ0onotwWZxGHzPQRdvmxvRnBunQVCi/wynx90yABX/r
PPb/iBNV0mZskbF0zb0GZT3ZZWGA8Z0p3o5JQv2jV4m62qTzx8w50Y5kbn9N1WT+
5bre7AVbVAlGonWszcS9iE+6TOboRz9OD1CCwPFXHItFutlBkau+1hHfFoLM0o9n
LhpGuyszT/EUa1BHkLzuCckFqO2DpbF3N2CKmuTekw0CdgdsvRL2pRByuerk3j7R
WQhlcvBq5YH6j43AuoEZKp8r1iN8oG/iqlrMYQaYWrW9hJaoQOoU8dGJxp/e7gKN
r/qeYjETI+tIsjCbtH5WQzzxDI3gPISAYAtfqs7G34EEo+Lwp6kyRUAF4kDot2V3
ZIyuKMmTu4cdwDETr/O+
=7jTj
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"The rework of backlight interface selection API from Hans de Goede
stands out from the number of commits and the number of affected
places perspective. The cpufreq core fixes from Viresh Kumar are
quite significant too as far as the number of commits goes and because
they should reduce CPU online/offline overhead quite a bit in the
majority of cases.
From the new featues point of view, the ACPICA update (to upstream
revision 20150515) adding support for new ACPI 6 material to ACPICA is
the one that matters the most as some new significant features will be
based on it going forward. Also included is an update of the ACPI
device power management core to follow ACPI 6 (which in turn reflects
the Windows' device PM implementation), a PM core extension to support
wakeup interrupts in a more generic way and support for the ACPI _CCA
device configuration object.
The rest is mostly fixes and cleanups all over and some documentation
updates, including new DT bindings for Operating Performance Points.
There is one fix for a regression introduced in the 4.1 cycle, but it
adds quite a number of lines of code, it wasn't really ready before
Thursday and you were on vacation, so I refrained from pushing it on
the last minute for 4.1.
Specifics:
- ACPICA update to upstream revision 20150515 including basic support
for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO,
XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM,
FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI,
_MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore,
Lv Zheng).
- ACPI device power management core code update to follow ACPI 6
which reflects the ACPI device power management implementation in
Windows (Rafael J Wysocki).
- rework of the backlight interface selection logic to reduce the
number of kernel command line options and improve the handling of
DMI quirks that may be involved in that and to make the code
generally more straightforward (Hans de Goede).
- fixes for the ACPI Embedded Controller (EC) driver related to the
handling of EC transactions (Lv Zheng).
- fix for a regression related to the ACPI resources management and
resulting from a recent change of ACPI initialization code ordering
(Rafael J Wysocki).
- fix for a system initialization regression related to ACPI
introduced during the 3.14 cycle and caused by running the code
that switches the platform over to the ACPI mode too early in the
initialization sequence (Rafael J Wysocki).
- support for the ACPI _CCA device configuration object related to
DMA cache coherence (Suravee Suthikulpanit).
- ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).
- ACPI battery driver cleanups (Luis Henriques, Mathias Krause).
- ACPI processor driver cleanups (Hanjun Guo).
- cleanups and documentation update related to the ACPI device
properties interface based on _DSD (Rafael J Wysocki).
- ACPI device power management fixes (Rafael J Wysocki).
- assorted cleanups related to ACPI (Dominik Brodowski, Fabian
Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).
- fix for a long-standing issue causing General Protection Faults to
be generated occasionally on return to user space after resume from
ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).
- fix to make the suspend core code return -EBUSY consistently in all
cases when system suspend is aborted due to wakeup detection (Ruchi
Kandoi).
- support for automated device wakeup IRQ handling allowing drivers
to make their PM support more starightforward (Tony Lindgren).
- new tracepoints for suspend-to-idle tracing and rework of the
prepare/complete callbacks tracing in the PM core (Todd E Brandt,
Rafael J Wysocki).
- wakeup sources framework enhancements (Jin Qian).
- new macro for noirq system PM callbacks (Grygorii Strashko).
- assorted cleanups related to system suspend (Rafael J Wysocki).
- cpuidle core cleanups to make the code more efficient (Rafael J
Wysocki).
- powernv/pseries cpuidle driver update (Shilpasri G Bhat).
- cpufreq core fixes related to CPU online/offline that should reduce
the overhead of these operations quite a bit, unless the CPU in
question is physically going away (Viresh Kumar, Saravana Kannan).
- serialization of cpufreq governor callbacks to avoid race
conditions in some cases (Viresh Kumar).
- intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
Bhargava, Joe Konno).
- cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
Holla, Felipe Balbi, Tang Yuantian).
- assorted cleanups in cpufreq drivers and core (Shailendra Verma,
Fabian Frederick, Wang Long).
- new Device Tree bindings for representing Operating Performance
Points (Viresh Kumar).
- updates for the common clock operations support code in the PM core
(Rajendra Nayak, Geert Uytterhoeven).
- PM domains core code update (Geert Uytterhoeven).
- Intel Knights Landing support for the RAPL (Running Average Power
Limit) power capping driver (Dasaratharaman Chandramouli).
- fixes related to the floor frequency setting on Atom SoCs in the
RAPL power capping driver (Ajay Thomas).
- runtime PM framework documentation update (Ben Dooks).
- cpupower tool fix (Herton R Krzesinski)"
* tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits)
cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
x86: Load __USER_DS into DS/ES after resume
PM / OPP: Add binding for 'opp-suspend'
PM / OPP: Allow multiple OPP tables to be passed via DT
PM / OPP: Add new bindings to address shortcomings of existing bindings
ACPI: Constify ACPI device IDs in documentation
ACPI / enumeration: Document the rules regarding the PRP0001 device ID
ACPI / video: Make acpi_video_unregister_backlight() private
acpi-video-detect: Remove old API
toshiba-acpi: Port to new backlight interface selection API
thinkpad-acpi: Port to new backlight interface selection API
sony-laptop: Port to new backlight interface selection API
samsung-laptop: Port to new backlight interface selection API
msi-wmi: Port to new backlight interface selection API
msi-laptop: Port to new backlight interface selection API
intel-oaktrail: Port to new backlight interface selection API
ideapad-laptop: Port to new backlight interface selection API
fujitsu-laptop: Port to new backlight interface selection API
eeepc-laptop: Port to new backlight interface selection API
dell-wmi: Port to new backlight interface selection API
...
During initialization and exit it is possible that the target pstate
might not actually be set. Furthermore, the result can be that the
driver and the processor are out of synch and, under some conditions,
the driver might never send the processor the proper target pstate.
This patch adds a bypass or do_checks flag to the call to
intel_pstate_set_pstate. If bypass, then specifically bypass clamp
checks and the do not send if it is the same as last time check. If
do_checks, then, and as before, do the current policy clamp checks,
and do not do actual send if the new target is the same as the old.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Marien Zwart <marien.zwart@gmail.com>
Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de>
Reported-by: Piotr Ko?aczkowski <pkolaczk@gmail.com>
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Tested-by: Marien Zwart <marien.zwart@gmail.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
[ rjw: Dropped pointless symbol definitions, rebased ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit ce717613f3 (intel_pstate: Turn per cpu printk into pr_debug)
turned per cpu printk into pr_debug. However, only half of the change
was done, introducing an inconsistency between entry and exit from
driver pstate control. This patch changes the exit message to pr_debug
also.
The various messages are inconsistent with respect to any identifier
text that can be used to help isolate the desired information from a
huge log. This patch makes a consistent identifier portion of the
string.
Amends: ce717613f3 (intel_pstate: Turn per cpu printk into pr_debug)
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
remove it from there and fix up the resulting build problems
triggered on x86 {64|32}-bit {def|allmod|allno}configs.
The breakages were triggering in places where x86 builds relied
on vmalloc() facilities but did not include <linux/vmalloc.h>
explicitly and relied on the implicit inclusion via <asm/io.h>.
Also add:
- <linux/init.h> to <linux/io.h>
- <asm/pgtable_types> to <asm/io.h>
... which were two other implicit header file dependencies.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[ Tidied up the changelog. ]
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Colin Cross <ccross@android.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 007bea098b (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it. It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.
Fixes: 007bea098b (intel_pstate: Add setting voltage value for baytrail P states.)
Signed-off-by: Joe Konno <joe.konno@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The intel_pstate driver is difficult to debug and investigate without tsc.
Also, it is likely use of tsc, and some version of C0 percentage,
will be re-introdcued in futute.
There have also been occasions where it is desirebale to know, and
confirm, the previous target pstate.
This patch brings back tsc, adds previous target pstate,
and adds both to the trace data collection.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
I keep seeing
drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_init’:
drivers/cpufreq/intel_pstate.c:1187:26: warning: initialization from incompatible pointer type
struct cpuinfo_x86 *c = &boot_cpu_data;
when doing randconfig builds.
This is caused by the fact that when !CONFIG_SMP, asm/processor.h
defines cpu_info to boot_cpu_data and the local variable
struct cpu_defaults *cpu_info
overshadows it leading to this unfortunate assignment in the
preprocessed source:
struct cpu_defaults *boot_cpu_data;
struct cpuinfo_x86 *c = &boot_cpu_data;
Rename the local variable and use static_cpu_has_safe() which alleviates
the need for defining a local cpuinfo_x86 pointer.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change the setpoint for the Baytrail and Cherrytrail CPUs. This
will cause more aggressive pstate selection and improves
performance on a variety of workloads with little power penalty.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
1. Add Knights Landing (KNL) CPUID to the list of CPUIDs supported by
the intel_pstate driver.
2. Add a new cpu_default structure for KNL since KNL has a slightly
different mechanism to get turbo pstates from MSRs.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
[ rjw: Subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
x86_match_cpu will not match our cpuid unless APERF/MPERF flag is
set, so there is no need to do the manual check for this MSR.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow users the option to disable the driver for any hardware
which does not support HWP.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the user has requested an override of the min_perf_pct via
sysfs, then it should be restored whenever policy is updated,
such as on resume. Take the max of whatever the user requested
and whatever the policy is.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When thermal or other subsystem requests to change the policy,
use that irrepective of whether cpufreq policy is PERFORMANCE or
not. Without this change, when thermal subsystem passive policy wants
to reduce performance, it still runs at 100%.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a sysfs interface to display the total number of supported
pstates. This value is independent of whether turbo has been
enabled or disabled.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds "turbo_pct" to the intel_pstate sysfs interface.
turbo_pct will display the percentage of the total supported
pstates that are in the turbo range. This value is independent
of whether turbo has been disabled or not.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a few comments in the code which calculates busyness to
clarify parts of the algorithm.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To force loading on Oracle Sun X86 servers, provide one kernel command line
parameter
intel_pstate = force
For those who are aware of the risk of no power capping capabily working
and try to get better performance with this driver.
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: Linda Knippers <linda.knippers@hp.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Oracle Sun X86 servers have dynamic power capping capability that works via
ACPI _PPC method etc, so skip loading this driver if Sun server has ACPI _PPC
enabled.
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Linda Knippers <linda.knippers@hp.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add BDW-H to the list of supported processors.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.
With HWP enbaled intel_pstate will no longer be responsible for selecting P
states for the processor. intel_pstate will continue to register to
the cpufreq core as the scaling driver for CPUs implementing
HWP. In HWP mode intel_pstate provides three functions reporting
frequency to the cpufreq core, support for the set_policy() interface
from the core and maintaining the intel_pstate sysfs interface in
/sys/devices/system/cpu/intel_pstate. User preferences expressed via
the set_policy() interface or the sysfs interface are forwared to the
CPU via the HWP MSR interface.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Using a VID value that is not high enough for the requested P state can
cause machine checks. Add a ceiling function to ensure calulated VIDs
with fractional values are set to the next highest integer VID value.
The algorythm for calculating the non-trubo VID from the BIOS writers
guide is:
vid_ratio = (vid_max - vid_min) / (max_pstate - min_pstate)
vid = ceiling(vid_min + (req_pstate - min_pstate) * vid_ratio)
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
BYT has a different conversion from P state to frequency than the core
processors. This causes the min/max and current frequency to be
misreported on some BYT SKUs. Tested on BYT N2820, Ivybridge and
Haswell processors.
Link: https://bugzilla.yoctoproject.org/show_bug.cgi?id=6663
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The user may have custom settings don't destroy them during suspend.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=80651
Reported-by: Tobias Jakobi <liquid.acid@gmx.net>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some BIOSes modify the state of MSR_IA32_MISC_ENABLE_TURBO_DISABLE
based on the current power source for the system battery AC vs
battery. Reflect the correct current state and ability to modify the
no_turbo sysfs file based on current state of
MSR_IA32_MISC_ENABLE_TURBO_DISABLE.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=83151
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Code which changes policy to powersave changes also max_policy_pct based on
max_freq. Code which change max_perf_pct has upper limit base on value
max_policy_pct. When policy is changing from powersave back to performance
then max_policy_pct is not changed. Which means that changing max_perf_pct is
not possible to high values if max_freq was too low in powersave policy.
Test case:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
800000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
$ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
800000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
20
$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
24
And now intel_pstate driver allows to set maximal value for max_perf_pct based
on max_policy_pct which is 24 for previous powersave max_freq 800000.
This patch will set default value for max_policy_pct when setting policy to
performance so it will allow to set also max value for max_perf_pct.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Cc: All applicable <stable@vger.kernel.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It should have been removed with commit d1b6848590
("cpufreq / intel_pstate: Optimize intel_pstate_set_policy")
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is pretty much the same as Intel Baytrail, only the CPU ID is
different. Add the new ID to the supported CPU list.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On larger systems intel_pstate currently spams the boot up
log with its "Intel pstate controlling ..." message for each CPU.
It's the only subsystem that prints a message for each
CPU.
Turn the message into a pr_debug.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The specific rounding adds conditionally only 1/256 to fractional
part of core_pct.
We can safely remove it without any noticeable impact in
calculations.
Use div64_u64 instead of div_u64 to avoid possible overflow of
sample->mperf as divisor
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Simplify the code by removing the inline functions pstate_increase and
pstate_decrease and use directly the intel_pstate_set_pstate.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we shift right aperf and mperf variables by FRAC_BITS
to prevent overflow when we convert them to fix point numbers
(shift left by FRAC_BITS).
But this is not necessary, because we actually use delta aperf and mperf
which are much less than APERF and MPERF values.
So, use the unmodified APERF and MPERF values in calculation.
This also adds 8 bits in precision, although the gain is insignificant.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
According to Intel 64 and IA-32 Architectures SDM, Volume 3,
Chapter 14.2, "Software needs to exercise care to avoid delays
between the two RDMSRs (for example interrupts)".
So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF.
This should increase the accuracy of the calculations.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suppress checkpatch.pl --strict warnings:
CHECK: Alignment should match open parenthesis
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the unnecessary intermediate assignment and use directly the
pid_params.sample_rate_ms variable.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove unnecessary parentheses.
Also, add parentheses in one case for better readability.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We can fit these lines in a single one, under the 80 characters
limit.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
div_s64() accepts the divisor parameter as s32. Helper div_fp()
also accepts divisor as int32_t.
So, remove the unnecessary int64_t type casting.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since we never remove sysfs entry and debugfs files, we can make
the intel_pstate_kobject and debugfs_parent locals.
Also, annotate with __init intel_pstate_sysfs_expose_params()
and intel_pstate_debug_expose_params() in order to be freed
after bootstrap.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU
initialization. Otherwise only cpu0 has its P-state set and all other
cores are left with their values unchanged.
In most cases, this is not too serious because the P-states will be set
correctly when the timer function is run. But when the default governor
is set to performance, the per-CPU current_pstate stays the same forever
and no attempts are made to write the MSRs again.
Signed-off-by: Vincent Minet <vincent@vincent-minet.net>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If turbo is disabled in the BIOS bit 38 should be set in
MSR_IA32_MISC_ENABLE register per section 14.3.2.1 of the SDM Vol 3
document 325384-050US Feb 2014. If this bit is set do *not* attempt
to disable trubo via the MSR_IA32_PERF_CTL register. On some systems
trying to disable turbo via MSR_IA32_PERF_CTL will cause subsequent
writes to MSR_IA32_PERF_CTL not take affect, in fact reading
MSR_IA32_PERF_CTL will not show the IDA/Turbo DISENGAGE bit(32) as
set. A write of bit 32 to zero returns to normal operation.
Also deal with the case where the processor does not support
turbo and the BIOS does not report the fact in MSR_IA32_MISC_ENABLE
but does report the max and turbo P states as the same value.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=64251
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 21855ff5 (intel_pstate: Set turbo VID for BayTrail) introduced
setting the turbo VID which is required to prevent a machine check on
some Baytrail SKUs under heavy graphics based workloads. The
docmumentation update that brought the requirement to light also
changed the bit mask used for enumerating P state and VID values from
0x7f to 0x3f.
This change returns the mask value to 0x7f.
Tested with the Intel NUC DN2820FYK,
BIOS version FYBYT10H.86A.0034.2014.0513.1413 with v3.16-rc1 and
v3.14.8 kernel versions.
Fixes: 21855ff5 (intel_pstate: Set turbo VID for BayTrail)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77951
Reported-and-tested-by: Rune Reterson <rune@megahurts.dk>
Reported-and-tested-by: Eric Eickmeyer <erich@ericheickmeyer.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There was a mistake in the actual rounding portion this previous patch:
f0fe3cd7e1 (intel_pstate: Correct rounding in busy calculation) such that
the rounding was asymetric and incorrect.
Severity: Not very serious, but can increase target pstate by one extra value.
For real world work flows the issue should self correct (but I have no proof).
It is the equivalent of different PID gains for positive and negative numbers.
Examples:
-3.000000 used to round to -4, rounds to -3 with this patch.
-3.503906 used to round to -5, rounds to -4 with this patch.
Fixes: f0fe3cd7e1 (intel_pstate: Correct rounding in busy calculation)
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We check the CPU ID during driver init. There is no need
to do it again per logical CPU initialization.
So, remove the duplicate check.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This change makes the busy calculation using 64 bit math which prevents
overflow for large values of aperf/mperf.
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The PID assumes that samples are of equal time, which for a deferable
timers this is not true when the system goes idle. This causes the
PID to take a long time to converge to the min P state and depending
on the pattern of the idle load can make the P state appear stuck.
The hold-off value of three sample times before using the scaling is
to give a grace period for applications that have high performance
requirements and spend a lot of time idle, The poster child for this
behavior is the ffmpeg benchmark in the Phoronix test suite.
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Changing to fixed point math throughout the busy calculation in
commit e66c1768 (Change busy calculation to use fixed point
math.) Introduced some inaccuracies by rounding the busy value at two
points in the calculation. This change removes roundings and moves
the rounding to the output of the PID where the calculations are
complete and the value returned as an integer.
Fixes: e66c176837 (intel_pstate: Change busy calculation to use fixed point math.)
Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit fcb6a15c (intel_pstate: Take core C0 time into account for core
busy calculation) introduced a regression referenced below. The issue
with "lockup" after suspend that this commit was addressing is now dealt
with in the suspend path.
Fixes: fcb6a15c2e (intel_pstate: Take core C0 time into account for core busy calculation)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121
Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Although, a value is assigned to member name of struct cpudata,
it is never used.
We can safely remove it.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Setting the P state of the core to max at init time is a hold over
from early implementation of intel_pstate where intel_pstate disabled
cpufreq and loaded VERY early in the boot sequence. This was to
ensure that intel_pstate did not affect boot time. This in not needed
now that intel_pstate is a cpufreq driver.
Removing this covers the case where a CPU has gone through a manual
CPU offline/online cycle and the P state is set to MAX on init and the
CPU immediately goes idle. Due to HW coordination the P state request
on the idle CPU will drag all cores to MAX P state until the load is
reevaluated when to core goes non-idle.
Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add support for Broadwell processors.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A documentation update exposed that there is a separate set of VID
values that must be used in the turbo/boost P state range. Add
enumerating and setting the correct VID for P states in the turbo
range.
Cc: v3.13+ <stable@vger.kernel.org> # v3.13+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit d37e2b7644 ("intel_pstate: remove unneeded sample buffers")
we use only one sample. So, there is no need to pass the sample
pointer to intel_pstate_calc_busy. Instead, get the pointer from
cpudata. Also, remove the unused SAMPLE_COUNT macro.
While at it, reformat the first line in this function.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ensure that no timer callback is running since we are about to free
the timer structure. We cannot guarantee that the call back is called
on the CPU where the timer is running.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change to use the new ->stop_cpu() callback to do clean up during CPU
hotplug. The requested P state for an offline core will be used by the
hardware coordination function to select the package P state. If the
core is under load when it is offlined it will fix the package P state
floor to the requested P state of offline core.
Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
commit d253d2a526 (Improve accuracy by not truncating until final
result), changed internal variables of the PID to be fixed point
numbers. Update the pid_reset() to reflect this change.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove unneeded sample buffers, intel_pstate operates on the most
recent sample only. This save some memory and make the code more
readable.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit fcb6a15c2e (intel_pstate: Take core C0 time into account for
core busy calculation) introduced a regression on some processor SKUs
supported by intel_pstate. This was due to the truncation caused by
using integer math to calculate core busy and C0 percentages.
On a i7-4770K processor operating at 800Mhz going to 100% utilization
the percent busy of the CPU using integer math is 22%, but it actually
is 22.85%. This value scaled to the current frequency returned 97
which the PID interpreted as no error and did not adjust the P state.
Tested on i7-4770K, i7-2600, i5-3230M.
Fixes: fcb6a15c2e (intel_pstate: Take core C0 time into account for core busy calculation)
References: https://lkml.org/lkml/2014/2/19/626
References: https://bugzilla.kernel.org/show_bug.cgi?id=70941
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A documentation update exposed the existance of the turbo ratio
register. Update baytrail support to use the turbo range.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
LFM (max efficiency ratio) is the max frequency at minimum voltage
supported by the processor. Using LFM as the minimum P state
increases performmance without affecting power. By not using P states
below LFM we avoid using P states that are less power efficient.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the reporting of energy since it does not provide any useful
information about the state of the driver and will be a maintainance
headache going forward since the RAPL energy units register is not
architectural and subject to change between micro-architectures
References: https://bugzilla.kernel.org/show_bug.cgi?id=69831
Fixes: b69880f9cc (intel_pstate: Add trace point to report internal state.)
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Take non-idle time into account when calculating core busy time.
This ensures that intel_pstate will notice a decrease in load.
References: https://bugzilla.kernel.org/show_bug.cgi?id=66581
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add perf trace event "power:pstate_sample" to report driver state to
aid in diagnosing issues reported against intel_pstate.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot
operate without these registers.
The previous validity checks in intel_pstate_msrs_not_valid() are
insufficent in nested KVMs.
References: https://bugzilla.redhat.com/show_bug.cgi?id=1046317
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the periodic P state boost. This code required for some corner
case benchmark tests. The calculation of the required P state was
incorrect/inaccurate and would not allow P state increase.
This was fixed by a combination of commits:
2134ed4 cpufreq / intel_pstate: Change to scale off of max P-state
d253d2a intel_pstate: Improve accuracy by not truncating until final result
References: https://bugzilla.kernel.org/show_bug.cgi?id=64271
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Baytrail requires setting P state and voltage pairs when adjusting the
requested P state. Add function for retrieving the valid voltage
values and modify *_set_pstate() functions to caluclate the
appropriate voltage for the requested P state.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq:
intel_pstate: skip the driver if ACPI has power mgmt option
cpufreq: ondemand: Remove redundant return statement
cpufreq: move freq change notifications to cpufreq core
cpufreq: distinguish drivers that do asynchronous notifications
cpufreq/intel_pstate: Add static declarations to internal functions
cpufreq: arm_big_little: reconfigure switcher behavior at run time
cpufreq: arm_big_little: add in-kernel switching (IKS) support
ARM: vexpress/TC2: register vexpress-spc cpufreq device
cpufreq: arm_big_little: add vexpress SPC interface driver
ARM: vexpress/TC2: add cpu clock support
ARM: vexpress/TC2: add support for CPU DVFS
Do not load the Intel pstate driver if the platform firmware
(ACPI BIOS) supports the power management alternatives.
The ACPI BIOS indicates that the OS control mode can be used
if the _PSS (Performance Supported States) is defined in ACPI
table. For the OS control mode, the Intel pstate driver will be
loaded.
HP BIOS has several power management modes (firmware, OS-control and
so on). For the OS control mode in HP BIOS, the Intel p-state driver
will be loaded. When the customer chooses the firmware power
management in HP BIOS, the Intel p-state driver will be ignored.
I put hw_vendor_info vendor_info in case other vendors (Dell, Lenovo...)
have their firmware power management. Vendors should make sure their
firmware power management works properly, and they can go for adding
their vendor info to the variable.
I have verified the patch on HP ProLiant servers. The patch worked
correctly.
Signed-off-by: Adrian Huang <adrianhuang0701@gmail.com>
[rjw: Fixed up !CONFIG_ACPI build]
[Linda Knippers: As Adrian has recently left HP, I retested the
updated patch on an HP ProLiant server and verified that it is
behaving correctly. When the BIOS is configured for OS control for
power management, the intel_pstate driver loads as expected. When
the BIOS is configured to provide the power management, the
intel_pstate driver does not load and we get the pcc_cpufreq driver
instead.]
Signed-off-by: Linda Knippers <linda.knippers@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes warnings reported by kbuild test robot
sparse warnings: (new ones prefixed by >>)
drivers/cpufreq/intel_pstate.c:729:6: sparse: symbol 'copy_pid_params' was not declared. Should it be static?
drivers/cpufreq/intel_pstate.c:739:6: sparse: symbol 'copy_cpu_funcs' was not declared. Should it be static?
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq: (167 commits)
cpufreq: create per policy rwsem instead of per CPU cpu_policy_rwsem
intel_pstate: Add Baytrail support
intel_pstate: Refactor driver to support CPUs with different MSR layouts
cpufreq: Implement light weight ->target_index() routine
PM / OPP: rename header to linux/pm_opp.h
PM / OPP: rename data structures to dev_pm equivalents
PM / OPP: rename functions to dev_pm_opp*
cpufreq / governor: Remove fossil comment
cpufreq: exynos4210: Use the common clock framework to set APLL clock rate
cpufreq: exynos4x12: Use the common clock framework to set APLL clock rate
cpufreq: Detect spurious invocations of update_policy_cpu()
cpufreq: pmac64: enable cpufreq on iMac G5 (iSight) model
cpufreq: pmac64: provide cpufreq transition latency for older G5 models
cpufreq: pmac64: speed up frequency switch
cpufreq: highbank-cpufreq: Enable Midway/ECX-2000
exynos-cpufreq: fix false return check from "regulator_set_voltage"
speedstep-centrino: Remove unnecessary braces
acpi-cpufreq: Add comment under ACPI_ADR_SPACE_SYSTEM_IO case
cpufreq: arm-big-little: use clk_get instead of clk_get_sys
cpufreq: exynos: Show a list of available frequencies
...
Conflicts:
drivers/devfreq/exynos/exynos5_bus.c
Add support for the Baytrail processor.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Non-core processors have a different MSR layout to commumicate P state
information. Refactor the driver to use CPU dependent accessors to
P state information.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The minimum pstate is supposed to be a percentage of the maximum P
state available. Calculate min using max pstate and not the
current max which may have been limited by the user
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch addresses Bug 60727
(https://bugzilla.kernel.org/show_bug.cgi?id=60727)
which was due to the truncation of intermediate values in the
calculations, which causes the code to consistently underestimate the
current cpu frequency, specifically 100% cpu utilization was truncated
down to the setpoint of 97%. This patch fixes the problem by keeping
the results of all intermediate calculations as fixed point numbers
rather scaling them back and forth between integers and fixed point.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60727
Signed-off-by: Brennan Shacklett <bpshacklett@gmail.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The expression in line 398 of intel_pstate.c causes the following
warning to be emitted:
drivers/cpufreq/intel_pstate.c:398:3: warning: left shift count >= width of type
which happens because unsigned long is 32-bit on some architectures.
Fix that by using a helper u64 variable and simplify the code
slightly.
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the system is suspended while max_perf_pct is less than 100 percent
or no_turbo set policy->{min,max} will be set incorrectly with scaled
values which turn the scaled values into hard limits.
References: https://bugzilla.kernel.org/show_bug.cgi?id=61241
Reported-by: Patrick Bartels <petzicus@googlemail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Most of the users of cpufreq_verify_within_limits() calls it for
limiting with min/max from policy->cpuinfo. We can make that code
simple by introducing another routine which will do this for them
automatically.
This patch adds another routine cpufreq_verify_within_cpu_limits()
and updates others to use it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When sysfs for no_turbo is set, then also some p states in turbo regions
are observed. This patch will set IDA Engage bit when no_turbo is set to
explicitly disengage turbo.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Enable the intel_pstate driver for Haswell CPUs. One missing Ivy Bridge
model (0x3E) is also included. Models referenced from
tools/power/x86/turbostat/turbostat.c:has_nehalem_turbo_ratio_limit
Signed-off-by: Nell Hardcastle <nell@spicious.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We don't need to set .owner = THIS_MODULE any more in cpufreq drivers
as this field isn't used any more by the cpufreq core.
This patch removes it and updates all dependent drivers accordingly.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change to using max P-state instead of max turbo P-state. This
change resolves two issues.
On a quiet system intel_pstate can fail to respond to a load change.
On CPU SKUs that have a limited number of P-states and no turbo range
intel_pstate fails to select the highest available P-state.
This change is suitable for stable v3.9+
References: https://bugzilla.kernel.org/show_bug.cgi?id=59481
Reported-and-tested-by: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: dsmythies@telus.net
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.9+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the drivers/cpufreq uses of the __cpuinit macros
from all C files.
[1] https://lkml.org/lkml/2013/5/20/589
[v2: leave 2nd lines of args misaligned as requested by Viresh]
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: cpufreq@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Add CPU ID for Ivybrigde processor.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use vzalloc() instead of vmalloc() and memset(0).
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver can no longer be built as a module remove the compile fence
around cpufreq tracing call.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove dead code from the driver.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ffmpeg benchmark in the phoronix test suite has threads on
multiple cores that rely on the progress on of threads on other cores
and ping pong back and forth fast enough to make the core appear less
busy than it "should" be. If the core has been at minimum p-state for
a while bump the pstate up to kick the core to see if it is in this
ping pong state. If the core is truly idle the p-state will be
reduced at the next sample time. If the core makes more progress it
will send more work to the thread bringing both threads out of the
ping pong scenario and the p-state will be selected normally.
This fixes a performance regression of approximately 30%
Cc: 3.9+ <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are two ways that the maximum p-state can be clamped, via a
policy change and via the sysfs file.
The acpi-thermal driver adjusts the p-state policy in response to
thermal events. These changes override the users settings at the
moment.
Use the lowest of the two requested values this ensures that we will
not exceed the requested pstate from either mechanism.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 3.9+ <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Idle time is taken into account in the APERF/MPERF ratio calculation
there is no reason for the driver to track it seperately. This
reduces the work in the driver and makes the code more readable.
Removal of the tracking of sample duration removes the possibility of
the divide by zero exception when the duration is sub 1us
References: https://bugzilla.kernel.org/show_bug.cgi?id=56691
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Cc: 3.9+ <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq: (57 commits)
cpufreq: MAINTAINERS: Add co-maintainer
cpufreq: pxa2xx: initialize variables
ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y
cpufreq: cpu0: Put cpu parent node after using it
cpufreq: ARM big LITTLE: Adapt to latest cpufreq updates
cpufreq: ARM big LITTLE: put DT nodes after using them
cpufreq: Don't call __cpufreq_governor() for drivers without target()
cpufreq: exynos5440: Protect OPP search calls with RCU lock
cpufreq: dbx500: Round to closest available freq
cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
cpufreq / intel_pstate: Optimize intel_pstate_set_policy
cpufreq: OMAP: instantiate omap-cpufreq as a platform_driver
arm: exynos: Enable OPP library support for exynos5440
cpufreq: exynos: Remove error return even if no soc is found
cpufreq: exynos: Add cpufreq driver for exynos5440
cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor
cpufreq: ondemand: allow custom powersave_bias_target handler to be registered
cpufreq: convert cpufreq_driver to using RCU
cpufreq: powerpc/platforms/cell: move cpufreq driver to drivers/cpufreq
cpufreq: sparc: move cpufreq driver to drivers/cpufreq
...
Conflicts:
MAINTAINERS (with commit a8e39c3 from pm-cpuidle)
drivers/cpufreq/cpufreq_governor.h (with commit beb0ff3)
This function is called quite often from other subsystems.
Removed unused call to intel_pstate_get_min_max().
Also when "policy->policy == CPUFREQ_POLICY_PERFORMANCE", then
no need to do calculations as the limits will be forced anyway.
Also corrected filename in the header.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current calculation of the delay time is wrong and a cut and
paste error from a previous experimental driver. This can result in
the timeout being set to jiffies + 1 which setup the driver to race
with itself if the APIC timer interrupt happens at just the right
time.
References: https://bugzilla.redhat.com/show_bug.cgi?id=920289
Reported-by: Adam Williamson <awilliam@redhat.com>
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
They are defined in coreboot (MSR_PLATFORM) and the other
one is already defined in msr-index.h.
Let's use those.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the correct pstate value to calculate the effective frequency.
References: https://bugzilla.redhat.com/show_bug.cgi?id=923942
Reported-by: Satish Balay <balay@fastmail.fm>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some VMs seem to try to implement some MSRs but not all the registers
the driver needs. Check to make sure all the MSR that we need are
available. If any of the required MSRs are not available refuse to
load.
References: https://bugzilla.redhat.com/show_bug.cgi?id=922923
Reported-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It seems some VMs support the P state MSRs but return zeros. Fail
gracefully if we are running in this environment.
References: https://bugzilla.redhat.com/show_bug.cgi?id=916833
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If cpufreq_register_driver() fails just free memory that has been
allocated and return. intel_pstate_exit() function is removed since we
are built-in only now there is no reason for a module exit procedure.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When intel_pstate is configured into the kernel it will become the
preferred scaling driver for processors that it supports. Allow the
user to override this by adding:
intel_pstate=disable
on the kernel command line.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes 32 bit build.
on i386:
drivers/built-in.o: In function `intel_pstate_timer_func':
intel_pstate.c:(.text+0x4ce97e): undefined reference to `__udivdi3'
drivers/built-in.o: In function `intel_pstate_cpu_init':
intel_pstate.c:(.cpuinit.text+0x974): undefined reference to `__udivdi3'
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a P-state driver for the Intel Sandy bridge processor. In cpufreq
terminology this driver implements a scaling driver with an internal
governor.
When built into the the kernel this driver will be the preferred
scaling driver for Sandy bridge processors.
In addition to the interfaces provided by the cpufreq subsystem for
controlling scaling drivers. The user may control the behavior of the
driver via three sysfs files located in
"/sys/devices/system/cpu/intel_pstate".
max_perf_pct: limits the maximum P state that will be requested by
the driver stated as a percentage of the avail performance.
min_perf_pct: limits the minimum P state that will be requested by
the driver stated as a percentage of the avail performance.
no_turbo: limits the driver to selecting P states below the turbo
frequency range.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>