Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is reported that there are buggy BIOSes in the world: AMI uses an XSDT
compiler for early BIOSes, this compiler will generate XSDT with a NULL
entry. The affected BIOS versions are "AMI BIOS F2-F4".
Original solution on Linux is to use an alternative heathy root table
instead of the ill one. This commit is:
Commit: 671cc68dc6
Subject: ACPICA: Back port and refine validation of the XSDT root table.
This is an example of such XSDT dumped from B85-HD3 (AMI F3 BIOS):
[000h 0000 4] Signature : "XSDT" [Extended System Description Table]
[004h 0004 4] Table Length : 00000074
[008h 0008 1] Revision : 01
[009h 0009 1] Checksum : 18
[00Ah 0010 6] Oem ID : "ALASKA"
[010h 0016 8] Oem Table ID : "A M I"
[018h 0024 4] Oem Revision : 01072009
[01Ch 0028 4] Asl Compiler ID : "AMI "
[020h 0032 4] Asl Compiler Revision : 00010013
[024h 0036 8] ACPI Table Address 0 : 00000000BA5F8180
[02Ch 0044 8] ACPI Table Address 1 : 00000000BA5F8290
[034h 0052 8] ACPI Table Address 2 : 00000000BA5F8308
[03Ch 0060 8] ACPI Table Address 3 : 00000000BA5F8848
[044h 0068 8] ACPI Table Address 4 : 00000000BA5F9320
[04Ch 0076 8] ACPI Table Address 5 : 00000000BA5F9360
[054h 0084 8] ACPI Table Address 6 : 00000000BA5F9398
[05Ch 0092 8] ACPI Table Address 7 : 00000000BA5F9708
[064h d100 8] ACPI Table Address 8 : 00000000BA5FC9A8
[06Ch 0108 8] ACPI Table Address 9 : 0000000000000000
But according to the bug report, the XSDT in fact is not broken. In the
above XSDT, ACPI Table Address 1-8 contains the same value as RSDT. The
differences can only be seen on the following 2 entries:
1. The first entry points to a FADT whose Revision is 5 while the first
entry in RSDT points to a FADT whose Revision is 2.
The FADT dumped from the address indicated by the first entry of XSDT:
FACP @ 0x00000000BA5F8180
0000: 46 41 43 50 0C 01 00 00<05>4B 41 4C 41 53 4B 41 FACP.....KALASKA
...
The FADT dumped from the address indicated by the first entry of RSDT:
FACP @ 0x00000000BA5ED0F0
0000: 46 41 43 50 84 00 00 00<02>A7 41 4C 41 53 4B 41 FACP......ALASKA
...
2. The last entry is a NULL terminator.
According to the test result, the Revision 5 FADT is accessible. Thus the
original solution turns out to be a work around that is preventing the
higher revision tables to be used for such platforms (they are all x86-64
platforms, and should use XSDT and higher revision FADT).
This patch offers a new solution, where a sanity check is performed before
installing a table address from XSDT. If the entry is NULL, it is simply
discarded.
Note that, this patch doesn't remove the original solution, so for Linux
kernel, this commit is actually a no-op, but it allows acpidump to be
working on such platforms. By doing so, we allow another easy revertable
commit to enable this feature so that when that commit is reverted, the
useful sanity check will not be affected. Lv Zheng.
References: https://bugzilla.kernel.org/show_bug.cgi?id=73911
References: https://bugs.archlinux.org/task/39811
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reported-and-tested-by: Bruce Chiarelli <mano155@gmail.com>
Reported-and-tested-by: Spyros Stathopoulos <spystath@gmail.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds "-x" and "-x -x" options to disable XSDT for acpidump.
The single "-x" can be used to stop using XSDT, RSDT will be forced to find
static tables, note that XSDT will still be dumped. The double "-x" can
stop dumping XSDT, which is useful when the XSDT address reported by RSDP
is pointing to an invalid address.
It is reported there are platforms having broken XSDT shipped, acpidump
will stop working while accessing such XSDT. This patch adds new option so
that users can force acpidump to dump tables listed in the RSDT. Lv Zheng.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=73911
Buglink: https://bugs.archlinux.org/task/39811
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reported-and-tested-by: Bruce Chiarelli <mano155@gmail.com>
Reported-and-tested-by: Spyros Stathopoulos <spystath@gmail.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch enforces a rule to always use ACPI_VALIDATE_RSDP_SIG for RSDP
signatures passed from table header or ACPI_SIG_RSDP so that truncated
string comparison can be avoided. This could help to fix the issue that
"RSD " matches but "RSD PTR " doesn't match. Lv Zheng.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch fixes an issue that the while loop is not needed as fread()
should return exact the bytes of expected.
The patch is tested by runing diff against the output of "-c" mode and
the normal mode, and only finds the following differences:
1. table addresses: the "-c" mode will always fill 0x0000000000000000 for
the address.
2. RSDP/RSDT/XSDT: there is no generation of such tables for "-c" mode.
So the test result shows the fix is valid. Lv Zheng.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The command "cpupower frequency-info" can be used when using cpupower to
monitor and test processor behaviour to determine if the processor is
behaving as expected. This data can be compared to the output of
/proc/cpuinfo or the output of
/sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies
to determine if the cpu is in an expected state.
When doing this I noticed comparison test failures due to the way the
data is displayed in cpupower. For example,
[root@intel-s3e37-02 cpupower]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2262000 2261000 2128000 1995000 1862000 1729000 1596000 1463000 1330000
1197000 1064000
compared to
[root@intel-s3e37-02 cpupower]# cpupower frequency-info
analyzing CPU 0:
driver: acpi-cpufreq
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 10.0 us.
hardware limits: 1.06 GHz - 2.26 GHz
available frequency steps: 2.26 GHz, 2.26 GHz, 2.13 GHz, 2.00 GHz, 1.86 GHz, 1.73 GHz, 1.60 GHz, 1.46 GHz, 1.33 GHz, 1.20 GHz, 1.06 GHz
available cpufreq governors: conservative, userspace, powersave, ondemand, performance
current policy: frequency should be within 1.06 GHz and 2.26 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency is 2.26 GHz (asserted by call to hardware).
boost state support:
Supported: yes
Active: yes
shows very different values for the available frequency steps. The cpupower
output rounds off values at 2 decimal points and this causes problems with
test scripts. For example, with the data above,
1.064 is 1.06
1.197 is 1.20
1.596 is 1.60
1.995 is 2.00
2.128 is 2.13
and most confusingly,
2.261 is 2.26
2.262 is 2.26
Truncating these values serves no real purpose other than making the output
pretty. Since the default has been to round off these values I am adding
a -n/--no-rounding option to the cpupower utility that will display the
data without rounding off the still significant digits.
After patch,
analyzing CPU 0:
driver: acpi-cpufreq
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 10.000 us.
hardware limits: 1.064000 GHz - 2.262000 GHz
available frequency steps: 2.262000 GHz, 2.261000 GHz, 2.128000 GHz, 1.995000 GHz, 1.862000 GHz, 1.729000 GHz, 1.596000 GHz, 1.463000 GHz, 1.330000 GHz, 1.197000 GHz, 1.064000 GHz
available cpufreq governors: conservative, userspace, powersave, ondemand, performance
current policy: frequency should be within 1.064000 GHz and 2.262000 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency is 2.262000 GHz (asserted by call to hardware).
boost state support:
Supported: yes
Active: yes
Acked-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Intel 64 and IA-32 Architectures Software Developer's Manual says
that TjMax is stored in bits 23:16 of MSR_TEMPERATURE TARGET (0x1a2).
That's 8 bits, not 7, so it must be masked with 0xFF rather than 0x7F.
The manual has no mention of which values should be considered valid,
which kind of implies that they all are. Arbitrarily discarding values
outside a specific range is wrong. The upper range check had to be
fixed recently (commit 144b44b1) and the lower range check is just as
wrong. See bug #75071:
https://bugzilla.kernel.org/show_bug.cgi?id=75071
There are many Xeon processor series with TjMax of 70, 71 or 80
degrees Celsius, way below the arbitrary 85 degrees Celsius limit.
There may be other (past or future) models with even lower limits.
So drop this arbitrary check. The only value that would be clearly
invalid is 0. Everything else should be accepted.
After these changes, turbostat is aligned with what the coretemp
driver does.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There has been confusion all the time about which mailing list to follow
for cpufreq activities, linux-pm@vger.kernel.org or cpufreq@vger.kernel.org.
Since patches sent to cpufreq@vger.kernel.org don't go to Patchwork
which is a maintenance workflow problem, make linux-pm@vger.kernel.org
the official mailing list for cpufreq stuff and remove all references
of cpufreq@vger.kernel.org from kernel source.
Later, we can request that the list be dropped entirely.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This userspace tool accesses the EC through the ec_sys debug driver
(through /sys/kernel/debug/ec/ec0/io).
The EC command/data registers cannot be accessed directly, because they
may be manipulated by the AML interpreter in parallel.
The ec_sys driver synchronizes user space (debug) access with the AML
interpreter.
Signed-off-by: Thomas Renninger <trenn@suse.de>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- bindir is created, but sbindir is used -> fix that
- the debug parts are there twice (copy paste bug?). Remove one of the
exact same parts
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch updates man file of acpidump.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch updates tools Makefile to use new acpidump.
ACPICA's acpidump relies on various ACPICA components/common/os_specific
source code. They are located in various kernel folders, being searched
and compiled using vpath technique in Makefile. These files include:
1. drivers/acpi/acpica/acapps.h
2. tools/power/acpi/common/getopt.c
3. tools/power/acpi/common/cmfsize.c
4. tools/power/acpi/os_specific/service_layers/oslinuxtbl.c
5. tools/power/acpi/os_specific/service_layers/osunixdir.c
6. tools/power/acpi/os_specific/service_layers/osunixmap.c
This patch has been tested on DELL Inspiron Mini, acpidump output can be
successfully generated by typing the following commands:
# cd tools/power/acpi
# make DEBUG=false
# sudo make install DESTDIR=/opt
# sudo make uninstall DESTDIR=/opt
# make clean
Or
# cd tools
# make acpi
# sudo make acpi_install
# sudo make acpi_uninstall
# make acpi_clean
A kernel build test is also performed on DELL Inspiron Mini to verify that
the changes done to actypes.h and aclinux.h won't affect the kernel
build process.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch is the generation of a commit that updates release automation
with newly added structures and files that are referenced by the acpidump.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The acpidump is initiated by Bob Moore and Chao Guan, fixed and completed
by Lv Zheng.
This patch is a generation of the commit that adds acpidump release
automation into ACPICA release process. Lv Zheng.
Note that this patch doesn't replace the kernel shipped acpidump with the
new acpidump. The replacement is done by further patches.
Original-by: Chao Guan <guanchao@mail.ustc.edu.cn>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use 8 columns for each number ouput.
We don't fit into 80 columns on most machines,
so keep the format simple.
Print frequency in MHz instead of GHz.
We've got 8 columns now, so use them to
show low frequency in a more natural unit.
Many users didn't understand what %c0 meant,
so re-name it to be %Busy.
Add Avg_MHz column, which is the frequency that many
users expect to see -- the total number of cycles executed
over the measurement interval.
People found the previous GHz to be confusing, since
it was the speed only over the non-idle interval.
That measurement has been re-named Bzy_MHz.
Suggested-by: Dirk J. Brandewie
Signed-off-by: Len Brown <len.brown@intel.com>
Pull turbostat updates from Len Brown.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: introduce -s to dump counters
tools/power turbostat: remove unused command line option
turbostat: Add option to report joules consumed per sample
turbostat: run on HSX
turbostat: Add a .gitignore to ignore the compiled turbostat binary
turbostat: Clean up error handling; disambiguate error messages; use err and errx
turbostat: Factor out common function to open file and exit on failure
turbostat: Add a helper to parse a single int out of a file
turbostat: Check return value of fscanf
turbostat: Use GCC's CPUID functions to support PIC
turbostat: Don't attempt to printf an off_t with %zx
turbostat: Don't put unprocessed uapi headers in the include path
The new option allows just run turbostat and get dump of counter values. It's
useful when we have something more than one program to test.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The -s is not used, let's remove it, and update quick help accordingly.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add "-J" option to report energy consumed in joules per sample. This option
also adds the sample time to the reported values.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Haswell Xeon has slightly different RAPL support than client HSW,
which prevented the previous version of turbostat from running on HSX.
Signed-off-by: Len Brown <len.brown@intel.com>
Most of turbostat's error handling consists of printing an error (often
including an errno) and exiting. Since perror doesn't support a format
string, those error messages are often ambiguous, such as just showing a
file path, which doesn't uniquely identify which call failed.
turbostat already uses _GNU_SOURCE, so switch to the err and errx
functions from err.h, which take a format string.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Several different functions in turbostat contain the same pattern of
opening a file and exiting on failure. Factor out a common fopen_or_die
function for that.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Many different chunks of code in turbostat open a file, parse a single
int out of it, and close it. Factor that out into a common function.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Some systems declare fscanf with the warn_unused_result attribute. On
such systems, turbostat generates the following warnings:
turbostat.c: In function 'get_core_id':
turbostat.c:1203:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'get_physical_package_id':
turbostat.c:1186:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'cpu_is_first_core_in_package':
turbostat.c:1169:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'cpu_is_first_sibling_in_core':
turbostat.c:1148:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
Fix these by checking the return value of those four calls to fscanf.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat uses inline assembly to call cpuid. On 32-bit x86, on systems
that have certain security features enabled by default that make -fPIC
the default, this causes a build error:
turbostat.c: In function ‘check_cpuid’:
turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’
asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
^
GCC provides a header cpuid.h, containing a __get_cpuid function that
works with both PIC and non-PIC. (On PIC, it saves and restores ebx
around the cpuid instruction.) Use that instead.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat uses the format %zx to print an off_t. However, %zx wants a
size_t, not an off_t. On 32-bit targets, those refer to different
types, potentially even with different sizes. Use %llx and a cast
instead, since printf does not have a length modifier for off_t.
Without this patch, when compiling for a 32-bit target:
turbostat.c: In function 'get_msr':
turbostat.c:231:3: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'off_t' [-Wformat]
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat's Makefile puts arch/x86/include/uapi/ in the include path, so
that it can include <asm/msr.h> from it. It isn't in general safe to
include even uapi headers directly from the kernel tree without
processing them through scripts/headers_install.sh, but asm/msr.h
happens to work.
However, that include path can break with some versions of system
headers, by overriding some system headers with the unprocessed versions
directly from the kernel source. For instance:
In file included from /build/x86-generic/usr/include/bits/sigcontext.h:28:0,
from /build/x86-generic/usr/include/signal.h:339,
from /build/x86-generic/usr/include/sys/wait.h:31,
from turbostat.c:27:
../../../../arch/x86/include/uapi/asm/sigcontext.h:4:28: fatal error: linux/compiler.h: No such file or directory
This occurs because the system bits/sigcontext.h on that build system
includes <asm/sigcontext.h>, and asm/sigcontext.h in the kernel source
includes <linux/compiler.h>, which scripts/headers_install.sh would have
filtered out.
Since turbostat really only wants a single header, just include that one
header rather than putting an entire directory of kernel headers on the
include path.
In the process, switch from msr.h to msr-index.h, since turbostat just
wants the MSR numbers.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
This patch cleans up old tools/power/acpi Makefile for further porting,
make it compiled in a similar way as the other tools. No functional
changes.
The CFLAGS is modified as follows:
1. Previous cc flags:
-Wall -Wstrict-prototypes -Wdeclaration-after-statement -Os -s \
-D_LINUX -DDEFINE_ALTERNATE_TYPES -I../../../include
2. Current cc flags:
DEBUG=false:
-D_LINUX -DDEFINE_ALTERNATE_TYPES -I../../../include -Wall \
-Wstrict-prototypes -Wdeclaration-after-statement -Os \
-fomit-frame-pointer
Normal:
-D_LINUX -DDEFINE_ALTERNATE_TYPES -I../../../include -Wall \
-Wstrict-prototypes -Wdeclaration-after-statement -O1 -g -DDEBUG
There is only one difference: -fomit-frame-pointer.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* acpi-cleanup: (22 commits)
ACPI / tables: Return proper error codes from acpi_table_parse() and fix comment.
ACPI / tables: Check if id is NULL in acpi_table_parse()
ACPI / proc: Include appropriate header file in proc.c
ACPI / EC: Remove unused functions and add prototype declaration in internal.h
ACPI / dock: Include appropriate header file in dock.c
ACPI / PCI: Include appropriate header file in pci_link.c
ACPI / PCI: Include appropriate header file in pci_slot.c
ACPI / EC: Mark the function acpi_ec_add_debugfs() as static in ec_sys.c
ACPI / NVS: Include appropriate header file in nvs.c
ACPI / OSL: Mark the function acpi_table_checksum() as static
ACPI / processor: initialize a variable to silence compiler warning
ACPI / processor: use ACPI_COMPANION() to get ACPI device
ACPI: correct minor typos
ACPI / sleep: Drop redundant acpi_disabled check
ACPI / dock: Drop redundant acpi_disabled check
ACPI / table: Replace '1' with specific error return values
ACPI: remove trailing whitespace
ACPI / IBFT: Fix incorrect <acpi/acpi.h> inclusion in iSCSI boot firmware module
ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>
SFI / ACPI: Fix warnings reported during builds with W=1
...
Conflicts:
drivers/acpi/nvs.c
drivers/hwmon/asus_atk0110.c
The cpufreq-set tool has a missing length check. This is basically
just correctness but still should get fixed.
One of a set of sscanf problems reported by Jackie Chang
Signed-off-by: Alan Cox <alan@linux.intel.com>
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a user calls 'cpupower set --perf-bias 15', the process will end with
a SIGSEGV in libc because cpupower-set passes a NULL optarg to the atoi
call. This is because the getopt_long structure currently has all of
the options as having an optional_argument when they really have a
required argument. We change the structure to use required_argument to
match the short options and it resolves the issue.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1000439
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Thomas Renninger <trenn@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace direct inclusions of <acpi/acpi.h>, <acpi/acpi_bus.h> and
<acpi/acpi_drivers.h>, which are incorrect, with <linux/acpi.h>
inclusions and remove some inclusions of those files that aren't
necessary.
First of all, <acpi/acpi.h>, <acpi/acpi_bus.h> and <acpi/acpi_drivers.h>
should not be included directly from any files that are built for
CONFIG_ACPI unset, because that generally leads to build warnings about
undefined symbols in !CONFIG_ACPI builds. For CONFIG_ACPI set,
<linux/acpi.h> includes those files and for CONFIG_ACPI unset it
provides stub ACPI symbols to be used in that case.
Second, there are ordering dependencies between those files that always
have to be met. Namely, it is required that <acpi/acpi_bus.h> be included
prior to <acpi/acpi_drivers.h> so that the acpi_pci_root declarations the
latter depends on are always there. And <acpi/acpi.h> which provides
basic ACPICA type declarations should always be included prior to any other
ACPI headers in CONFIG_ACPI builds. That also is taken care of including
<linux/acpi.h> as appropriate.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (drivers/pci stuff)
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> (Xen stuff)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
idlestates in sysfs are counted from 0.
This fixes a wrong error message.
Current behavior on a machine with 4 sleep states is:
cpupower idle-set -e 4
Idlestate 4 enabled on CPU 0
-----Wrong---------------------
cpupower idle-set -e 5
Idlestate enabling not supported by kernel
-----Must and now will be -----
cpupower idle-set -e 5
Idlestate 6 not available on CPU 0
-------------------------------
cpupower idle-set -e 6
Idlestate 6 not available on CPU 0
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpupower idle-set subcommand was introduce recently.
This patch provides the missing manpage.
If cpupower is properly installed it will show up automatically
(similar to git), when invoking:
cpupower help idle-set
or
cpupower idle-set --help
Some parts have been taken over and adjusted from
git commit 62d6ae880e
documentation submitted by Carsten Emde.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Support the next generation Intel Atom processor
mirco-architecture, formerly called Silvermont.
The server version, formerly called "Avoton",
is named the "Intel(R) Atom(TM) Processor C2000 Product Family".
The client version, formerly called "Bay Trail",
is named the "Intel Atom Processor Z3000 Series",
as well as various "Intel Pentium Processor"
and "Intel Celeron Processor" brands, depending
on form-factor.
Silvermont has a set of MSRs not far off from NHM,
but the RAPL register set is a sub-set of those previously supported.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This specific processor supports 3 new package sleep states.
Provide a monitor, so that the user can see their usage.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Haswell model numbers to snb_register() as it also supports the
C-states introduced in SandyBridge processors.
[rjw: Changelog]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Example:
cpupower idle-set -d 3
will disable C-state 3 on all processors (set commands are active on
all CPUs by default), same as:
cpupower -c all idle-set -d 3
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Latest kernel allows to disable C-states via:
/sys/devices/system/cpu/cpuX/cpuidle/stateY/disable
This patch provides lower level sysfs access functions to make use of
this interface. A later patch will implement the higher level stuff.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use unsigned int as the data type for some variables related to CPU
idle states which allows the code to be simplified slightly.
[rjw: Changelog]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On platforms with C8-C10 support, the additional C-states cause
turbostat to overrun its output buffer of 128 bytes per CPU. Increase
this to 256 bytes per CPU.
[ As a bugfix, this should go into 3.10; however, since the C8-C10
support didn't go in until after 3.9, this need not go into any stable
kernel. ]
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull idle update from Len Brown:
"Add support for new Haswell-ULT CPU idle power states"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
intel_idle: initial C8, C9, C10 support
tools/power turbostat: display C8, C9, C10 residency
Display residency in the new C-states, C8, C9, C10.
C8, C9, C10 are present on some:
"Fourth Generation Intel(R) Core(TM) Processors",
which are based on Intel(R) microarchitecture code name Haswell.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Correct spelling typos in various part of printk.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The SMI counter is popular -- so display it by default
rather than requiring an option. What the heck,
we've blown the 80 column budget on many systems already...
Note that the value displayed is the delta
during the measurement interval.
The absolute value of the counter can still be seen with
the generic 32-bit MSR option, ie. -m 0x34
Signed-off-by: Len Brown <len.brown@intel.com>
When verbose is enabled, print the C1E-Enable
bit in MSR_IA32_POWER_CTL.
also delete some redundant tests on the verbose variable.
Signed-off-by: Len Brown <len.brown@intel.com>
This patch enables turbostat to run properly on the
next-generation Intel(R) Microarchitecture, code named "Haswell" (HSW).
HSW supports the BCLK and counters found in SNB.
Signed-off-by: Len Brown <len.brown@intel.com>
Change the default location to install acpidump into from /usr/bin
to /usr/sbin, as this tool needs to be run as root.
[rjw: Subject and changelog]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull powertool update from Len Brown:
"This updates the tree w/ the latest version of turbostat, which
reports temperature and - on SNB and later - Watts."
Fix up semantic merge conflict as per Len.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools: Allow tools to be installed in a user specified location
tools/power: turbostat: make Makefile a bit more capable
tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()
tools/power turbostat: v3.0: monitor Watts and Temperature
tools/power turbostat: fix output buffering issue
tools/power turbostat: prevent infinite loop on migration error path
x86 power: define RAPL MSRs
tools/power/x86/turbostat: share kernel MSR #defines
When building x86_energy_perf_policy or turbostat within the confines of
a packaging system such as RPM, we need to be able to have it install to
the buildroot and not the root filesystem of the build machine. This
adds a DESTDIR variable that when set will act as a prefix for the
install location of these tools.
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The turbostat Makefile is pretty simple, its output is placed in the
same directory as the source, the install rule has no concept of a
prefix or sysroot, and you can set CC to use a specific compiler but
not use the more familiar CROSS_COMPILE. By making a few minor changes
these limitations are removed while leaving the default behavior
matching what it used to be.
Example build with these changes:
make CROSS_COMPILE=i686-wrs-linux-gnu- DESTDIR=/tmp install
or from the tools directory
make CROSS_COMPILE=i686-wrs-linux-gnu- DESTDIR=/tmp turbostat_install
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Instead of returning out of for_every_cpu() we should break out of the loop=
which will then tidy up correctly by closing the file /proc/stat.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Show power in Watts and temperature in Celsius
when hardware support is present.
Intel's Sandy Bridge and Ivy Bridge processor generations support RAPL
(Run-Time-Average-Power-Limiting). Per the Intel SDM
(Intel® 64 and IA-32 Architectures Software Developer Manual)
RAPL provides hardware energy counters and power control MSRs
(Model Specific Registers). RAPL MSRs are designed primarily
as a method to implement power capping. However, they are useful
for monitoring system power whether or not power capping is used.
In addition, Turbostat now shows temperature from DTS
(Digital Thermal Sensor) and PTM (Package Thermal Monitor) hardware,
if present.
As before, turbostat reads MSRs, and never writes MSRs.
New columns are present in turbostat output:
The Pkg_W column shows Watts for each package (socket) in the system.
On multi-socket systems, the system summary on the 1st row shows the sum
for all sockets together.
The Cor_W column shows Watts due to processors cores.
Note that Core_W is included in Pkg_W.
The optional GFX_W column shows Watts due to the graphics "un-core".
Note that GFX_W is included in Pkg_W.
The optional RAM_W column on server processors shows Watts due to DRAM DIMMS.
As DRAM DIMMs are outside the processor package, RAM_W is not included in Pkg_W.
The optional PKG_% and RAM_% columns on server processors shows the % of time
in the measurement interval that RAPL power limiting is in effect on the
package and on DRAM.
Note that the RAPL energy counters have some limitations.
First, hardware updates the counters about once every milli-second.
This is fine for typical turbostat measurement intervals > 1 sec.
However, when turbostat is used to measure events that approach
1ms, the counters are less useful.
Second, the 32-bit energy counters are subject to wrapping.
For example, a counter incrementing 15 micro-Joule units
on a 130 Watt TDP server processor could (in theory)
roll over in about 9 minutes. Turbostat detects and handles
up to 1 counter overflow per measurement interval.
But when the measurement interval exceeds the guaranteed
counter range, we can't detect if more than 1 overflow occured.
So in this case turbostat indicates that the results are
in question by replacing the fractional part of the Watts
in the output with "**":
Pkg_W Cor_W GFX_W
3** 0** 0**
Third, the RAPL counters are energy (Joule) counters -- they sum up
weighted events in the package to estimate energy consumed. They are
not analong power (Watt) meters. In practice, they tend to under-count
because they don't cover every possible use of energy in the package.
The accuracy of the RAPL counters will vary between product generations,
and between SKU's in the same product generation, and with temperature.
turbostat's -v (verbose) option now displays more power and thermal configuration
information -- as shown on the turbostat.8 manual page.
For example, it now displays the Package and DRAM Thermal Design Power (TDP):
cpu0: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu0: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
cpu8: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu8: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
Signed-off-by: Len Brown <len.brown@intel.com>
In periodic mode, turbostat writes to stdout,
but users were un-able to re-direct stdout, eg.
turbostat > outputfile
would result in an empty outputfile.
Signed-off-by: Len Brown <len.brown@intel.com>
If an MSR based monitor is run in parallel this is not needed. This is the
default case on all/most Intel machines.
But when only sysfs info is read via cpupower monitor -m Idle_Stats (typically
the case for non root users) or when other monitors are PCI based (AMD),
Idle_Stats, read from sysfs can be totally bogus:
cpupower monitor -m Idle_Stats
PKG |CORE|CPU | POLL | C1-N | C3-N | C6-N
0| 0| 0| 0.00| 0.00| 0.24| 99.81
0| 0| 32| 0.00| 0.00| 0.00| 100.7
...
0| 17| 20| 0.00| 0.00| 0.00| 173.1
0| 17| 52| 0.00| 0.00| 0.07| 173.0
0| 18| 68| 0.00| 0.00| 0.00| 0.00
0| 18| 76| 0.00| 0.00| 0.00| 0.00
...
With the -c option all cores are woken up and the kernel
did update cpuidle statistics before reading out sysfs.
This causes some overhead. Therefore avoid if possible, use
if needed:
cpupower monitor -c -m Idle_Stats
PKG |CORE|CPU | POLL | C1-N | C3-N | C6-N
0| 0| 0| 0.00| 0.00| 0.00| 100.2
0| 0| 32| 0.00| 0.00| 0.00| 100.2
...
0| 8| 8| 0.00| 0.00| 0.00| 99.82
0| 8| 40| 0.00| 0.00| 0.00| 99.81
0| 9| 24| 0.00| 0.00| 0.00| 100.3
0| 9| 56| 0.00| 0.00| 0.00| 100.2
0| 16| 4| 0.00| 0.00| 0.00| 99.75
0| 16| 36| 0.00| 0.00| 0.00| 99.38
...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The pkgs member of cpupower_topology is being used as the number of
cpu packages. As the comment in get_cpu_topology notes, the package ids
are not guaranteed to be contiguous. So, simply setting pkgs to the value
of the highest physical_package_id doesn't actually provide a count of
the number of cpu packages. Instead, calculate pkgs by setting it to
the number of distinct physical_packge_id values which is pretty easy
to do after the core_info structs are sorted. Calculating pkgs this
way also has the nice benefit of getting rid of a sign comparison warning
that GCC 4.6 was reporting.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpu_info member of cpupower_topology was being declared as an unnamed
structure. This member was then being malloced using the size of the
parent cpupower_topology * the number of cpus. This works
because cpu_info is smaller than cpupower_topology. However, there is
no guarantee that will always be the case. Making cpu_info its own
top level structure (named cpuid_core_info) allows for mallocing the actual
size of this structure. This also lets us get rid of a redefinition of
the structure in topology.c with slightly different field names.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix a variety of issues with sysfs_topology_read_file:
* The return value of sysfs_topology_read_file function was not properly
being checked for failure.
* The function was reading int valued sysfs variables and then returning
their value. So, even if a function was trying to check the return value
of this function, a caller would not be able to tell an failure code apart
from reading a negative value. This also conflicted with the comment on the
function which said that a return value of 0 indicated success.
* The function was parsing int valued sysfs values with strtoul instead of
strtol.
* The function was non-static even though it was only used in the
file it was declared in.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix minor warnings reported with GCC 4.6:
* The sysfs_write_file function is unused - remove it.
* The pr_mon_len in the print_header function is unsed - remove it.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The files generated by the Makefiles in the debug directories aren't listed
in the .gitignore file in the root of the cpupower tool which causes these
files to show up in the output of 'git status'.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The clean targets from the cpupower tools' Makefiles use brace expansion to
remove some generated files. However, the default shells on many systems do
not support this feature resulting in some generated files not being removed
by clean.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Turbostat assumed if it can't migrate to a CPU, then the CPU
must have gone off-line and turbostat should re-initialize
with the new topology.
But if turbostat can not migrate because it is restricted by
a cpuset, then it will fail to migrate even after re-initialization,
resulting in an infinite loop.
Spit out a warning when we can't migrate
and endure only 2 re-initialize cycles in a row
before giving up and exiting.
Signed-off-by: Len Brown <len.brown@intel.com>
Now that turbostat is built in the kernel tree,
it can share MSR #defines with the kernel.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Remove duplicated include.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull kbuild fixes from Michal Marek:
"Here are two fixes I intended to send after v3.6-rc7, but failed to do
so. So please pull them for v3.7-rc1 and they will be picked up by
stable.
The first one fixes gcc -x <language> syntax in various build-time
tests, which icecream and possible other gcc wrappers did not
understand (and yes, icecream is going to be fixed as well).
The second one fixes make tar-pkg so that unpacking the tarball does
not replace the /lib -> /usr/lib symlink on recent Fedora releases."
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Fix gcc -x syntax
kbuild: Do not package /boot and /lib in make tar-pkg
Counting SMIs is popular, so add a dedicated "-s" option to do it,
and juggle some of the other option letters.
-S is now system summary (was -s)
-c is 32 bit counter (was -d)
-C is 64-bit counter (was -D)
-p is 1st thread in core (was -c)
-P is 1st thread in package (was -p)
bump the minor version number
Signed-off-by: Len Brown <len.brown@intel.com>
The correct syntax for gcc -x is "gcc -x assembler", not
"gcc -xassembler". Even though the latter happens to work, the former
is what is documented in the manual page and thus what gcc wrappers
such as icecream do expect.
This isn't a cosmetic change. The missing space prevents icecream from
recognizing compilation tasks it can't handle, leading to silent kernel
miscompilations.
Besides me, credits go to Michael Matz and Dirk Mueller for
investigating the miscompilation issue and tracking it down to this
incorrect -x parameter syntax.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
# turbostat -d 0x34
is useful for printing the number of SMI's within an interval
on Nehalem and newer processors.
where
# turbostat -m 0x34
will simply print out the total SMI count since reset.
Suggested-by: Andi Kleen
Signed-off-by: Len Brown <len.brown@intel.com>
The -M option dumps the specified 64-bit MSR with every sample.
Previously it was output at the end of each line.
However, with the v2 style of printing, the lines are now staggered,
making MSR output hard to read.
So move the MSR output column to the left where things are aligned.
Signed-off-by: Len Brown <len.brown@intel.com>
The "turbo-limit" is the maximum opportunistic processor
speed, assuming no electrical or thermal constraints.
For a given processor, the turbo-limit varies, depending
on the number of active cores. Generally, there is more
opportunity when fewer cores are active.
Under the "-v" verbose option, turbostat would
print the turbo-limits for the four cases
of 1 to 4 cores active.
Expand that capability to cover the cases of turbo
opportunities with up to 16 cores active.
Note that not all hardware platforms supply this information,
and that sometimes a valid limit may be specified for
a core which is not actually present.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20101221, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
This version finds dynamic tables exported by Linux in
/sys/firmware/acpi/tables/dynamic
Signed-off-by: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20101221, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
This version finds dynamic tables exported by Linux in
/sys/firmware/acpi/tables/dynamic
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20071116, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20070714, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20060606, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20051111, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
Under some conditions, c1% was displayed as very large number,
much higher than 100%.
c1% is not measured, it is derived as "that, which is left over"
from other counters. However, the other counters are not collected
atomically, and so it is possible for c1% to be calaculagted as
a small negative number -- displayed as very large positive.
There was a check for mperf vs tsc for this already,
but it needed to also include the other counters
that are used to calculate c1.
Signed-off-by: Len Brown <len.brown@intel.com>
Measuring large profoundly-idle configurations
requires turbostat to be more lightweight.
Otherwise, the operation of turbostat itself
can interfere with the measurements.
This re-write makes turbostat topology aware.
Hardware is accessed in "topology order".
Redundant hardware accesses are deleted.
Redundant output is deleted.
Also, output is buffered and
local RDTSC use replaces remote MSR access for TSC.
From a feature point of view, the output
looks different since redundant figures are absent.
Also, there are now -c and -p options -- to restrict
output to the 1st thread in each core, and the 1st
thread in each package, respectively. This is helpful
to reduce output on big systems, where more detail
than the "-s" system summary is desired.
Finally, periodic mode output is now on stdout, not stderr.
Turbostat v2 is also slightly more robust in
handling run-time CPU online/offline events,
as it now checks the actual map of on-line cpus rather
than just the total number of on-line cpus.
Signed-off-by: Len Brown <len.brown@intel.com>
Initial IVB support went into turbostat in Linux-3.1:
553575f1ae
(tools turbostat: recognize and run properly on IVB)
However, when running on IVB, turbostat would fail
to report the new couters added with SNB, c7, pc2 and pc7.
So in scenarios where these counters are non-zero on IVB,
turbostat would report erroneous residencey results.
In particular c7 time would be added to c1 time,
since c1 time is calculated as "that which is left over".
Also, turbostat reports MHz capabilities when passed
the "-v" option, and it would incorrectly report 133MHz
bclk instead of 100MHz bclk for IVB, which would inflate
GHz reported with that option.
This patch is a backport of a fix already included in turbostat v2.
Signed-off-by: Len Brown <len.brown@intel.com>
Linux 3.4 included a modification to turbostat to
lower cross-call overhead by using scheduler affinity:
15aaa34654
(tools turbostat: reduce measurement overhead due to IPIs)
In the use-case where turbostat forks a child program,
that change had the un-intended side-effect of binding
the child to the last cpu in the system.
This change removed the binding before forking the child.
This is a back-port of a fix already included in turbostat v2.
Signed-off-by: Len Brown <len.brown@intel.com>
It's been broken forever (i.e. it's not scheduling in a power
aware fashion), as reported by Suresh and others sending
patches, and nobody cares enough to fix it properly ...
so remove it to make space free for something better.
There's various problems with the code as it stands today, first
and foremost the user interface which is bound to topology
levels and has multiple values per level. This results in a
state explosion which the administrator or distro needs to
master and almost nobody does.
Furthermore large configuration state spaces aren't good, it
means the thing doesn't just work right because it's either
under so many impossibe to meet constraints, or even if
there's an achievable state workloads have to be aware of
it precisely and can never meet it for dynamic workloads.
So pushing this kind of decision to user-space was a bad idea
even with a single knob - it's exponentially worse with knobs
on every node of the topology.
There is a proposal to replace the user interface with a single
3 state knob:
sched_balance_policy := { performance, power, auto }
where 'auto' would be the preferred default which looks at things
like Battery/AC mode and possible cpufreq state or whatever the hw
exposes to show us power use expectations - but there's been no
progress on it in the past many months.
Aside from that, the actual implementation of the various knobs
is known to be broken. There have been sporadic attempts at
fixing things but these always stop short of reaching a mergable
state.
Therefore this wholesale removal with the hopes of spurring
people who care to come forward once again and work on a
coherent replacement.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ACPI & Power Management changes from Len Brown:
- ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
- cpuidle evolving, more ARM use
- thermal sub-system evolving, ditto
- assorted other PM bits
Fix up conflicts in various cpuidle implementations due to ARM cpuidle
cleanups (ARM at91 self-refresh and cpu idle code rewritten into
"standby" in asm conflicting with the consolidation of cpuidle time
keeping), trivial SH include file context conflict and RCU tracing fixes
in generic code.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
ACPI throttling: fix endian bug in acpi_read_throttling_status()
Disable MCP limit exceeded messages from Intel IPS driver
ACPI video: Don't start video device until its associated input device has been allocated
ACPI video: Harden video bus adding.
ACPI: Add support for exposing BGRT data
ACPI: export acpi_kobj
ACPI: Fix logic for removing mappings in 'acpi_unmap'
CPER failed to handle generic error records with multiple sections
ACPI: Clean redundant codes in scan.c
ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
ACPI: consistently use should_use_kmap()
PNPACPI: Fix device ref leaking in acpi_pnp_match
ACPI: Fix use-after-free in acpi_map_lsapic
ACPI: processor_driver: add missing kfree
ACPI, APEI: Fix incorrect APEI register bit width check and usage
Update documentation for parameter *notrigger* in einj.txt
ACPI, APEI, EINJ, new parameter to control trigger action
ACPI, APEI, EINJ, limit the range of einj_param
ACPI, APEI, Fix ERST header length check
cpuidle: power_usage should be declared signed integer
...
Sometimes users have turbostat running in interval mode
when they take processors offline/online.
Previously, turbostat would survive, but not gracefully.
Tighten up the error checking so turbostat notices
changesn sooner, and print just 1 line on change:
turbostat: re-initialized with num_cpus %d
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU. This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.
This overhead is measurable on large idle systems,
and as Yoquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.
Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.
On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:
# ./turbostat.old -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.29 2.29 0.95 0.02 0.00 98.77 20.23 0.00 77.41 0.00
0.25 1.24 2.29 0.98 0.02 0.00 98.75 20.34 0.03 77.74 0.00
0.27 1.22 2.29 0.54 0.00 0.00 99.18 20.64 0.00 77.70 0.00
0.26 1.22 2.29 1.22 0.00 0.00 98.52 20.22 0.00 77.74 0.00
0.26 1.38 2.29 0.78 0.02 0.00 98.95 20.51 0.05 77.56 0.00
^C
i# ./turbostat.new -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.20 2.29 0.24 0.01 0.00 99.49 20.58 0.00 78.20 0.00
0.27 1.22 2.29 0.25 0.00 0.00 99.48 20.79 0.00 77.85 0.00
0.27 1.20 2.29 0.25 0.02 0.00 99.46 20.71 0.03 77.89 0.00
0.28 1.26 2.29 0.25 0.01 0.00 99.46 20.89 0.02 77.67 0.00
0.27 1.20 2.29 0.24 0.01 0.00 99.48 20.65 0.00 78.04 0.00
cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat -s
cuts down on the amount of output, per user request.
also treak some output whitespace and the man page.
Signed-off-by: Len Brown <len.brown@intel.com>
This patch allows cpupower tool to generate its output files in a
seperate directory. This is now possible by passing the 'O=<path>' to
the command line.
This can be usefull for a normal user if the kernel source code is
located in a read only location.
This is patch stole some bits of the perf makefile.
[linux@dominikbrodowski.net: fix commit message]
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Use the '-p' and '-o' switches to specify the pathname of the output
file to xgettext(1). This avoids to move manually the output file if
xgettext(1) succeeds.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
UTIL_BINS and IDLE_OBJS variables are not defined at all, so
there's no need to remove their content from the 'clean' target.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Looks like some not needed debug code slipped in.
Also this code:
tmp = sysfs_get_idlestate_name(cpu, idlestates - 1);
performs a strdup and the mem was not freed again.
-> delete it.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The number of idle states was wrong.
The POLL idle state (on X86) was missed out:
Number of idle states: 4
Available idle states: C1-NHM C3-NHM C6-NHM
While the POLL is not a real idle state, its
statistics should still be shown. It's now also
explained in a detailed manpage.
This should fix a bug of missing the first idle
state on other archs.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
cpupower-frequency-* manpages slightly differed from the others.
- Use uppercase letters in the title
- Show cpupower Manual in the header
- Remove Mattia from left down corner of the manpage, he is already
listed as author
- Remove --help, prints this message -> not needed
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The last missing manpage for cpupower tools.
More info about other architecture's sleep state specialities would be great.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The name of the monitor is updated at runtime to the name of the
CPU type.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
AMD's BKDG (Bios and Kernel Developers Guide) talks in the CPU spec of their
CPU families about PCI registers defined by "device" (slot) and func(tion).
Assuming that CPU specific configuration PCI devices are always on domain
and bus zero a pci_slot_func_init() func which gets the slot and func of
the desired PCI device passed looks like the most convenient way.
This also obsoletes the PCI device id maintenance.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This includes initial support for the recently published ACPI 5.0 spec.
In particular, support for the "hardware-reduced" bit that eliminates
the dependency on legacy hardware.
APEI has patches resulting from testing on real hardware.
Plus other random fixes.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits)
acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
intel_idle: Split up and provide per CPU initialization func
ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2
ACPI processor: Remove unneeded cpuidle_unregister_driver call
intel idle: Make idle driver more robust
intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
intel_idle: remove redundant local_irq_disable() call
ACPI processor: Fix error path, also remove sysdev link
ACPI: processor: fix acpi_get_cpuid for UP processor
intel_idle: fix API misuse
ACPI APEI: Convert atomicio routines
ACPI: Export interfaces for ioremapping/iounmapping ACPI registers
ACPI: Fix possible alignment issues with GAS 'address' references
ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
ACPI: Store SRAT table revision
ACPI, APEI, Resolve false conflict between ACPI NVS and APEI
ACPI, Record ACPI NVS regions
ACPI, APEI, EINJ, Refine the fix of resource conflict
...
Field names were shortened: "pkg" is now "pk", "core" is now "cr"
Signed-off-by: Arun Thomas <arun.thomas@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Instead of printing something non-formatted to stdout, call
man(1) to show the man page for the proper subcommand.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Loosely based on a patch for cpufrequtils, submittted by
Sergey Dryabzhinsky <sergey.dryabzhinsky@gmail.com> and
signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This allows for example:
cpupower -c 2-4,6 monitor -m Mperf
|Mperf
PKG |CORE|CPU | C0 | Cx | Freq
0| 8| 4| 2.42| 97.58| 1353
0| 16| 2| 14.38| 85.62| 1928
0| 24| 6| 1.76| 98.24| 1442
1| 16| 3| 15.53| 84.47| 1650
CPUs always get resorted for package, core then cpu id if it could get read out
(or however you name these topology levels...).
Still this is a nice way to keep the overview if a test binary is bound to
a specific CPU or if one wants to show all CPUs inside a package or similar.
Still missing: Do not measure not available cores to reduce the overhead
and achieve better results.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Before, checking for offlined CPUs was done dirty and
it was checked whether topology parsing returned -1 values.
But this is a valid case on a Xen (and possibly other) kernels.
Do proper online/offline checking, also take CONFIG_HOTPLUG_CPU
option into account (no /sys/devices/../cpuX/online file).
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Which makes the implementation independent from cpufreq drivers.
Therefore this would also work on a Xen kernel where the hypervisor
is doing frequency switching and idle entering.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
* 'tools-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
tools/power turbostat: fit output into 80 columns on snb-ep
tools/power x86_energy_perf_policy: fix print of uninitialized string
Reduce columns for package number to 1.
If you can afford more than 9 packages,
you can also afford a terminal with more than 80 columns:-)
Also shave a column also off the package C-states
Signed-off-by: Len Brown <len.brown@intel.com>
IA32-Intel Devel guide Volume 3A - 14.3.2.1
-------------------------------------------
...
Opportunistic processor performance operation can be disabled by setting bit 38 of
IA32_MISC_ENABLES. This mechanism is intended for BIOS only. If
IA32_MISC_ENABLES[38] is set, CPUID.06H:EAX[1] will return 0.
Better detect things via cpuid, this cleans up the code a bit
and the MSR parts were not working correctly anyway.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: lenb@kernel.org
CC: linux@dominikbrodowski.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This adds the last piece missing from turbostat (if called with -v).
It shows on Intel machines supporting Turbo Boost how many cores
have to be active/idle to enter which boost mode (frequency).
Whether the HW really enters these boost modes can be verified via
./cpupower monitor.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: lenb@kernel.org
CC: linux@dominikbrodowski.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
larger sysfs data (>255 bytes) was truncated and thus used improperly
[linux@dominikbrodowski.net: adapted to cpupowerutils]
Signed-off-by: Roman Vasiyarov <rvasiyarov@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
As cpupowerutils is intended to be included into the kernel sources,
use the kernel versioning instead of a custom version.
The script utils/version-gen.sh is largely based on the script already
found in tools/perf/util/PERF-VERSION-GEN .
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Use the quiet/verbose mechanism found in kernel tools, without
relying on the special tool "ccdv"
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
CPU power consumption vs performance tuning is no longer
limited to CPU frequency switching anymore: deep sleep states,
traditional dynamic frequency scaling and hidden turbo/boost
frequencies are tied close together and depend on each other.
The first two exist on different architectures like PPC, Itanium and
ARM, the latter (so far) only on X86. On X86 the APU (CPU+GPU) will
only run most efficiently if CPU and GPU has proper power management
in place.
Users and Developers want to have *one* tool to get an overview what
their system supports and to monitor and debug CPU power management
in detail. The tool should compile and work on as many architectures
as possible.
Once this tool stabilizes a bit, it is intended to replace the
Intel-specific tools in tools/power/x86
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Follow kernel coding style traditions more closely.
Delete typedef, re-name "per cpu counters" to
simply be counters etc.
This patch changes no functionality.
Suggested-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
bug could cause false positive on indicating
presence of invarient TSC or APERF support.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.
x86_energy_perf_policy is a user-space utility to set the
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.
See x86_energy_perf_policy.8 man page for more information.
Background:
Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
without actually modifying the MSR.
In March, 2010, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use. It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".
In June, 2010, I proposed a generic user/kernel API to
generalize the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.
So in September, 2010, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246
Here is that same utility, after responding to some review feedback,
to live in tools/power/, where it is easily found.
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat is a Linux tool to observe proper operation
of Intel(R) Turbo Boost Technology.
turbostat displays the actual processor frequency
on x86 processors that include APERF and MPERF MSRs.
Note that turbostat is of limited utility on Linux
kernels 2.6.29 and older, as acpi_cpufreq cleared
APERF/MPERF up through that release.
On Intel Core i3/i5/i7 (Nehalem) and newer processors,
turbostat also displays residency in idle power saving states,
which are necessary for diagnosing any cpuidle issues
that may have an effect on turbo-mode.
See the turbostat.8 man page for example usage.
Signed-off-by: Len Brown <len.brown@intel.com>