There is no point in having an extra type for extra confusion. u64 is
unambiguous.
Conversion was done with the following coccinelle script:
@rem@
@@
-typedef u64 cycle_t;
@fix@
typedef cycle_t;
@@
-cycle_t
+u64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Roland reported that his DELL T5810 sports a value add BIOS which
completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with
random negative TSC_ADJUST values, different on all CPUs. That renders the
TSC useless because the sycnchronization check fails.
Roland tested the new TSC_ADJUST mechanism. While it manages to readjust
the TSCs he needs to disable the TSC deadline timer, otherwise the machine
just stops booting.
Deeper investigation unearthed that the TSC deadline timer is sensitive to
the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an
interrupt storm caused by the TSC deadline timer.
This does not make any sense and it's hard to imagine what kind of hardware
wreckage is behind that misfeature, but it's reliably reproducible on other
systems which have TSC_ADJUST and TSC deadline timer.
While it would be understandable that a big enough negative value which
moves the resulting TSC readout into the negative space could have the
described effect, this happens even with a adjust value of -1, which keeps
the TSC readout definitely in the positive space. The compare register for
the TSC deadline timer is set to a positive value larger than the TSC, but
despite not having reached the deadline the interrupt is raised
immediately. If this happens on the boot CPU, then the machine dies
silently because this setup happens before the NMI watchdog is armed.
Further experiments showed that any other adjustment of TSC_ADJUST works as
expected as long as it stays in the positive range. The direction of the
adjustment has no influence either. See the lkml link for further analysis.
Yet another proof for the theory that timers are designed by janitors and
the underlying (obviously undocumented) mechanisms which allow BIOSes to
wreckage them are considered a feature. Well done Intel - NOT!
To address this wreckage add the following sanity measures:
- If the TSC_ADJUST value on the boot cpu is not 0, set it to 0
- If the TSC_ADJUST value on any cpu is negative, set it to 0
- Prevent the cross package synchronization mechanism from setting negative
TSC_ADJUST values.
Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
Cc: Kevin Stanton <kevin.b.stanton@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Allen Hung <allen_hung@dell.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some 'feature' BIOSes fiddle with the TSC_ADJUST register during
suspend/resume which renders the TSC unusable.
Add sanity checks into the resume path and restore the
original value if it was adjusted.
Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
Cc: Kevin Stanton <kevin.b.stanton@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Allen Hung <allen_hung@dell.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in a two aspects:
1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the
cycles spent by storing the TSC value at SMM entry and restoring it at
SMM exit. On affected machines the TSCs run slowly out of sync up to the
point where the clocksource watchdog (if available) detects it.
The TSC_ADJUST MSR allows to detect the TSC modification before that and
eventually restore it. This is also important for SoCs which have no
watchdog clocksource and therefore TSC wreckage cannot be detected and
acted upon.
2) All threads in a package are required to have the same TSC_ADJUST
value. Broken BIOSes break that and as a result the TSC synchronization
check fails.
The TSC_ADJUST MSR allows to detect the deviation when a CPU comes
online. If detected set it to the value of an already online CPU in the
same package. This also allows to reduce the number of sync tests
because with that in place the test is only required for the first CPU
in a package.
In principle all CPUs in a system should have the same TSC_ADJUST value
even across packages, but with physical CPU hotplug this assumption is
not true because the TSC starts with power on, so physical hotplug has
to do some trickery to bring the TSC into sync with already running
packages, which requires to use an TSC_ADJUST value different from CPUs
which got powered earlier.
A final enhancement is the opportunity to compensate for unsynced TSCs
accross nodes at boot time and make the TSC usable that way. It won't
help for TSCs which run apart due to frequency skew between packages,
but this gets detected by the clocksource watchdog later.
The first step toward this is to store the TSC_ADJUST value of a starting
CPU and compare it with the value of an already online CPU in the same
package. If they differ, emit a warning and adjust it to the reference
value. The !SMP version just stores the boot value for later verification.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The art detection uses rdmsrl_safe() to detect the availablity of the
TSC_ADJUST MSR.
That's pointless because we have a feature bit for this. Use it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
All places which used the TSC_RELIABLE to skip the delayed calibration
have been converted to use the TSC_KNOWN_FREQ flag.
Make the immeditate clocksource registration, which skips the long term
calibration, solely depend on TSC_KNOWN_FREQ.
The TSC_RELIABLE now merily removes the requirement for a watchdog
clocksource.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bin Gao <bin.gao@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
On Intel GOLDMONT Atom SoC TSC is the only available clocksource, so there
is no way to do software calibration or have a watchdog clocksource for it.
Software calibration is already disabled via the TSC_KNOWN_FREQ flag, but
the watchdog requirement still persists, so such systems cannot switch to
high resolution/nohz mode.
Mark it reliable, so it becomes usable. Hardware teams confirmed that this
is safe on that SoC.
Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1479241644-234277-4-git-send-email-bin.gao@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CPUs/SoCs with CPUID leaf 0x15 come with a known frequency and will report
the frequency to software via CPUID instruction. This hardware provided
frequency is the "real" frequency of TSC.
Set the X86_FEATURE_TSC_KNOWN_FREQ flag for such systems to skip the
software calibration process.
A 24 hours test on one of the CPUID 0x15 capable platforms was
conducted. PIT calibrated frequency resulted in more than 3 seconds drift
whereas the CPUID determined frequency showed less than 0.5 second
drift.
Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1479241644-234277-3-git-send-email-bin.gao@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The X86_FEATURE_TSC_RELIABLE flag in Linux kernel implies both reliable
(at runtime) and trustable (at calibration). But reliable running and
trustable calibration independent of each other.
Add a new flag X86_FEATURE_TSC_KNOWN_FREQ, which denotes that the frequency
is known (via MSR/CPUID). This flag is only meant to skip the long term
calibration on systems which have a known frequency.
Add X86_FEATURE_TSC_KNOWN_FREQ to the skip the delayed calibration and
leave X86_FEATURE_TSC_RELIABLE in place.
After converting the existing users of X86_FEATURE_TSC_RELIABLE to use
either both flags or just X86_FEATURE_TSC_KNOWN_FREQ we can seperate the
functionality.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1479241644-234277-2-git-send-email-bin.gao@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via
CPUID") added code to retrieve the crystal and TSC frequency from CPUID
leaves. If the crystal freqency is enumerated as 0,the resulting TSC
frequency is 0 as well. For CPUs with a known fixed crystal frequency a
quirk list is available to set the frequency,
Kabylake and SkylakeX CPUs are missing in the list of CPUs which need this
quirk. Add them so the TSC frequency can be calculated correctly.
[ tglx: Removed the silly default case as the switch() is only invoked when
cpu_khz is 0. Massaged changelog. ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/1474289501-31717-3-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
asm/intel-family.h contains defines for cpu ids which should be used
instead of hex constants. Convert the switch case in native_calibrate_tsc()
to use the defines before adding more cpu models.
[ tglx: Massaged changelog ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/1474289501-31717-2-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch eliminates a source of imprecise APIC timer interrupts,
which imprecision may result in double interrupts or even late
interrupts.
The TSC deadline clockevent devices' configuration and registration
happens before the TSC frequency calibration is refined in
tsc_refine_calibration_work().
This results in the TSC clocksource and the TSC deadline clockevent
devices being configured with slightly different frequencies: the former
gets the refined one and the latter are configured with the inaccurate
frequency detected earlier by means of the "Fast TSC calibration using PIT".
Within the APIC code, introduce the notifier function
lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
clockevent devices with the current tsc_khz.
Call it from the TSC code after TSC calibration refinement has happened.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
[ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 header cleanups from Ingo Molnar:
"This tree is a cleanup of the x86 tree reducing spurious uses of
module.h - which should improve build performance a bit"
* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
x86/apic: Remove duplicated include from probe_64.c
x86/ce4100: Remove duplicated include from ce4100.c
x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
x86/platform: Delete extraneous MODULE_* tags fromm ts5500
x86: Audit and remove any remaining unnecessary uses of module.h
x86/kvm: Audit and remove any unnecessary uses of module.h
x86/xen: Audit and remove any unnecessary uses of module.h
x86/platform: Audit and remove any unnecessary uses of module.h
x86/lib: Audit and remove any unnecessary uses of module.h
x86/kernel: Audit and remove any unnecessary uses of module.h
x86/mm: Audit and remove any unnecessary uses of module.h
x86: Don't use module.h just for AUTHOR / LICENSE tags
check_tsc_disabled() was introduced by commit:
c73deb6aec ("perf/x86: Add ability to calculate TSC from perf sample timestamps")
The only caller was arch_perf_update_userpage(), which had been refactored
by commit:
d8b11a0cbd ("perf/x86: Clean up cap_user_time* setting")
... so no need keep and export it any more.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: adrian.hunter@intel.com
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1468570330-25810-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed. Build testing
revealed some implicit header usage that was fixed up accordingly.
Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hard code the BXT crystal clock (aka ART - Always Running Timer)
to 19.200 MHz, and use CPUID leaf 0x15 to determine the BXT TSC frequency.
Use tsc_khz to sanity check BXT cpu_khz,
which can be erroneous in some configurations.
(I simplified the original patch from Bin Gao.)
Original-From: Bin Gao <bin.gao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/bf4e7c175acd6d09719c47c319b10ff1f0627ff8.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Skylake CPU base-frequency and TSC frequency may differ
by up to 2%.
Enumerate CPU and TSC frequencies separately, allowing
cpu_khz and tsc_khz to differ.
The existing CPU frequency calibration mechanism is unchanged.
However, CPUID extensions are preferred, when available.
CPUID.0x16 is preferred over MSR and timer calibration
for CPU frequency discovery.
CPUID.0x15 takes precedence over CPU-frequency
for TSC frequency discovery.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b27ec289fd005833b27d694d9c2dbb716c5cdff7.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove the irqoff/irqon around MSR-based TSC enumeration,
as it is not necessary.
Also rename: try_msr_calibrate_tsc() to cpu_khz_from_msr(),
as that better describes what the routine does.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a6b5c3ecd3b068175d2309599ab28163fc34215e.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The topology_core_cpumask is used to find a neighbour cpu in
calibrate_delay_is_known(). It might not be allocated at the first invocation
of that function on the boot cpu, when CONFIG_CPUMASK_OFFSTACK is set.
The mask is allocated later in native_smp_prepare_cpus. As a consequence the
underlying find_next_bit() call dereferences a NULL pointer.
Add a proper check to prevent this.
Fixes: c25323c073 "x86/tsc: Use topology functions"
Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603180843270.3978@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull timer updates from Thomas Gleixner:
"The timer department delivers this time:
- Support for cross clock domain timestamps in the core code plus a
first user. That allows more precise timestamping for PTP and
later for audio and other peripherals.
The ptp/e1000e patches have been acked by the relevant maintainers
and are carried in the timer tree to avoid merge ordering issues.
- Support for unregistering the current clocksource watchdog. That
lifts a limitation for switching clocksources which has been there
from day 1
- The usual pile of fixes and updates to the core and the drivers.
Nothing outstanding and exciting"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
time/timekeeping: Work around false positive GCC warning
e1000e: Adds hardware supported cross timestamp on e1000e nic
ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
x86/tsc: Always Running Timer (ART) correlated clocksource
hrtimer: Revert CLOCK_MONOTONIC_RAW support
time: Add history to cross timestamp interface supporting slower devices
time: Add driver cross timestamp interface for higher precision time synchronization
time: Remove duplicated code in ktime_get_raw_and_real()
time: Add timekeeping snapshot code capturing system time and counter
time: Add cycles to nanoseconds translation
jiffies: Use CLOCKSOURCE_MASK instead of constant
clocksource: Introduce clocksource_freq2mult()
clockevents/drivers/exynos_mct: Implement ->set_state_oneshot_stopped()
clockevents/drivers/arm_global_timer: Implement ->set_state_oneshot_stopped()
clockevents/drivers/arm_arch_timer: Implement ->set_state_oneshot_stopped()
clocksource/drivers/arm_global_timer: Register delay timer
clocksource/drivers/lpc32xx: Support timer-based ARM delay
clocksource/drivers/lpc32xx: Support periodic mode
clocksource/drivers/lpc32xx: Don't use the prescaler counter for clockevents
clocksource/drivers/rockchip: Add err handle for rk_timer_init
...
On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneous to the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.
On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:
TSC_value = (ART_value * m) / n + k [n >= 1]
[k is an offset that can adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build
fixes]
Signed-off-by: John Stultz <john.stultz@linaro.org>
It's simpler to look at the topology mask than iterating over all online cpus
to find a cpu on the same package.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Commit:
b20112edea ("perf/x86: Improve accuracy of perf/sched clock")
allowed the time_shift value in perf_event_mmap_page to be as much
as 32. Unfortunately the documented algorithms for using time_shift
have it shifting an integer, whereas to work correctly with the value
32, the type must be u64.
In the case of perf tools, Intel PT decodes correctly but the timestamps
that are output (for example by perf script) have lost 32-bits of
granularity so they look like they are not changing at all.
Fix by limiting the shift to 31 and adjusting the multiplier accordingly.
Also update the documentation of perf_event_mmap_page so that new code
based on it will be more future-proof.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: b20112edea ("perf/x86: Improve accuracy of perf/sched clock")
Link: http://lkml.kernel.org/r/1445001845-13688-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
- misc fixes all around the map
- block non-root vm86(old) if mmap_min_addr != 0
- two small debuggability improvements
- removal of obsolete paravirt op
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform: Fix Geode LX timekeeping in the generic x86 build
x86/apic: Serialize LVTT and TSC_DEADLINE writes
x86/ioapic: Force affinity setting in setup_ioapic_dest()
x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method
x86/ldt: Fix small LDT allocation for Xen
x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text
x86/cpu: Print family/model/stepping in hex
x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0
x86/alternatives: Make optimize_nops() interrupt safe and synced
x86/mm/srat: Print non-volatile flag in SRAT
x86/cpufeatures: Enable cpuid for Intel SHA extensions
In 2007, commit 07190a08ee ("Mark TSC on GeodeLX reliable")
bypassed verification of the TSC on Geode LX. However, this code
(now in the check_system_tsc_reliable() function in
arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
set.
OpenWRT has recently started building its generic Geode target
for Geode GX, not LX, to include support for additional
platforms. This broke the timekeeping on LX-based devices,
because the TSC wasn't marked as reliable:
https://dev.openwrt.org/ticket/20531
By adding a runtime check on is_geode_lx(), we can also include
the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
fixing the problem.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When TSC is stable perf/sched clock is based on it.
However the conversion from cycles to nanoseconds
is not as accurate as it could be. Because
CYC2NS_SCALE_FACTOR is 10, the accuracy is +/- 1/2048
The change is to calculate the maximum shift that
results in a multiplier that is still a 32-bit number.
For example all frequencies over 1 GHz will have
a shift of 32, making the accuracy of the conversion
+/- 1/(2^33). That is achieved by using the
'clocks_calc_mult_shift()' function.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1440147918-22250-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull locking and atomic updates from Ingo Molnar:
"Main changes in this cycle are:
- Extend atomic primitives with coherent logic op primitives
(atomic_{or,and,xor}()) and deprecate the old partial APIs
(atomic_{set,clear}_mask())
The old ops were incoherent with incompatible signatures across
architectures and with incomplete support. Now every architecture
supports the primitives consistently (by Peter Zijlstra)
- Generic support for 'relaxed atomics':
- _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
- atomic_read_acquire()
- atomic_set_release()
This came out of porting qwrlock code to arm64 (by Will Deacon)
- Clean up the fragile static_key APIs that were causing repeat bugs,
by introducing a new one:
DEFINE_STATIC_KEY_TRUE(name);
DEFINE_STATIC_KEY_FALSE(name);
which define a key of different types with an initial true/false
value.
Then allow:
static_branch_likely()
static_branch_unlikely()
to take a key of either type and emit the right instruction for the
case. To be able to know the 'type' of the static key we encode it
in the jump entry (by Peter Zijlstra)
- Static key self-tests (by Jason Baron)
- qrwlock optimizations (by Waiman Long)
- small futex enhancements (by Davidlohr Bueso)
- ... and misc other changes"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
jump_label/x86: Work around asm build bug on older/backported GCCs
locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
locking/static_keys: Make verify_keys() static
jump label, locking/static_keys: Update docs
locking/static_keys: Provide a selftest
jump_label: Provide a self-test
s390/uaccess, locking/static_keys: employ static_branch_likely()
x86, tsc, locking/static_keys: Employ static_branch_likely()
locking/static_keys: Add selftest
locking/static_keys: Add a new static_key interface
locking/static_keys: Rework update logic
locking/static_keys: Add static_key_{en,dis}able() helpers
...
Pull x86 asm changes from Ingo Molnar:
"The biggest changes in this cycle were:
- Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
primitives. (Andy Lutomirski)
- Add new, comprehensible entry and exit handlers written in C.
(Andy Lutomirski)
- vm86 mode cleanups and fixes. (Brian Gerst)
- 32-bit compat code cleanups. (Brian Gerst)
The amount of simplification in low level assembly code is already
palpable:
arch/x86/entry/entry_32.S | 130 +----
arch/x86/entry/entry_64.S | 197 ++-----
but more simplifications are planned.
There's also the usual laudry mix of low level changes - see the
changelog for details"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
x86/asm/msr: Make wrmsrl() a function
x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
x86/asm: Add MONITORX/MWAITX instruction support
x86/traps: Weaken context tracking entry assertions
x86/asm/tsc: Add rdtscll() merge helper
selftests/x86: Add syscall_nt selftest
selftests/x86: Disable sigreturn_64
x86/vdso: Emit a GNU hash
x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
x86/entry/32: Migrate to C exit path
x86/entry/32: Remove 32-bit syscall audit optimizations
x86/vm86: Rename vm86->v86flags and v86mask
x86/vm86: Rename vm86->vm86_info to user_vm86
x86/vm86: Clean up vm86.h includes
x86/vm86: Move the vm86 IRQ definitions to vm86.h
x86/vm86: Use the normal pt_regs area for vm86
x86/vm86: Eliminate 'struct kernel_vm86_struct'
x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
x86/vm86: Move vm86 fields out of 'thread_struct'
...
PEBSv3 has a raw TSC time stamp in its memory buffer that
later needs to to be converted to perf_clock.
Add a native_sched_clock_from_tsc() that works the same
as native_sched_clock(), but starts with an already given
TSC value.
Paravirt is ignored, it will just get the native clock.
But there isn't a para virtualized PEBS anyway.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because of the static_key restrictions we had to take an unconditional
jump for the most likely case, causing $I bloat.
Rewrite to use the new primitives.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two logical changes here. First, this removes a check
for cpu_has_tsc. That check is unnecessary, as we don't
register the TSC as a clocksource on systems that have no TSC.
Second, it adds a barrier, thus preventing observable
non-monotonicity.
I suspect that the missing barrier was never a problem in
practice because system calls themselves were heavy enough
barriers to prevent user code from observing time warps due to
speculation. (Without the corresponding barrier in the vDSO,
however, non-monotonicity is easy to detect.)
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/c6ff621a053127a65b70f175443578db7a0711be.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the following commit:
cdc7957d19 ("x86: move native_read_tsc() offline")
... native_read_tsc() was moved out of line, presumably for some
now-obsolete vDSO-related reason. Undo it.
The entire rdtsc, shl, or sequence is only 11 bytes, and calls
via rdtscl() and similar helpers were already inlined.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d05ffe2aaf8468ca475ebc00efad7b2fa174af19.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If it takes longer than 12us to read the PIT counter lsb/msb,
then the error margin will never fall below 500ppm within 50ms,
and Fast TSC calibration will always fail.
This patch detects when that will happen and fails fast. Note
the failure message is not printed in that case because:
1. it will always happen on that class of hardware
2. the absence of the message is more informative than its
presence
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/556EB717.9070607@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Many users see this message when booting without knowning that it is
of no importance and that TSC calibration may have succeeded by
another way.
As explained by Paul Bolle in
http://lkml.kernel.org/r/1348488259.1436.22.camel@x61.thuisdomein
"Fast TSC calibration failed" should not be considered as an error
since other calibration methods are being tried afterward. At most,
those send a warning if they fail (not an error). So let's change
the message from error to warning.
[ tglx: Make if pr_info. It's really not important at all ]
Fixes: c767a54ba0 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1418106470-6906-1-git-send-email-alexandre.f.demers@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If the TSC is unusable or disabled, then this patch fixes:
- Confusion while trying to clear old APIC interrupts.
- Division by zero and incorrect programming of the TSC deadline
timer.
This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC. The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline
This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull timer and time updates from Thomas Gleixner:
"A rather large update of timers, timekeeping & co
- Core timekeeping code is year-2038 safe now for 32bit machines.
Now we just need to fix all in kernel users and the gazillion of
user space interfaces which rely on timespec/timeval :)
- Better cache layout for the timekeeping internal data structures.
- Proper nanosecond based interfaces for in kernel users.
- Tree wide cleanup of code which wants nanoseconds but does hoops
and loops to convert back and forth from timespecs. Some of it
definitely belongs into the ugly code museum.
- Consolidation of the timekeeping interface zoo.
- A fast NMI safe accessor to clock monotonic for tracing. This is a
long standing request to support correlated user/kernel space
traces. With proper NTP frequency correction it's also suitable
for correlation of traces accross separate machines.
- Checkpoint/restart support for timerfd.
- A few NOHZ[_FULL] improvements in the [hr]timer code.
- Code move from kernel to kernel/time of all time* related code.
- New clocksource/event drivers from the ARM universe. I'm really
impressed that despite an architected timer in the newer chips SoC
manufacturers insist on inventing new and differently broken SoC
specific timers.
[ Ed. "Impressed"? I don't think that word means what you think it means ]
- Another round of code move from arch to drivers. Looks like most
of the legacy mess in ARM regarding timers is sorted out except for
a few obnoxious strongholds.
- The usual updates and fixlets all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
timekeeping: Fixup typo in update_vsyscall_old definition
clocksource: document some basic timekeeping concepts
timekeeping: Use cached ntp_tick_length when accumulating error
timekeeping: Rework frequency adjustments to work better w/ nohz
timekeeping: Minor fixup for timespec64->timespec assignment
ftrace: Provide trace clocks monotonic
timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
seqcount: Add raw_write_seqcount_latch()
seqcount: Provide raw_read_seqcount()
timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
timekeeping: Create struct tk_read_base and use it in struct timekeeper
timekeeping: Restructure the timekeeper some more
clocksource: Get rid of cycle_last
clocksource: Move cycle_last validation to core code
clocksource: Make delta calculation a function
wireless: ath9k: Get rid of timespec conversions
drm: vmwgfx: Use nsec based interfaces
drm: i915: Use nsec based interfaces
timekeeping: Provide ktime_get_raw()
hangcheck-timer: Use ktime_get_ns()
...
Pull x86 build/cleanup/debug updates from Ingo Molnar:
"Robustify the build process with a quirk to avoid GCC reordering
related bugs.
Two code cleanups.
Simplify entry_64.S CFI annotations, by Jan Beulich"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, build: Change code16gcc.h from a C header to an assembly header
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Simplify __HAVE_ARCH_CMPXCHG tests
x86/tsc: Get rid of custom DIV_ROUND() macro
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Drop several unnecessary CFI annotations
The only user of the cycle_last validation is the x86 TSC. In order to
provide NMI safe accessor functions for clock monotonic and
monotonic_raw we need to do that in the core.
We can't do the TSC specific
if (now < cycle_last)
now = cycle_last;
for the other wrapping around clocksources, but TSC has
CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
now is less than cycle_last the subtraction will give a negative
result. So we can check for that in clocksource_delta() and return 0
for that case.
Implement and enable it for x86
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver
locked up when doing cpu hotplug.
Because we called set_cyc2ns_scale() from the time_cpufreq_notifier()
unconditionally, it gets called multiple times for each freq change,
instead of only the once, when the tsc_khz value actually changes.
Because it gets called more than once, we run out of cyc2ns data slots
and stall, waiting for a free one, but because we're half way offline,
there's no consumers to free slots.
By placing the call inside the condition that actually changes tsc_khz
we avoid superfluous calls and avoid the problem.
Reported-by: Mauro <registosites@hotmail.com>
Tested-by: Mauro <registosites@hotmail.com>
Fixes: 20d1c86a57 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
Cc: <stable@vger.kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Bin Gao <bin.gao@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>