OpenCloudOS-Kernel/arch/x86/kernel
Waiman Long f99fd22e4d x86/hpet: Reduce HPET counter read contention
On a large system with many CPUs, using HPET as the clock source can
have a significant impact on the overall system performance because
of the following reasons:
 1) There is a single HPET counter shared by all the CPUs.
 2) HPET counter reading is a very slow operation.

Using HPET as the default clock source may happen when, for example,
the TSC clock calibration exceeds the allowable tolerance. Something
the performance slowdown can be so severe that the system may crash
because of a NMI watchdog soft lockup, for example.

During the TSC clock calibration process, the default clock source
will be set temporarily to HPET. For systems with many CPUs, it is
possible that NMI watchdog soft lockup may occur occasionally during
that short time period where HPET clocking is active as is shown in
the kernel log below:

[   71.646504] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[   71.655313] Switching to clocksource hpet
[   95.679135] BUG: soft lockup - CPU#144 stuck for 23s! [swapper/144:0]
[   95.693363] BUG: soft lockup - CPU#145 stuck for 23s! [swapper/145:0]
[   95.695580] BUG: soft lockup - CPU#582 stuck for 23s! [swapper/582:0]
[   95.698128] BUG: soft lockup - CPU#357 stuck for 23s! [swapper/357:0]

This patch addresses the above issues by reducing HPET read contention
using the fact that if more than one CPUs are trying to access HPET at
the same time, it will be more efficient when only one CPU in the group
reads the HPET counter and shares it with the rest of the group instead
of each group member trying to read the HPET counter individually.

This is done by using a combination quadword that contains a 32-bit
stored HPET value and a 32-bit spinlock.  The CPU that gets the lock
will be responsible for reading the HPET counter and storing it in
the quadword. The others will monitor the change in HPET value and
lock status and grab the latest stored HPET value accordingly. This
change is only enabled on 64-bit SMP configuration.

On a 4-socket Haswell-EX box with 144 threads (HT on), running the
AIM7 compute workload (1500 users) on a 4.8-rc1 kernel (HZ=1000)
with and without the patch has the following performance numbers
(with HPET or TSC as clock source):

TSC		= 1042431 jobs/min
HPET w/o patch	=  798068 jobs/min
HPET with patch	= 1029445 jobs/min

The perf profile showed a reduction of the %CPU time consumed by
read_hpet from 11.19% without patch to 1.24% with patch.

[ tglx: It's really sad that we need to have such hacks just to deal with
  	the fact that cpu vendors have not managed to fix the TSC wreckage
  	within 15+ years. Were They Forgetting? ]

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Randy Wright <rwright@hpe.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1473182530-29175-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 15:16:19 +02:00
..
acpi Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
apic x86/apic: Do not init irq remapping if ioapic is disabled 2016-08-24 09:45:40 +02:00
cpu x86/AMD: Apply erratum 665 on machines without a BIOS fix 2016-09-02 20:42:28 +02:00
fpu x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation 2016-08-10 16:12:26 +02:00
kprobes kprobes/x86: Clear TF bit in fault on single-stepping 2016-06-14 12:00:54 +02:00
.gitignore
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2016-05-17 17:11:27 -07:00
alternative.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
amd_gart_64.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
amd_nb.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
apb_timer.c x86/apb_timer: Convert to hotplug state machine 2016-07-15 10:40:22 +02:00
aperture_64.c param: convert some "on"/"off" users to strtobool 2016-03-17 15:09:34 -07:00
apm_32.c x86/apm32: Remove paravirt_enabled() use 2016-04-22 10:29:03 +02:00
asm-offsets.c x86/uaccess: Move thread_info::addr_limit to thread_struct 2016-07-15 10:26:30 +02:00
asm-offsets_32.c x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup 2016-03-10 09:48:14 +01:00
asm-offsets_64.c x86/syscalls: Add syscall entry qualifiers 2016-01-29 09:46:38 +01:00
audit_64.c
bootflag.c x86: don't use module_init for non-modular core bootflag code 2015-06-16 14:12:34 -04:00
check.c Linux 4.2-rc8 2015-08-25 09:59:19 +02:00
cpuid.c new helpers: no_seek_end_llseek{,_size}() 2015-12-23 10:41:31 -05:00
crash.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
crash_dump_32.c
crash_dump_64.c
devicetree.c x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage 2016-04-13 11:37:41 +02:00
doublefault.c
dumpstack.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 18:18:04 -07:00
dumpstack_32.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
dumpstack_64.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
e820.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
early-quirks.c Linux 4.7 2016-07-26 17:26:29 +10:00
early_printk.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
ebda.c x86/boot: Simplify EBDA-vs-BIOS reservation logic 2016-07-22 11:46:01 +02:00
espfix_64.c x86: get rid of superfluous __GFP_REPEAT 2016-06-24 17:23:52 -07:00
ftrace.c Nothing major this round. Mostly small clean ups and fixes. 2016-03-24 10:52:25 -07:00
head32.c x86/boot: Run reserve_bios_regions() after we initialize the memory map 2016-08-11 11:14:59 +02:00
head64.c x86/boot: Run reserve_bios_regions() after we initialize the memory map 2016-08-11 11:14:59 +02:00
head_32.S Merge branch 'x86/urgent' into x86/asm, to refresh the tree 2016-04-29 11:55:04 +02:00
head_64.S x86/mm: Enable KASLR for physical mapping memory regions 2016-07-08 17:35:15 +02:00
hpet.c x86/hpet: Reduce HPET counter read contention 2016-09-09 15:16:19 +02:00
hw_breakpoint.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
i386_ksyms_32.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
i8237.c
i8253.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
i8259.c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs 2015-11-07 10:37:37 +01:00
io_delay.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
ioport.c x86/iopl: Fix iopl capability check on Xen PV 2016-03-17 09:49:27 +01:00
irq.c x86/irq: Do not substract irq_tlb_count from irq_call_count 2016-08-11 11:14:59 +02:00
irq_32.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
irq_64.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
irqinit.c x86/irq: Store irq descriptor in vector array 2015-08-06 00:14:59 +02:00
jump_label.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
kdebugfs.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
kexec-bzimage64.c KEYS: Generalise system_verify_data() to provide access to internal content 2016-04-06 16:14:24 +01:00
kgdb.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
ksysfs.c
kvm.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
kvmclock.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
ldt.c x86/mm: Factor out LDT init from context init 2016-02-18 19:46:31 +01:00
machine_kexec_32.c
machine_kexec_64.c kexec: provide arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
mcount_64.S ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it 2016-05-20 13:28:40 -04:00
mmconf-fam10h_64.c
module.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
mpparse.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
msr.c x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
nmi.c x86: include linux/ratelimit.h in nmi.c 2016-06-06 17:10:15 +02:00
nmi_selftest.c
paravirt-spinlocks.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
paravirt.c x86/paravirt: Do not trace _paravirt_ident_*() functions 2016-09-02 09:40:47 -07:00
paravirt_patch_32.c x86/paravirt: Remove the unused irq_enable_sysexit pv op 2015-11-23 10:48:16 +01:00
paravirt_patch_64.c x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op 2015-11-23 10:48:16 +01:00
pci-calgary_64.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pci-dma.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pci-iommu_table.c x86: Fix non-static inlines 2016-04-16 13:21:40 +02:00
pci-nommu.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pci-swiotlb.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pcspeaker.c
perf_regs.c perf/x86/64: Report regs_user->ax too in get_regs_user() 2015-04-11 13:08:53 +02:00
platform-quirks.c x86/boot: Reorganize and clean up the BIOS area reservation code 2016-07-21 10:11:57 +02:00
pmem.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
probe_roms.c
process.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
process_32.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
process_64.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
ptrace.c x86/ptrace: Stop setting TS_COMPAT in ptrace code 2016-07-27 11:09:43 +02:00
pvclock.c pvclock: introduce seqcount-like API 2016-08-04 13:52:21 +02:00
quirks.c timers/x86/hpet: Type adjustments 2015-10-21 11:17:32 +02:00
reboot.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
reboot_fixups_32.c
relocate_kernel_32.S
relocate_kernel_64.S x86/asm: Replace "MOVQ $imm, %reg" with MOVL 2015-04-01 13:17:39 +02:00
resource.c
rtc.c char/genrtc: x86: remove remnants of asm/rtc.h 2016-06-04 00:20:07 +02:00
setup.c x86/boot: Defer setup_real_mode() to early_initcall time 2016-08-11 11:15:00 +02:00
setup_percpu.c x86/acpi: store ACPI ids from MADT for future usage 2016-07-25 13:30:53 +01:00
signal.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-06 09:04:35 -04:00
signal_compat.c x86/signals: Add build-time checks to the siginfo compat code 2016-06-14 12:19:24 +02:00
smp.c x86/smp: Remove single IPI wrapper 2015-11-05 13:07:54 +01:00
smpboot.c x86/smp: Fix __max_logical_packages value setup 2016-08-18 10:14:48 +02:00
stacktrace.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
step.c Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes 2015-08-18 09:39:47 +02:00
sys_x86_64.c x86/mm: Improve AMD Bulldozer ASLR workaround 2015-03-31 10:01:17 +02:00
sysfb.c
sysfb_efi.c Merge branch 'linus' into efi/core, to pick up fixes 2016-05-07 07:00:07 +02:00
sysfb_simplefb.c
tboot.c x86/tboot: Convert to hotplug state machine 2016-07-15 10:40:30 +02:00
tce_64.c x86/cpufeature: Remove cpu_has_clflush 2016-03-31 13:35:09 +02:00
test_nx.c x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option 2016-02-22 08:51:38 +01:00
test_rodata.c x86: Don't use module.h just for AUTHOR / LICENSE tags 2016-07-14 13:04:20 +02:00
time.c
tls.c x86/tls: Synchronize segment registers in set_thread_area() 2016-04-29 11:56:42 +02:00
tls.h
topology.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
trace_clock.c x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites 2015-07-06 15:23:29 +02:00
tracepoint.c
traps.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
tsc.c Merge branch 'linus' into timers/urgent, to pick up fixes 2016-08-10 14:36:23 +02:00
tsc_msr.c x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration 2016-07-11 21:30:12 +02:00
tsc_sync.c x86/asm/tsc/sync: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers 2015-07-06 15:23:29 +02:00
uprobes.c uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions 2016-08-12 08:29:24 +02:00
verify_cpu.S x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
vm86_32.c x86, bitops: remove use of "sbb" to return CF 2016-06-08 12:41:20 -07:00
vmlinux.lds.S x86/boot: Move compressed kernel to the end of the decompression buffer 2016-04-29 11:03:29 +02:00
vsmp_64.c x86: replace __init_or_module with __init in non-modular vsmp_64.c 2015-06-16 14:12:41 -04:00
x86_init.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
x8664_ksyms_64.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00