linux-sg2042/arch/x86/kernel
Jeremy Fitzhardinge 545ac13892 x86, spinlock: Replace pv spinlocks with pv ticketlocks
Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow patch (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.

Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

+-----------------+----------------+--------+
 dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|   base (stdev %)|patched(stdev%) | %gain  |
+-----------------+----------------+--------+
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommits results are flat

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:53:05 -07:00
..
acpi Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-07-18 17:39:05 -07:00
apic x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
cpu x86/mce: Fix mce regression from recent cleanup 2013-07-29 11:23:27 -07:00
kprobes kprobes: Fix arch_prepare_kprobe to handle copy insn failures 2013-06-20 14:25:48 +02:00
.gitignore
Makefile Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-07-02 16:31:49 -07:00
alternative.c x86, cpu: Expand cpufeature facility to include cpu bugs 2013-04-02 10:12:52 -07:00
amd_gart_64.c x86, mm: use pfn_range_is_mapped() with gart 2012-11-17 11:59:10 -08:00
amd_nb.c Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-04-30 08:42:45 -07:00
apb_timer.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 20:11:07 -08:00
aperture_64.c x86/mm/gart: Drop unnecessary check 2013-04-16 10:54:40 +02:00
apm_32.c cpuidle: remove en_core_tk_irqen flag 2013-04-23 13:45:22 +02:00
asm-offsets.c x86, um/x86: switch to generic sys_execve and kernel_execve 2012-09-30 22:53:32 -04:00
asm-offsets_32.c x86: Get rid of ->hard_math and all the FPU asm fu 2013-06-06 14:32:04 -07:00
asm-offsets_64.c x86, gdt, hibernate: Store/load GDT for hibernate path. 2013-05-02 11:27:35 -07:00
audit_64.c
bootflag.c
check.c x86: kernel/check.c simple_strtoul cleanup 2012-05-15 15:36:41 -07:00
cpuid.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
crash.c x86/kexec: crash_vmclear_local_vmcss needs __rcu 2012-12-11 19:55:23 -02:00
crash_dump_32.c
crash_dump_64.c
devicetree.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
doublefault.c x86: Extend #DF debugging aid to 64-bit 2013-05-13 13:42:44 -07:00
dumpstack.c dump_stack: consolidate dump_stack() implementations and unify their behaviors 2013-04-30 17:04:02 -07:00
dumpstack_32.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack_64.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
e820.c x86, mm: Let "memmap=" take more entries one time 2012-11-17 11:59:51 -08:00
early-quirks.c iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets 2013-04-18 17:00:47 +02:00
early_printk.c early_printk: consolidate random copies of identical code 2013-04-29 18:28:13 -07:00
entry_32.S x86, trace: Add irq vector tracepoints 2013-06-20 22:25:34 -07:00
entry_64.S Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-07-02 16:31:49 -07:00
ftrace.c x86/ftrace: Use __pa_symbol instead of __pa on C visible symbols 2012-11-16 16:42:09 -08:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
head32.c x86: Merge early kernel reserve for 32bit and 64bit 2013-01-29 19:32:58 -08:00
head64.c x86: Fix bit corruption at CPU resume time 2013-05-20 11:36:03 -07:00
head_32.S x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
head_64.S x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
hpet.c x86, hpet: Introduce x86_msi_ops.setup_hpet_msi 2013-01-28 10:48:30 +01:00
hw_breakpoint.c ptrace/x86: flush_ptrace_hw_breakpoint() shoule clear the virtual debug registers 2013-07-09 10:33:26 -07:00
i386_ksyms_32.c x86-32: Add support for 64bit get_user() 2013-02-07 15:07:28 -08:00
i387.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
i8237.c
i8253.c
i8259.c x86/irq/i8259: Fix incorrect comment 2012-08-22 09:34:24 +02:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
irq.c trace,x86: Move creation of irq tracepoints from apic.c to irq.c 2013-06-21 10:33:28 -04:00
irq_32.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
irq_64.c
irq_work.c x86, trace: Add irq vector tracepoints 2013-06-20 22:25:34 -07:00
irqinit.c KVM: VMX: Register a new IPI for posted interrupt 2013-04-16 16:32:39 -03:00
jump_label.c
kdebugfs.c arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case 2012-07-26 15:07:20 +02:00
kgdb.c kgdb,x86: fix warning about unused variable 2012-10-12 06:37:34 -05:00
kvm.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
kvmclock.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
ldt.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
machine_kexec_32.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
machine_kexec_64.c x86, kexec, 64bit: Only set ident mapping for ram. 2013-01-29 15:26:35 -08:00
microcode_amd.c x86, microcode, amd: Early microcode patch loading support for AMD 2013-05-30 20:19:25 -07:00
microcode_amd_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_core.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_core_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_intel.c x86/microcode_intel.h: Define functions and macros for early loading ucode 2013-01-31 13:18:50 -08:00
microcode_intel_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_intel_lib.c x86/microcode_intel_lib.c: Early update ucode on Intel's CPU 2013-01-31 13:19:14 -08:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-07-24 13:34:56 -07:00
mpparse.c Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-05-29 20:14:53 -07:00
msr.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
nmi.c perf/x86: Fix incorrect use of do_div() in NMI warning 2013-07-12 14:13:04 +02:00
nmi_selftest.c x86/nmi: Clean up register_nmi_handler() usage 2012-06-20 14:23:17 +02:00
paravirt-spinlocks.c x86, spinlock: Replace pv spinlocks with pv ticketlocks 2013-08-09 07:53:05 -07:00
paravirt.c Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-04-30 08:41:21 -07:00
paravirt_patch_32.c
paravirt_patch_64.c
pci-calgary_64.c x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> 2012-06-06 09:17:22 +02:00
pci-dma.c x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES 2013-01-24 17:34:18 +01:00
pci-iommu_table.c
pci-nommu.c X86: integrate CMA with DMA-mapping subsystem 2012-05-21 15:09:38 +02:00
pci-swiotlb.c X86 & IA64: adapt for dma_map_ops changes 2012-03-28 16:36:31 +02:00
pcspeaker.c
perf_regs.c perf: Fix off by one test in perf_reg_value() 2012-09-19 17:08:40 +02:00
probe_roms.c x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() 2012-09-05 10:52:25 +02:00
process.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
process_32.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-07-02 16:25:06 -07:00
process_64.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-07-02 16:25:06 -07:00
ptrace.c ptrace/x86: cleanup ptrace_set_debugreg() 2013-07-09 10:33:26 -07:00
pvclock.c x86/kvm: Fix pvclock vsyscall fixmap 2013-02-28 08:50:11 +02:00
quirks.c x86, quirks: Shut-up a long-standing gcc warning 2013-04-02 16:03:34 -07:00
reboot.c reboot: move arch/x86 reboot= handling to generic kernel 2013-07-09 10:33:29 -07:00
reboot_fixups_32.c
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c
rtc.c x86: Increase precision of x86_platform.get/set_wallclock() 2013-05-28 14:00:59 -07:00
setup.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
setup_percpu.c x86: Add read_mostly declaration/definition to variables from smp.h 2012-06-14 12:42:11 +02:00
signal.c x86/signals: Merge EFLAGS bit clearing into a single statement 2013-05-28 08:46:53 +02:00
smp.c x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt() 2013-07-02 09:52:31 +02:00
smpboot.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86: Fix a typo 2013-01-24 16:22:10 +01:00
syscall_32.c
syscall_64.c
tboot.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
tce_64.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
test_nx.c
test_rodata.c x86, extable: Remove open-coded exception table entries in arch/x86/kernel/test_rodata.c 2012-04-20 13:51:38 -07:00
time.c MCA: delete all remaining traces of microchannel bus support. 2012-05-17 19:06:13 -04:00
tls.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
tls.h
topology.c x86, topology: Debug CPU0 hotplug 2012-11-14 15:28:11 -08:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
tsc.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
tsc_sync.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
uprobes.c uretprobes/x86: Hijack return address 2013-04-13 15:31:55 +02:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86: Drop always empty .text..page_aligned section 2013-03-11 15:07:56 +01:00
vsmp_64.c x86/apic/x2apic: Limit the vector reservation to the user specified mask 2012-07-06 11:00:22 +02:00
vsyscall_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
vsyscall_emu_64.S
vsyscall_trace.h
x86_init.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
x8664_ksyms_64.c x86: Improve __phys_addr performance by making use of carry flags and inlining 2012-11-16 16:42:08 -08:00
xsave.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00