Commit Graph

19427 Commits

Author SHA1 Message Date
Jim Mattson 80112c89ed KVM: Synthesize G bit for all segments.
We have noticed that qemu-kvm hangs early in the BIOS when runnning nested
under some versions of VMware ESXi.

The problem we believe is because KVM assumes that the platform preserves
the 'G' but for any segment register. The SVM specification itemizes the
segment attribute bits that are observed by the CPU, but the (G)ranularity bit
is not one of the bits itemized, for any segment. Though current AMD CPUs keep
track of the (G)ranularity bit for all segment registers other than CS, the
specification does not require it. VMware's virtual CPU may not track the
(G)ranularity bit for any segment register.

Since kvm already synthesizes the (G)ranularity bit for the CS segment. It
should do so for all segments. The patch below does that, and helps get rid of
the hangs. Patch applies on top of Linus' tree.

Signed-off-by: Jim Mattson <jmattson@vmware.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:11:56 +02:00
Nadav Amit 98eff52ab5 KVM: x86: Fix lapic.c debug prints
In two cases lapic.c does not use the apic_debug macro correctly. This patch
fixes them.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:57 +02:00
Tomasz Grabiec 0d3da0d26e KVM: x86: fix TSC matching
I've observed kvmclock being marked as unstable on a modern
single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.

The culprit was failure in TSC matching because of overflow of
kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
in a single synchronization cycle.

Turns out that qemu does multiple TSC writes during init, below is the
evidence of that (qemu-2.0.0):

The first one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]

The second one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
 #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
 #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
 #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390

The third one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
 #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
 #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
 #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482

The fix is to count each vCPU only once when matched, so that
nr_vcpus_matched_tsc holds the size of the matched set. This is
achieved by reusing generation counters. Every vCPU with
this_tsc_generation == cur_tsc_generation is in the matched set. The
match set is cleared by setting cur_tsc_generation to a value which no
other vCPU is set to (by incrementing it).

I needed to bump up the counter size form u8 to u64 to ensure it never
overflows. Otherwise in cases TSC is not written the same number of
times on each vCPU the counter could overflow and incorrectly indicate
some vCPUs as being in the matched set. This scenario seems unlikely
but I'm not sure if it can be disregarded.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:57 +02:00
Jan Kiszka 6cbc5f5a80 KVM: nSVM: Set correct port for IOIO interception evaluation
Obtaining the port number from DX is bogus as a) there are immediate
port accesses and b) user space may have changed the register content
while processing the PIO access. Forward the correct value from the
instruction emulator instead.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:56 +02:00
Jan Kiszka 6493f1574e KVM: nSVM: Fix IOIO size reported on emulation
The access size of an in/ins is reported in dst_bytes, and that of
out/outs in src_bytes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:56 +02:00
Jan Kiszka 9bf418335e KVM: nSVM: Fix IOIO bitmap evaluation
First, kvm_read_guest returns 0 on success. And then we need to take the
access size into account when testing the bitmap: intercept if any of
bits corresponding to the access is set.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:55 +02:00
Jan Kiszka 62baf44cad KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1
CLTS only changes TS which is not monitored by selected CR0
interception. So skip any attempt to translate WRITE_CR0 to
CR0_SEL_WRITE for this instruction.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:55 +02:00
Rickard Strandqvist 9f6226a762 arch: x86: kvm: x86.c: Cleaning up variable is set more than once
A struct member variable is set to the same value more than once

This was found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-30 16:52:04 +02:00
Paolo Bonzini dc720f9593 Merge commit '33b458d276bb' into kvm-next
Fix bad x86 regression introduced during merge window.
2014-06-30 16:51:07 +02:00
Jan Kiszka 33b458d276 KVM: SVM: Fix CPL export via SS.DPL
We import the CPL via SS.DPL since ae9fedc793. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-30 16:45:28 +02:00
Nadav Amit 27e6fb5dae KVM: vmx: vmx instructions handling does not consider cs.l
VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in
64-bit mode.  The current implementation is broken since it does not use the
register operands correctly, and always uses 64-bit for reads and writes.
Moreover, write to memory in vmwrite only considers long-mode, so it ignores
cs.l. This patch fixes this behavior.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:15 +02:00
Nadav Amit 1e32c07955 KVM: vmx: handle_cr ignores 32/64-bit mode
On 32-bit mode only bits [31:0] of the CR should be used for setting the CR
value.  Otherwise, the host may incorrectly assume the value is invalid if bits
[63:32] are not zero.  Moreover, the CR is currently being read twice when CR8
is used.  Last, nested mov-cr exiting is modified to handle the CR value
correctly as well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:15 +02:00
Nadav Amit a449c7aa51 KVM: x86: Hypercall handling does not considers opsize correctly
Currently, the hypercall handling routine only considers LME as an indication
to whether the guest uses 32/64-bit mode. This is incosistent with hyperv
hypercalls handling and against the common sense of considering cs.l as well.
This patch uses is_64_bit_mode instead of is_long_mode for that matter. In
addition, the result is masked in respect to the guest execution mode. Last, it
changes kvm_hv_hypercall to use is_64_bit_mode as well to simplify the code.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:14 +02:00
Nadav Amit 5777392e83 KVM: x86: check DR6/7 high-bits are clear only on long-mode
When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and
otherwise injects a #GP exception. This exception should only be injected only
if running in long-mode.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:14 +02:00
Jan Kiszka 5381417f6a KVM: nVMX: Fix returned value of MSR_IA32_VMX_VMCS_ENUM
Many real CPUs get this wrong as well, but ours is totally off: bits 9:1
define the highest index value.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:13 +02:00
Jan Kiszka 2996fca069 KVM: nVMX: Allow to disable VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS
Allow L1 to "leak" its debug controls into L2, i.e. permit cleared
VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS. This requires to manually
transfer the state of DR7 and IA32_DEBUGCTLMSR from L1 into L2 as both
run on different VMCS.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:13 +02:00
Jan Kiszka 560b7ee12c KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS
SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:12 +02:00
Jan Kiszka 3dcdf3ec6e KVM: nVMX: Allow to disable CR3 access interception
We already have this control enabled by exposing a broken
MSR_IA32_VMX_PROCBASED_CTLS value. This will properly advertise our
capability once the value is fixed by clearing the right bits in
MSR_IA32_VMX_TRUE_PROCBASED_CTLS. We also have to ensure to test the
right value on L2 entry.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:12 +02:00
Jan Kiszka 3dbcd8da7b KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS
We already implemented them but failed to advertise them. Currently they
all return the identical values to the capability MSRs they are
augmenting. So there is no change in exposed features yet.

Drop related comments at this chance that are partially incorrect and
redundant anyway.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:11 +02:00
Jan Kiszka e4aa5288ff KVM: x86: Fix constant value of VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS
The spec says those controls are at bit position 2 - makes 4 as value.

The impact of this mistake is effectively zero as we only use them to
ensure that these features are set at position 2 (or, previously, 1) in
MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is and will be always true
according to the spec.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:11 +02:00
Nadav Amit a825f5cc4a KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
On long-mode the current NOP (0x90) emulation still writes back to RAX.  As a
result, EAX is zero-extended and the high 32-bits of RAX are cleared.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:10 +02:00
Nadav Amit 140bad89fd KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
Even if the condition of cmov is not satisfied, bits[63:32] should be cleared.
This is clearly stated in Intel's CMOVcc documentation.  The solution is to
reassign the destination onto itself if the condition is unsatisfied.  For that
matter the original destination value needs to be read.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:10 +02:00
Nadav Amit 9e8919ae79 KVM: x86: Inter-privilege level ret emulation is not implemeneted
Return unhandlable error on inter-privilege level ret instruction.  This is
since the current emulation does not check the privilege level correctly when
loading the CS, and does not pop RSP/SS as needed.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:09 +02:00
Nadav Amit ee212297cd KVM: x86: Wrong emulation on 'xadd X, X'
The emulator does not emulate the xadd instruction correctly if the two
operands are the same.  In this (unlikely) situation the result should be the
sum of X and X (2X) when it is currently X.  The solution is to first perform
writeback to the source, before writing to the destination.  The only
instruction which should be affected is xadd, as the other instructions that
perform writeback to the source use the extended accumlator (e.g., RAX:RDX).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:09 +02:00
Nadav Amit 7dec5603b6 KVM: x86: bit-ops emulation ignores offset on 64-bit
The current emulation of bit operations ignores the offset from the destination
on 64-bit target memory operands. This patch fixes this behavior.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:08 +02:00
Fabian Frederick bc39c4db71 arch/x86/kvm/vmx.c: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE
use mm.h definition

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:08 +02:00
Paolo Bonzini bdc907222c KVM: emulate: fix harmless typo in MMX decoding
It was using the wrong member of the union.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:07 +02:00
Paolo Bonzini 9688897717 KVM: emulate: simplify BitOp handling
Memory is always the destination for BitOp instructions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:07 +02:00
Paolo Bonzini a5457e7bcf KVM: emulate: POP SS triggers a MOV SS shadow too
We did not do that when interruptibility was added to the emulator,
because at the time pop to segment was not implemented.  Now it is,
add it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:20 +02:00
Nadav Amit 32e94d0696 KVM: x86: smsw emulation is incorrect in 64-bit mode
In 64-bit mode, when the destination is a register, the assignment is done
according to the operand size. Otherwise (memory operand or no 64-bit mode), a
16-bit assignment is performed.

Currently, 16-bit assignment is always done to the destination.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:19 +02:00
Nadav Amit aaa05f2437 KVM: x86: Return error on cmpxchg16b emulation
cmpxchg16b is currently unimplemented in the emulator. The least we can do is
return error upon the emulation of this instruction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:19 +02:00
Nadav Amit 67f4d4288c KVM: x86: rdpmc emulation checks the counter incorrectly
The rdpmc emulation checks that the counter (ECX) is not higher than 2, without
taking into considerations bits 30:31 role (e.g., bit 30 marks whether the
counter is fixed). The fix uses the pmu information for checking the validity
of the pmu counter.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:18 +02:00
Nadav Amit 3b32004a66 KVM: x86: movnti minimum op size of 32-bit is not kept
If the operand-size prefix (0x66) is used in 64-bit mode, the emulator would
assume the destination operand is 64-bit, when it should be 32-bit.

Reminder: movnti does not support 16-bit operands and its default operand size
is 32-bit.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:18 +02:00
Nadav Amit 37c564f285 KVM: x86: cmpxchg emulation should compare in reverse order
The current implementation of cmpxchg does not update the flags correctly,
since the accumulator should be compared with the destination and not the other
way around. The current implementation does not update the flags correctly.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:17 +02:00
Nadav Amit 606b1c3e87 KVM: x86: sgdt and sidt are not privilaged
The SGDT and SIDT instructions are not privilaged, i.e. they can be executed
with CPL>0.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:17 +02:00
Nadav Amit 2eedcac8a9 KVM: x86: Loading segments on 64-bit mode may be wrong
The current emulator implementation ignores the high 32 bits of the base in
long-mode.  During segment load from the LDT, the base of the LDT is calculated
incorrectly and may cause the wrong segment to be loaded.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:16 +02:00
Nadav Amit e37a75a13c KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR
The current implementation ignores the LDTR/TR base high 32-bits on long-mode.
As a result the loaded segment descriptor may be incorrect.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:16 +02:00
Nadav Amit 7fe864dc94 KVM: x86: Mark VEX-prefix instructions emulation as unimplemented
Currently the emulator does not recognize vex-prefix instructions.  However, it
may incorrectly decode lgdt/lidt instructions and try to execute them. This
patch returns unhandlable error on their emulation.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:15 +02:00
Linus Torvalds c728762e06 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso fixes from Peter Anvin:
 "Fixes for x86/vdso.

  One is a simple build fix for bigendian hosts, one is to make "make
  vdso_install" work again, and the rest is about working around a bug
  in Google's Go language -- two are documentation patches that improves
  the sample code that the Go coders took, modified, and broke; the
  other two implements a workaround that keeps existing Go binaries from
  segfaulting at least"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix vdso_install
  x86/vdso: Hack to keep 64-bit Go programs working
  x86/vdso: Add PUT_LE to store little-endian values
  x86/vdso/doc: Make vDSO examples more portable
  x86/vdso/doc: Rename vdso_test.c to vdso_standalone_test_x86.c
  x86, vdso: Remove one final use of htole16()
2014-06-14 14:46:29 -07:00
Andy Lutomirski a934fb5bc9 x86/vdso: Fix vdso_install
"make vdso_install" installs unstripped versions of the vdso objects
for the benefit of the debugger.  This was broken by checkin:

6f121e548f x86, vdso: Reimplement vdso.so preparation in build-time C

The filenames are different now, so update the Makefile to cope.

This still installs the 64-bit vdso as vdso64.so.  We believe this
will be okay, as the only known user is a patched gdb which is known
to use build-ids, but if it turns out to be a problem we may have to
add a link.

Inspired by a patch from Sam Ravnborg.

Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b10299edd8ba98d17e07dafcd895b8ecf4d99eff.1402586707.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-13 10:31:48 -07:00
Linus Torvalds 71998d1be4 Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq fixes from Ingo Molnar:
 "Two changes: a cpu-hotplug/irq race fix, plus a HyperV related fix"

* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Fix fixup_irqs() error handling
  x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately
2014-06-12 20:03:47 -07:00
Linus Torvalds 3737a12761 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
 "A second round of perf updates:

   - wide reaching kprobes sanitization and robustization, with the hope
     of fixing all 'probe this function crashes the kernel' bugs, by
     Masami Hiramatsu.

   - uprobes updates from Oleg Nesterov: tmpfs support, corner case
     fixes and robustization work.

   - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo
     et al:
        * Add support to accumulate hist periods (Namhyung Kim)
        * various fixes, refactorings and enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  perf: Differentiate exec() and non-exec() comm events
  perf: Fix perf_event_comm() vs. exec() assumption
  uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
  perf/documentation: Add description for conditional branch filter
  perf/x86: Add conditional branch filtering support
  perf/tool: Add conditional branch filter 'cond' to perf record
  perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
  uprobes: Teach copy_insn() to support tmpfs
  uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
  perf/x86: Use common PMU interrupt disabled code
  perf/ARM: Use common PMU interrupt disabled code
  perf: Disable sampled events if no PMU interrupt
  perf: Fix use after free in perf_remove_from_context()
  perf tools: Fix 'make help' message error
  perf record: Fix poll return value propagation
  perf tools: Move elide bool into perf_hpp_fmt struct
  perf tools: Remove elide setup for SORT_MODE__MEMORY mode
  perf tools: Fix "==" into "=" in ui_browser__warning assignment
  perf tools: Allow overriding sysfs and proc finding with env var
  perf tools: Consider header files outside perf directory in tags target
  ...
2014-06-12 19:18:49 -07:00
Andy Lutomirski e0bf7b86da x86/vdso: Hack to keep 64-bit Go programs working
The Go runtime has a buggy vDSO parser that currently segfaults.
This writes an empty SHT_DYNSYM entry that causes Go's runtime to
malfunction by thinking that the vDSO is empty rather than
malfunctioning by running off the end and segfaulting.

This affects x86-64 only as far as we know, so we do not need this for
the i386 and x32 vdsos.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/d10618176c4bd39b457a5e85c497295c90cab1bc.1402620737.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-12 19:02:30 -07:00
Andy Lutomirski b4b31f6101 x86/vdso: Add PUT_LE to store little-endian values
Add PUT_LE() by analogy with GET_LE() to write littleendian values in
addition to reading them.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/3d9b27e92745b27b6fda1b9a98f70dc9c1246c7a.1402620737.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-12 19:01:51 -07:00
Linus Torvalds c29deef32e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more locking changes from Ingo Molnar:
 "This is the second round of locking tree updates for v3.16, offering
  large system scalability improvements:

 - optimistic spinning for rwsems, from Davidlohr Bueso.

 - 'qrwlocks' core code and x86 enablement, from Waiman Long and PeterZ"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, locking/rwlocks: Enable qrwlocks on x86
  locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks
  locking/mutexes: Documentation update/rewrite
  locking/rwsem: Fix checkpatch.pl warnings
  locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCK
  locking/rwsem: Support optimistic spinning
2014-06-12 18:48:15 -07:00
Linus Torvalds f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Linus Torvalds 682b7c1c8e Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm merge window pull request, changes all over the
  place, mostly normal levels of churn.

  Highlights:

  Core drm:
     More cleanups, fix race on connector/encoder naming, docs updates,
     object locking rework in prep for atomic modeset

  i915:
     mipi DSI support, valleyview power fixes, cursor size fixes,
     execlist refactoring, vblank improvements, userptr support, OOM
     handling improvements

  radeon:
     GPUVM tuning and large page size support, gart fixes, deep color
     HDMI support, HDMI audio cleanups

  nouveau:
     - displayport rework should fix lots of issues
     - initial gk20a support
     - gk110b support
     - gk208 fixes

  exynos:
     probe order fixes, HDMI changes, IPP consolidation

  msm:
     debugfs updates, misc fixes

  ast:
     ast2400 support, sync with UMS driver

  tegra:
     cleanups, hdmi + hw cursor for Tegra 124.

  panel:
     fixes existing panels add some new ones.

  ipuv3:
     moved from staging to drivers/gpu"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (761 commits)
  drm/nouveau/disp/dp: fix tmds passthrough on dp connector
  drm/nouveau/dp: probe dpcd to determine connectedness
  drm/nv50-: trigger update after all connectors disabled
  drm/nv50-: prepare for attaching a SOR to multiple heads
  drm/gf119-/disp: fix debug output on update failure
  drm/nouveau/disp/dp: make use of postcursor when its available
  drm/g94-/disp/dp: take max pullup value across all lanes
  drm/nouveau/bios/dp: parse lane postcursor data
  drm/nouveau/dp: fix support for dpms
  drm/nouveau: register a drm_dp_aux channel for each dp connector
  drm/g94-/disp: add method to power-off dp lanes
  drm/nouveau/disp/dp: maintain link in response to hpd signal
  drm/g94-/disp: bash and wait for something after changing lane power regs
  drm/nouveau/disp/dp: split link config/power into two steps
  drm/nv50/disp: train PIOR-attached DP from second supervisor
  drm/nouveau/disp/dp: make use of existing output data for link training
  drm/gf119/disp: start removing direct vbios parsing from supervisor
  drm/nv50/disp: start removing direct vbios parsing from supervisor
  drm/nouveau/disp/dp: maintain receiver caps in response to hpd signal
  drm/nouveau/disp/dp: create subclass for dp outputs
  ...
2014-06-12 11:32:30 -07:00
Alexei Starovoitov e430f34ee5 net: filter: cleanup A/X name usage
The macro 'A' used in internal BPF interpreter:
 #define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.

This patch is trying to clean up the naming and clarify its usage in the
following way:

- A and X are names of two classic BPF registers

- BPF_REG_A denotes internal BPF register R0 used to map classic register A
  in internal BPF programs generated from classic

- BPF_REG_X denotes internal BPF register R7 used to map classic register X
  in internal BPF programs generated from classic

- internal BPF instruction format:
struct sock_filter_int {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};

- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
  BPF_X - means use register X as source operand
  BPF_K - means use 32-bit immediate as source operand
In internal:
  BPF_X - means use 'src_reg' register as source operand
  BPF_K - means use 32-bit immediate as source operand

Suggested-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:13:16 -07:00
Linus Torvalds dfb945473a Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
 "This contains:
   - addition of the Intel MID watchdog
   - removal of W83697HF and W83697UG drivers (code was merged into
     w83627hf_wdt driver)
   - addition of Armada 375/380 SoC support
   - conversion of imx2_wdt to regmap API and to watchdog core API
   - lots of other small improvements and fixes"

[ Wim was also tagged by gmail as a spammer, but not delayed by days
  unlike Ben ]

* git://www.linux-watchdog.org/linux-watchdog: (25 commits)
  x86: intel-mid: add watchdog platform code for Merrifield
  watchdog: add Intel MID watchdog driver support
  watchdog: sp805: Set watchdog_device->timeout from ->set_timeout()
  booke/watchdog: refine and clean up the codes
  watchdog: iop_wdt only builds for mach-iop13xx
  watchdog: Remove drivers for W83697HF and W83697UG
  watchdog: w83627hf_wdt: Add early_disable module parameter
  ARM: mvebu: Add A375/A380 watchdog binding documentation
  watchdog: orion: Add Armada 375/380 SoC support
  watchdog: orion: Introduce per-SoC enabled() function
  watchdog: orion: Introduce per-SoC stop() function
  watchdog: orion: Remove unneeded atomic access
  watchdog: orion: Introduce a SoC-specific RSTOUT mapping
  watchdog: orion: Move the register ioremap'ing to its own function
  watchdog: xilinx: Make of_device_id array const
  watchdog: imx2_wdt: convert to watchdog core api
  watchdog: imx2_wdt: convert to use regmap API.
  watchdog: imx2_wdt: Sort the header files alphabetically
  watchdog: ath79_wdt: switch to clk_prepare/clk_disable
  watchdog: ath79_wdt: avoid spurious restarts on AR934x
  ...
2014-06-10 19:16:36 -07:00
H. Peter Anvin 4d048b0255 x86, vdso: Remove one final use of htole16()
One final use of the macros from <endian.h> which are not available on
older system.  In this case we had one sole case of *writing* a
littleendian number, but the number is SHN_UNDEF which is the constant
zero, so rather than dealing with the general case of littleendian
puts here, just document that the constant is zero and be done with
it.

Reported-and-Tested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/20140610135051.c3c34165f73d67d218b62bd9@linux-foundation.org
2014-06-10 15:55:05 -07:00