linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Paolo Bonzini	fc57ac2c9c	KVM: lapic: sync highest ISR to hardware apic on EOI When Hyper-V enlightenments are in effect, Windows prefers to issue an Hyper-V MSR write to issue an EOI rather than an x2apic MSR write. The Hyper-V MSR write is not handled by the processor, and besides being slower, this also causes bugs with APIC virtualization. The reason is that on EOI the processor will modify the highest in-service interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of the SDM; every other step in EOI virtualization is already done by apic_send_eoi or on VM entry, but this one is missing. We need to do the same, and be careful not to muck with the isr_count and highest_isr_cache fields that are unused when virtual interrupt delivery is enabled. Cc: stable@vger.kernel.org Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-27 10:21:09 +02:00
Linus Torvalds	5fa6a683c0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "It looks like a sizeble collection but this is nearly 3 weeks of bug fixing while you were away. 1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute the packet through a non-IPSEC protected path and the code has to be able to handle SKBs attached to routes lacking an attached xfrm state. From Steffen Klassert. 2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported sub-protocols, also from Steffen Klassert. 3) Set local_df on fragmented netfilter skbs otherwise we won't be able to forward successfully, from Florian Westphal. 4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without holding RCU lock, from Bjorn Mork. 5) local_df test in ip_may_fragment is inverted, from Florian Westphal. 6) jme driver doesn't check for DMA mapping failures, from Neil Horman. 7) qlogic driver doesn't calculate number of TX queues properly, from Shahed Shaikh. 8) fib_info_cnt can drift irreversibly positive if we fail to allocate the fi->fib_metrics array, from Sergey Popovich. 9) Fix use after free in ip6_route_me_harder(), also from Sergey Popovich. 10) When SYSCTL is disabled, we don't handle local_port_range and ping_group_range defaults properly at all, from Cong Wang. 11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim driver, fix from Bjorn Mork. 12) cassini driver needs nested lock annotations for TX locking, from Emil Goode. 13) On init error ipv6 VTI driver can unregister pernet ops twice, oops. Fix from Mahtias Krause. 14) If macvlan device is down, don't propagate IFF_ALLMULTI changes, from Peter Christensen. 15) Missing NULL pointer check while parsing netlink config options in ip6_tnl_validate(). From Susant Sahani. 16) Fix handling of neighbour entries during ipv6 router reachability probing, from Duan Jiong. 17) x86 and s390 JIT address randomization has some address calculation bugs leading to crashes, from Alexei Starovoitov and Heiko Carstens. 18) Clear up those uglies with nop patching and net_get_random_once(), from Hannes Frederic Sowa. 19) Option length miscalculated in ip6_append_data(), fix also from Hannes Frederic Sowa. 20) A while ago we fixed a race during device unregistry when a namespace went down, turns out there is a second place that needs similar protection. From Cong Wang. 21) In the new Altera TSE driver multicast filtering isn't working, disable it and just use promisc mode until the cause is found. From Vince Bridgers. 22) When we disable router enabling in ipv6 we have to flush the cached routes explicitly, from Duan Jiong. 23) NBMA tunnels should not cache routes on the tunnel object because the key is variable, from Timo Teräs. 24) With stacked devices GRO information in skb->cb[] can be not setup properly, make sure it is in all code paths. From Eric Dumazet. 25) Really fix stacked vlan locking, multiple levels of nesting with intervening non-vlan devices are possible. From Vlad Yasevich. 26) Fallback ipip tunnel device's mtu is not setup properly, from Steffen Klassert. 27) The packet scheduler's tcindex filter can crash because we structure copy objects with list_head's inside, oops. From Cong Wang. 28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric Dumazet. 29) In some configurations 'itag' in __mkroute_input() can end up being used uninitialized because of how fib_validate_source() works. Fix it by explitly initializing itag to zero like all the other fib_validate_source() callers do, from Li RongQing" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) batman: fix a bogus warning from batadv_is_on_batman_iface() ipv4: initialise the itag variable in __mkroute_input bonding: Send ALB learning packets using the right source bonding: Don't assume 802.1Q when sending alb learning packets. net: doc: Update references to skb->rxhash stmmac: Remove unbalanced clk_disable call ipv6: gro: fix CHECKSUM_COMPLETE support net_sched: fix an oops in tcindex filter can: peak_pci: prevent use after free at netdev removal ip_tunnel: Initialize the fallback device properly vlan: Fix build error wth vlan_get_encap_level() can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option MAINTAINERS: Pravin Shelar is Open vSwitch maintainer. bnx2x: Convert return 0 to return rc bonding: Fix alb mode to only use first level vlans. bonding: Fix stacked device detection in arp monitoring macvlan: Fix lockdep warnings with stacked macvlan devices vlan: Fix lockdep warning with stacked vlan devices. net: Allow for more then a single subclass for netif_addr_lock net: Find the nesting level of a given device by type. ...	2014-05-23 15:29:43 -07:00
Linus Torvalds	e6a32c3ad1	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The biggest changes are fixes for races that kept triggering Trinity crashes, plus liblockdep build fixes and smaller misc fixes. The liblockdep bits in perf/urgent are a pull mistake - they should have been in locking/urgent - but by the time I noticed other commits were added and testing was done :-/ Sorry about that" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix a race between ring_buffer_detach() and ring_buffer_attach() perf: Prevent false warning in perf_swevent_add perf: Limit perf_event_attr::sample_period to 63 bits tools/liblockdep: Remove all build files when doing make clean tools/liblockdep: Build liblockdep from tools/Makefile perf/x86/intel: Fix Silvermont's event constraints perf: Fix perf_event_init_context() perf: Fix race in removing an event	2014-05-23 10:02:34 -07:00
Bjorn Helgaas	c96ec95315	x86/gart: Tidy messages and add bridge device info Print the AGP bridge info the same way as the rest of the kernel, e.g., "0000:00:04.0" instead of "00:04:00". Also print the AGP aperture address range the same way we print resources, and label it explicitly as a bus address range. No functional change except the message changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-05-23 10:47:19 -06:00
Bjorn Helgaas	a5d3244a0b	x86/gart: Replace printk() with pr_info() Replace printk() with pr_info(), pr_err(), etc. Define pr_fmt() to prefix output with "AGP: ". No functional change except the addition of "AGP: " prefix in dmesg output. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-05-23 10:47:19 -06:00
Bjorn Helgaas	adc429d699	x86/PCI: Move pcibios_assign_resources() annotation to definition Move the pcibios_assign_resources() fs_initcall annotation next to the function definition. No functional change. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-05-23 10:47:19 -06:00
Nadav Amit	1f85411255	KVM: vmx: DR7 masking on task switch emulation is wrong The DR7 masking which is done on task switch emulation should be in hex format (clearing the local breakpoints enable bits 0,2,4 and 6). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:18 +02:00
Dave Hansen	65a7f03f6b	x86: fix page fault tracing when KVM guest support enabled I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing echo 1 > events/exceptions/enable cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks page fault tracing, and disabling fixes it. I tried on some old kernels and this does not appear to be a regression: it never worked. There are two page-fault entry functions today. One when tracing is on and another when it is off. The KVM code calls do_page_fault() directly instead of calling the traced version: > dotraplinkage void __kprobes > do_async_page_fault(struct pt_regs *regs, unsigned long > error_code) > { > enum ctx_state prev_state; > > switch (kvm_read_and_reset_pf_reason()) { > default: > do_page_fault(regs, error_code); > break; > case KVM_PV_REASON_PAGE_NOT_PRESENT: I'm also having problems with the page fault tracing on bare metal (same symptom of no trace output). I'm unsure if it's related. Steven had an alternative to this which has zero overhead when tracing is off where this includes the standard noops even when tracing is disabled. I'm unconvinced that the extra complexity of his apporach: http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home is worth it, expecially considering that the KVM code is already making page fault entry slower here. This solution is dirt-simple. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:17 +02:00
Paolo Bonzini	ae9fedc793	KVM: x86: get CPL from SS.DPL CS.RPL is not equal to the CPL in the few instructions between setting CR0.PE and reloading CS. And CS.DPL is also not equal to the CPL for conforming code segments. However, SS.DPL is always equal to the CPL except for the weird case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the value in the STAR MSR, but force CPL=3 (Intel instead forces SS.DPL=SS.RPL=CPL=3). So this patch: - modifies SVM to update the CPL from SS.DPL rather than CS.RPL; the above case with SYSRET is not broken further, and the way to fix it would be to pass the CPL to userspace and back - modifies VMX to always return the CPL from SS.DPL (except forcing it to 0 if we are emulating real mode via vm86 mode; in vm86 mode all DPLs have to be 3, but real mode does allow privileged instructions). It also removes the CPL cache, which becomes a duplicate of the SS access rights cache. This fixes doing KVM_IOCTL_SET_SREGS exactly after setting CR0.PE=1 but before CS has been reloaded. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:17 +02:00
Paolo Bonzini	5045b46803	KVM: x86: check CS.DPL against RPL during task switch Table 7-1 of the SDM mentions a check that the code segment's DPL must match the selector's RPL. This was not done by KVM, fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:17 +02:00
Paolo Bonzini	fb5e336b97	KVM: x86: drop set_rflags callback Not needed anymore now that the CPL is computed directly during task switch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:16 +02:00
Paolo Bonzini	2356aaeb2f	KVM: x86: use new CS.RPL as CPL during task switch During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition to all the other requirements) and will be the new CPL. So far this worked by carefully setting the CS selector and flag before doing the task switch; setting CS.selector will already change the CPL. However, this will not work once we get the CPL from SS.DPL, because then you will have to set the full segment descriptor cache to change the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the task switch, and the check that SS.DPL == CPL will fail. Temporarily assume that the CPL comes from CS.RPL during task switch to a protected-mode task. This is the same approach used in QEMU's emulation code, which (until version 2.0) manually tracks the CPL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:45:38 +02:00
Greg Kroah-Hartman	69c1f05379	Merge 3.15-rc6 into staging-next. This resolves the conflicts in the files: drivers/iio/adc/Kconfig drivers/staging/rtl8723au/os_dep/usb_ops_linux.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-05-22 23:27:17 +09:00
Ingo Molnar	65c2ce7004	Linux 3.15-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0= =2ShG -----END PGP SIGNATURE----- Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-22 10:28:56 +02:00
Andy Lutomirski	577ed45ec5	x86_64, entry: Merge paranoidzeroentry_ist into idtentry One more specialized entry function is now gone. Again, this seems to only change line numbers in entry_64.o. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/f54854f07ff3be8162b166124dbead23feeefe10.1400709717.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-21 16:23:02 -07:00
Andy Lutomirski	cb5dd2c5ee	x86_64, entry: Merge most 64-bit asm entry macros I haven't touched the device interrupt code, which is different enough that it's probably not worth merging, and I haven't done anything about paranoidzeroentry_ist yet. This appears to produce an entry_64.o file that differs only in the debug info line numbers. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e7a6acfb130471700370e77af9e4b4b6ed46f5ef.1400709717.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-21 16:22:57 -07:00
Andy Lutomirski	1bd24efc8b	x86_64, entry: Add missing 'DEFAULT_FRAME 0' entry annotations The paranoidzeroentry macros were missing them. I'm not at all convinced that these annotations are correct and/or necessary, but this makes the macros more consistent with each other. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/10ad65f534f8bc62e77f74fe15f68e8d4a59d8b3.1400709717.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-21 16:22:51 -07:00
Suravee Suthikulpanit	94d4bb5b13	x86/PCI: Work around AMD Fam15h BIOSes that fail to provide _PXM The BIOS is supposed to provide ACPI _PXM methods for PCI host bridges if it cares about platform topology. But some BIOSes do not, so add Fam15h to the list of CPUs for which we fall back to reading node numbers from the hardware. Note that pci_acpi_scan_root() warns about the BIOS bug if we use this information because (1) the hardware node numbers are not necessarily compatible with other logical node numbers from ACPI, and (2) the lack of _PXM forces OS updates that would not otherwise be required. [bhelgaas: changelog, comments] Link: https://bugzilla.kernel.org/show_bug.cgi?id=72051 Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: Robert Richter <rric@kernel.org> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>	2014-05-21 12:34:01 -06:00
Myron Stowe	3367310133	x86/PCI: Warn if we have to "guess" host bridge node information The vast majority of platforms are not supplying ACPI _PXM (proximity) information corresponding to host bridge (PNP0A03/PNP0A08) devices resulting in sysfs "numa_node" values of -1 (NUMA_NO_NODE): # for i in /sys/devices/pci0000\:00//numa_node; do cat $i; done \| uniq -1 # find /sys/ -name "numa_node" \| while read fname; do cat $fname; \ done \| uniq -1 AMD based platforms provide a fall-back for this situation via amd_bus.c. These platforms snoop out the information by directly reading specific registers from the Northbridge and caching them via alloc_pci_root_info(). Later during boot processing when host bridges are discovered - pci_acpi_scan_root() - the kernel looks for their corresponding ACPI _PXM method - drivers/acpi/numa.c::acpi_get_node(). If the BIOS supplied a _PXM method then that node (proximity) value is associated. If the BIOS did not supply a _PXM method and* the platform is AMD-based, the fall-back cached values obtained directly from the Northbridge are used; otherwise, "NUMA_NO_NODE" is associated. There are a number of issues with this fall-back mechanism the most notable being that amd_bus.c extracts a 3-bit number from a CPU register and uses it as the node number. The node numbers used by Linux are logical and there's no reason they need to be identical to settings in the CPU registers. So if we have some node information obtained in the normal way (from _PXM, SLIT, SRAT, etc.) and some from amd_bus.c, there's no reason to believe they will be compatible. This patch warns when this situation occurs: pci_root PNP0A08:00: [Firmware Bug]: no _PXM; falling back to node 0 from hardware (may be inconsistent with ACPI node numbers) Link: https://bugzilla.kernel.org/show_bug.cgi?id=72051 Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-05-21 12:32:16 -06:00
Borislav Petkov	65cef1311d	x86, microcode: Add a disable chicken bit Add a cmdline param which disables the microcode loader. This is useful mostly in debugging situations where we want to turn off microcode loading, both early from the initrd and late, as a means to be able to rule out its influence on the machine. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-20 20:21:27 -07:00
Borislav Petkov	1b1ded57a4	x86, boot: Carve out early cmdline parsing function Carve out early cmdline parsing function into .../lib/cmdline.c so it can be used by early code in the kernel proper as well. Adapted from arch/x86/boot/cmdline.c. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-20 20:21:24 -07:00
Rob Herring	6e87b7030e	Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty into for-next Conflicts: arch/arm64/kernel/early_printk.c	2014-05-20 14:22:54 -05:00
Stephane Eranian	722e76e60f	fix Haswell precise store data source encoding This patch fixes a bug in precise_store_data_hsw() whereby it would set the data source memory level to the wrong value. As per the the SDM Vol 3b Table 18-41 (Layout of Data Linear Address Information in PEBS Record), when status bit 0 is set this is a L1 hit, otherwise this is a L1 miss. This patch encodes the memory level according to the specification. In V2, we added the filtering on the store events. Only the following events produce L1 information: * MEM_UOPS_RETIRED.STLB_MISS_STORES * MEM_UOPS_RETIRED.LOCK_STORES * MEM_UOPS_RETIRED.SPLIT_STORES * MEM_UOPS_RETIRED.ALL_STORES Cc: mingo@elte.hu Cc: acme@ghostprotocols.net Cc: jolsa@redhat.com Cc: jmario@redhat.com Cc: ak@linux.intel.com Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140515155644.GA3884@quad Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-05-19 21:52:59 +09:00
Alan	9b17aeec23	goldfish: Allow 64bit builds We can now enable the 64bit option for the Goldfish 64bit emulator. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-05-15 13:19:01 -07:00
David Vrabel	f59c5145dc	x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range() _PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate the MFN. If mfn_pte() is used instead, the p2m look up is avoided and the use of _PAGE_IOMAP is no longer needed. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-05-15 16:16:40 +01:00
David Vrabel	25b884a83d	x86/xen: set regions above the end of RAM as 1:1 PCI devices may have BARs located above the end of RAM so mark such frames as identity frames in the p2m (instead of the default of missing). PFNs outside the p2m (above MAX_P2M_PFN) are also considered to be identity frames for the same reason. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-05-15 16:15:18 +01:00
David Vrabel	2dcc9a3de1	x86/xen: only warn once if bad MFNs are found during setup In xen_add_extra_mem(), if the WARN() checks for bad MFNs trigger it is likely that they will trigger at lot, spamming the log. Use WARN_ONCE() instead. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-05-15 16:15:01 +01:00
David Vrabel	3cb83e46d0	x86/xen: compactly store large identity ranges in the p2m Large (multi-GB) identity ranges currently require a unique middle page (filled with p2m_identity entries) per 1 GB region. Similar to the common p2m_mid_missing middle page for large missing regions, introduce a p2m_mid_identity page (filled with p2m_identity entries) which can be used instead. set_phys_range_identity() thus only needs to allocate new middle pages at the beginning and end of the range. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-05-15 16:14:44 +01:00
David Vrabel	a9b5bff66b	x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN Allow set_phys_range_identity() to work with a range that overlaps MAX_P2M_PFN by clamping pfn_e to MAX_P2M_PFN. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-05-15 16:13:23 +01:00
David Vrabel	fcca2e3119	x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle() early_p2m_alloc_middle() allocates a new leaf page and early_p2m_alloc() allocates a new middle page. This is confusing. Swap the names so they match what the functions actually do. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-05-15 16:12:25 +01:00
Radim Krčmář	bc5eb20161	xen/x86: set panic notifier priority to minimum Execution is not going to continue after telling Xen about the crash. Let other panic notifiers run by postponing the final hypercall as much as possible. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-05-15 15:54:57 +01:00
Linus Torvalds	fa81511bb0	x86-64, modify_ldt: Make support for 16-bit segments a runtime option Checkin: `b3b42ac2cb` x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels disabled 16-bit segments on 64-bit kernels due to an information leak. However, it does seem that people are genuinely using Wine to run old 16-bit Windows programs on Linux. A proper fix for this ("espfix64") is coming in the upcoming merge window, but as a temporary fix, create a sysctl to allow the administrator to re-enable support for 16-bit segments. It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If you hit this issue and care about your old Windows program more than you care about a kernel stack address information leak, you can do echo 1 > /proc/sys/abi/ldt16 as root (add it to your startup scripts), and you should be ok. The sysctl table is only added if you have COMPAT support enabled on x86-64, but I assume anybody who runs old windows binaries very much does that ;) Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com Cc: <stable@vger.kernel.org>	2014-05-14 16:33:54 -07:00
Marcelo Tosatti	16a9602158	KVM: x86: disable master clock if TSC is reset during suspend Updating system_time from the kernel clock once master clock has been enabled can result in time backwards event, in case kernel clock frequency is lower than TSC frequency. Disable master clock in case it is necessary to update it from the resume path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-14 17:59:21 +02:00
Rob Herring	eafd370dfe	Merge branch 'dt-bus-name' into for-next	2014-05-13 18:34:35 -05:00
Anthony Iliopoulos	9844f54623	x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow() The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!)	2014-05-13 16:34:09 -07:00
Alexei Starovoitov	773cd38f40	net: filter: x86: fix JIT address randomization bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages: round_up(4000 + 4 + 128) == 8192 then 'hole' is computed as 8192 - (4000 + 4) = 4188 If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(header) then kernel will crash during bpf_jit_free(): kernel BUG at arch/x86/mm/pageattr.c:887! Call Trace: [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0 [<ffffffff8106c90c>] worker_thread+0x11c/0x370 since bpf_jit_free() does: unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; struct bpf_binary_header header = (void *)addr; to compute start address of 'bpf_binary_header' and header->pages will pass junk to: set_memory_rw(addr, header->pages); Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page Fixes: `314beb9bca` ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:31:13 -04:00
Jan Kiszka	d9f89b88f5	KVM: x86: Fix CR3 reserved bits check in long mode Regression of 346874c9: PAE is set in long mode, but that does not mean we have valid PDPTRs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-12 20:04:01 +02:00
David Vrabel	aa8532c322	xen: refactor suspend pre/post hooks New architectures currently have to provide implementations of 5 different functions: xen_arch_pre_suspend(), xen_arch_post_suspend(), xen_arch_hvm_post_suspend(), xen_mm_pin_all(), and xen_mm_unpin_all(). Refactor the suspend code to only require xen_arch_pre_suspend() and xen_arch_post_suspend(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2014-05-12 17:19:56 +01:00
H. Peter Anvin	7a5091d584	x86, rdrand: When nordrand is specified, disable RDSEED as well One can logically expect that when the user has specified "nordrand", the user doesn't want any use of the CPU random number generator, neither RDRAND nor RDSEED, so disable both. Reported-by: Stephan Mueller <smueller@chronox.de> Cc: Theodore Ts'o <tytso@mit.edu> Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-11 20:25:20 -07:00
Ong Boon Leong	04725ad594	x86, iosf: Add PCI ID macros for better readability Introduce PCI IDs macro for the list of supported product: BayTrail & Quark X1000. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-5-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-09 14:57:35 -07:00
Ong Boon Leong	90916e048c	x86, iosf: Add Quark X1000 PCI ID Add PCI device ID, i.e. that of the Host Bridge, for IOSF MBI driver. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-4-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-09 14:57:23 -07:00
Ong Boon Leong	7ef1def800	x86, iosf: Added Quark MBI identifiers Added all the MBI units below and their associated read/write opcodes: - Host Bridge Arbiter - Host Bridge - Remote Management Unit - Memory Manager & eSRAM - SoC Unit Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-09 14:57:08 -07:00
David E. Box	6b8f0c8780	x86, iosf: Make IOSF driver modular and usable by more drivers Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF driver on SOC's without selecting it which forces an unnecessary and limiting dependency. Provides dummy functions to allow these modules to conditionally use the driver on IOSF equipped platforms without impacting their ability to compile and load on non-IOSF platforms. Build default m to ensure availability on x86 SOC's. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-09 14:56:15 -07:00
Boris Ostrovsky	28b92e09e2	x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall() With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall() may lose upper bits or, worse, add them since compiler will do this: (u64)(tk->wall_to_monotonic.tv_nsec << tk->shift) instead of ((u64)tk->wall_to_monotonic.tv_nsec << tk->shift) So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in the subsequent 'while' loop. We need an explicit cast. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-09 08:45:52 -07:00
Andres Freund	c45f77364b	x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro The spuriously added semicolon didn't have any effect because the macro isn't currently in use. `c0a639ad0b` Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-09 08:42:47 -07:00
Andres Freund	722a0d22d0	x86: Fix typo preventing msr_set/clear_bit from having an effect Due to a typo the msr accessor function introduced in `22085a66c2` didn't have any lasting effects because they accidentally wrote the old value back. After `c0a639ad0b` this at the very least this causes cpuid limits not to be lifted on some cpus leading to missing capabilities for those. Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-09 08:42:32 -07:00
Vivek Goyal	a9a17104a1	x86, boot: Remove misc.h inclusion from compressed/string.c Given the fact that we removed inclusion of boot.h from boot/string.c does not look like we need misc.h inclusion in compressed/string.c. So remove it. misc.h was also pulling in string_32.h which in turn had macros for memcmp and memcpy. So we don't need to #undef memcmp and memcpy anymore. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1398447972-27896-3-git-send-email-vgoyal@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-08 08:00:06 -07:00
Vivek Goyal	3d379225c4	x86, boot: Do not include boot.h in string.c string.c does not require whole of boot.h. Just inclusion of linux/types.h and ctypes.h seems to be sufficient. Keep list of stuff being included in string.c to bare minimal so that string.c can be included in other places easily. For example, Currently boot/compressed/string.c includes boot/string.c but looks like it does not want boot/boot.h. Hence there is a define in boot/compressed/misc.h "define BOOT_BOOT_H" which prevents inclusion of boot.h in compressed/string.c. And compressed/string.c is forced to include misc.h just for that reason. So by removing inclusion of boot.h, we can also get rid of inclusion of misch.h in compressed/misc.c. This also enables including of boot/string.c in purgatory/ code relatively easily. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1398447972-27896-2-git-send-email-vgoyal@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-05-08 08:00:01 -07:00
Gabriel L. Somlo	87c00572ba	kvm: x86: emulate monitor and mwait instructions as nop Treat monitor and mwait instructions as nop, which is architecturally correct (but inefficient) behavior. We do this to prevent misbehaving guests (e.g. OS X <= 10.7) from crashing after they fail to check for monitor/mwait availability via cpuid. Since mwait-based idle loops relying on these nop-emulated instructions would keep the host CPU pegged at 100%, do NOT advertise their presence via cpuid, to prevent compliant guests from using them inadvertently. Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-08 15:40:49 +02:00
Peter Zijlstra	f80c5b39b8	sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word. This will allow us, using fetch_or(), to both set NEED_RESCHED and check for POLLING_NRFLAG in a single operation and avoid pointless wakeups. Changing from the non-atomic thread_info::status flags to the atomic thread_info::flags shouldn't be a big issue since most polling state changes were followed/preceded by a full memory barrier anyway. Also, fix up the apm_32 idle function, clearly that was forgotten in the last conversion. The default idle state is !POLLING so just kill the lot. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-08 09:16:56 +02:00
Feng Tang	62187910b0	x86/intel: Add quirk to disable HPET for the Baytrail platform HPET on current Baytrail platform has accuracy problem to be used as reliable clocksource/clockevent, so add a early quirk to disable it. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-2-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-08 08:15:34 +02:00
Feng Tang	f10f383d84	x86/hpet: Make boot_hpet_disable extern HPET on some platform has accuracy problem. Making "boot_hpet_disable" extern so that we can runtime disable the HPET timer by using quirk to check the platform. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-08 08:15:34 +02:00
George Spelvin	14262d67fe	x86-64, build: Fix stack protector Makefile breakage with 32-bit userland If you are using a 64-bit kernel with 32-bit userland, then scripts/gcc-x86_64-has-stack-protector.sh invokes 32-bit gcc with -mcmodel=kernel, which produces: <stdin>:1:0: error: code model 'kernel' not supported in the 32 bit mode and trips the "broken compiler" test at arch/x86/Makefile:120. There are several places a fix is possible, but the following seems cleanest. (But it's minimal; it would also be possible to factor out a bunch of stuff from the two branches of the if.) Signed-off-by: George Spelvin <linux@horizon.com> Link: http://lkml.kernel.org/r/20140507210552.7581.qmail@ns.horizon.com Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-07 14:14:44 -07:00
Michael S. Tsirkin	b63cf42fd1	kvm/x86: implement hv EOI assist It seems that it's easy to implement the EOI assist on top of the PV EOI feature: simply convert the page address to the format expected by PV EOI. Notes: -"No EOI required" is set only if interrupt injected is edge triggered; this is true because level interrupts are going through IOAPIC which disables PV EOI. In any case, if guest triggers EOI the bit will get cleared on exit. -For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE seems sufficient In any case, bit is cleared on exit so worst case it's never re-enabled -no handling of PV EOI data is performed at HV_X64_MSR_EOI write; HV_X64_MSR_EOI is a separate optimization - it's an X2APIC replacement that lets you do EOI with an MSR and not IO. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-07 18:00:49 +02:00
Nadav Amit	5f7dde7bbb	KVM: x86: Mark bit 7 in long-mode PDPTE according to 1GB pages support In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are supported by the CPU. Currently the bit is considered by KVM as always reserved. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-07 17:25:22 +02:00
Nadav Amit	a4ab9d0cf1	KVM: vmx: handle_dr does not handle RSP correctly The RSP register is not automatically cached, causing mov DR instruction with RSP to fail. Instead the regular register accessing interface should be used. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-07 17:24:59 +02:00
Paolo Bonzini	696dfd95ba	KVM: vmx: disable APIC virtualization in nested guests While running a nested guest, we should disable APIC virtualization controls (virtualized APIC register accesses, virtual interrupt delivery and posted interrupts), because we do not expose them to the nested guest. Reported-by: Hu Yaohui <loki2441@gmail.com> Suggested-by: Abel Gordon <abel@stratoscale.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-07 13:46:02 +02:00
Ingo Molnar	37b16beaa9	Merge branch 'perf/urgent' into perf/core, to avoid conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:39:22 +02:00
Yan, Zheng	a4b4f11b27	perf/x86/intel: Fix Silvermont's event constraints Event 0x013c is not the same as fixed counter2, remove it from Silvermont's event constraints. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 11:33:16 +02:00
Christian Gmeiner	aadca6fa40	x86/reboot: Add reboot quirk for Certec BPC600 Certec BPC600 needs reboot=pci to actually reboot. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Li Aubrey <aubrey.li@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Jones <davej@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 11:22:10 +02:00
Bandan Das	4291b58885	KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr Some checks are common to all, and moreover, according to the spec, the check for whether any bits beyond the physical address width are set are also applicable to all of them Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-06 19:00:43 +02:00
Bandan Das	96ec146330	KVM: nVMX: fail on invalid vmclear/vmptrld pointer The spec mandates that if the vmptrld or vmclear address is equal to the vmxon region pointer, the instruction should fail with error "VMPTRLD with VMXON pointer" or "VMCLEAR with VMXON pointer" Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-06 19:00:37 +02:00
Bandan Das	3573e22cfe	KVM: nVMX: additional checks on vmxon region Currently, the vmxon region isn't used in the nested case. However, according to the spec, the vmxon instruction performs additional sanity checks on this region and the associated pointer. Modify emulated vmxon to better adhere to the spec requirements Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-06 19:00:27 +02:00
Bandan Das	19677e32fe	KVM: nVMX: rearrange get_vmx_mem_address Our common function for vmptr checks (in 2/4) needs to fetch the memory address Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-06 18:59:57 +02:00
Andi Kleen	2605fc216f	asmlinkage, x86: Add explicit __visible to arch/x86/* As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 16:07:44 -07:00
H. Peter Anvin	ac008fe0a3	x86, build: Don't get confused by local symbols arch/x86/crypto/sha1_avx2_x86_64_asm.S introduced _end as a local symbol, which broke the build under certain circumstances. Although the wisdom of _end as a local symbol can definitely be questioned, the build should not break for that reason. Thus, filter the output of nm to only get global symbols of appropriate type. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-uxm3j3w3odglcwhafwq5tjqu@git.kernel.org	2014-05-05 15:23:35 -07:00
Ulrich Obergfell	1171903d89	KVM: x86: improve the usability of the 'kvm_pio' tracepoint This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-05 22:42:05 +02:00
Ingo Molnar	0214196ce0	* Fix earlyprintk=efi,keep support by switching to an ioremap() mapping of the framebuffer when early_ioremap() is no longer available and dropping __init from functions that may be invoked after free_initmem() - Dave Young -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTZIL0AAoJEC84WcCNIz1Vr9gP/RCHnmo9+w88ujYMjXtoq+/b qDX/Fl8/as/gJ8cKhOVlQpC/t4VbC28mRkxV3J8NS/AklY0mU2R8TatprIyUoKAI oPZwdSbuEIS8ehCr/D+6aAIGLtFYaLD8VK27niNHEHVytZytPqQGpDKARgphin5l AqtEUv9NNfLaN/aHUuMV33xlD4r25BoWlj3RD2h+Rpnu2/vBXs14NTBN1r+SrLFh r8htTDsbm3NjDCvboYyPJjnFZvlYqxtLCBC2vVD8fBvaXcBmj/vLP6WmFd3sxbTZ 4CLmRMShaqh87JH9gdg0m/xJ5sEgRqqvMiqjcaAuJzAew0eE6gUZjE9+fawWYHwT XU0kcsM9wn/014f9fUdqaqM38o/XbnVcW+D5iSrwcx6hhNHzf7nFGnSndN2tednQ k3z3tpX/GB9u5l0064Clru6GbSnV2cSfayaoIc4sULDrp7KBmyrlwBtsQ67C/JfV 0gJ4ridzbFllHBiw3Cyw8vzLDPgQ6t2DGw6RkzUpbMwLZG5YMRcyNODWewcTuH7g VcMMaDKVw7uCrItFyTscMuUe1nVnbZANdLu9znF8TejgX1MzwwmdetqAE/WPR+3V vZoYGNE5zAwGhqF34BLSof9BHoeOjucx1qgaV3QYhrdtgtTXaGf++TvwOhpCVNOC vhUguxcrMLOM68He6o5H =BzhM -----END PGP SIGNATURE----- Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fix from Matt Fleming: " * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping of the framebuffer when early_ioremap() is no longer available and dropping __init from functions that may be invoked after free_initmem() - Dave Young " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-04 20:20:42 +02:00
Linus Torvalds	0384dcae2b	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "This udpate delivers: - A fix for dynamic interrupt allocation on x86 which is required to exclude the GSI interrupts from the dynamic allocatable range. This was detected with the newfangled tablet SoCs which have GPIOs and therefor allocate a range of interrupts. The MSI allocations already excluded the GSI range, so we never noticed before. - The last missing set_irq_affinity() repair, which was delayed due to testing issues - A few bug fixes for the armada SoC interrupt controller - A memory allocation fix for the TI crossbar interrupt controller - A trivial kernel-doc warning fix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: irq-crossbar: Not allocating enough memory irqchip: armanda: Sanitize set_irq_affinity() genirq: x86: Ensure that dynamic irq allocation does not conflict linux/interrupt.h: fix new kernel-doc warnings irqchip: armada-370-xp: Fix releasing of MSIs irqchip: armada-370-xp: implement the ->check_device() msi_chip operation irqchip: armada-370-xp: fix invalid cast of signed value into unsigned variable	2014-05-03 08:32:48 -07:00
Dave Young	5f35eb0e29	x86/efi: earlyprintk=efi,keep fix earlyprintk=efi,keep will cause kernel hangs while freeing initmem like below: VFS: Mounted root (ext4 filesystem) readonly on device 254:2. devtmpfs: mounted Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000) It is caused by efi earlyprintk use __init function which will be freed later. Such as early_efi_write is marked as __init, also it will use early_ioremap which is init function as well. To fix this issue, I added early initcall early_efi_map_fb which maps the whole efi fb for later use. OTOH, adding a wrapper function early_efi_map which calls early_ioremap before ioremap is available. With this patch applied efi boot ok with earlyprintk=efi,keep console=efi Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2014-05-03 06:39:06 +01:00
Linus Torvalds	0845e11c2a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Two very small changes: one fix for the vSMP Foundation platform, and one to help LLVM not choke on options it doesn't understand (although it probably should)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vsmp: Fix irq routing x86: LLVMLinux: Wrap -mno-80387 with cc-option	2014-05-02 14:04:52 -07:00
Roland Dreier	c81c8a1eee	x86, ioremap: Speed up check for RAM pages In __ioremap_caller() (the guts of ioremap), we loop over the range of pfns being remapped and checks each one individually with page_is_ram(). For large ioremaps, this can be very slow. For example, we have a device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+ seconds -- sometimes long enough to trigger the soft lockup detector! Internally, page_is_ram() calls walk_system_ram_range() on a single page. Instead, we can make a single call to walk_system_ram_range() from __ioremap_caller(), and do our further checks only for any RAM pages that we find. For the common case of MMIO, this saves an enormous amount of work, since the range being ioremapped doesn't intersect system RAM at all. With this change, ioremap on our 256 GiB BAR takes less than 1 second. Signed-off-by: Roland Dreier <roland@purestorage.com> Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-02 11:52:26 -07:00
Linus Torvalds	e7e6d2a4a1	- Fix for a Haswell regression in nested virtualization, introduced during the merge window. - A fix from Oleg to async page faults. - A bunch of small ARM changes. - A trivial patch to use the new MSI-X API introduced during the merge window. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTY5anAAoJEBvWZb6bTYby6EQQAIWbOJCrLO3NQxwE9M7d8YvN oviaLFv7vJh1vaXVo7SNjBRXTq4pzWrhg9rwWlHBg1KnxJ0/sc9Tn07Fe+0bxWDh 1XIFXNEkO9+Bpl43VnKGC7sbkYE9m3+jpGCWjF01vBCh+BY73wUOsPD0Zw9YQojN TKBtiQEjb8avuoTUR0JSOTwLZw4DlDRmRLHkNwlqqvbPdvuIWI/LG2wFUvY7/eq8 dWxIPBjLKaIv2aUs9wGNNiz4Kb92uyH5L6bI6SK8VxphRA+51BOjMcBbzdY+Q1XL c4CTaL9ybAyUi4SRv41qWnM09YbI1FayUW93k9xz/vEplXOHp5R/lyUdZETd/d83 GxaooTLcy9nOYeZ75buiH/0EG5HxI7On/QfUBEE3qIf8KfGgxb479HbRw6RnX4bf EhQzf7eyZvvk43Xk3OYwq8Ux1SOiXQEo+8TpCSaM/KN57cJbjGB4GCUK6JX8qJCx 7MfXBdrhkAdw5V4lEBQMYKp4pdUdgYKRXavhLevm0qFjX1Swl6LIHxLtjFTKyX9S Xfxi09J7EUs7SsI35pdlMtPQkklEUXE96S/W3RCEpR+OfgbVMkYkcQI8TGb7ib3l xLNJrSgFDSlP5F3rN5SYIItAqboXb7iLp7SiF2ByXV43yexIrzTH0bwdwPwpZHhk 2ziVieX5WXEX4tgzZkRj =+bLo -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: - Fix for a Haswell regression in nested virtualization, introduced during the merge window. - A fix from Oleg to async page faults. - A bunch of small ARM changes. - A trivial patch to use the new MSI-X API introduced during the merge window. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: ARM: vgic: Fix the overlap check action about setting the GICD & GICC base address. KVM: arm/arm64: vgic: fix GICD_ICFGR register accesses KVM: async_pf: mm->mm_users can not pin apf->mm KVM: ARM: vgic: Fix sgi dispatch problem MAINTAINERS: co-maintainance of KVM/{arm,arm64} arm: KVM: fix possible misalignment of PGDs and bounce page KVM: x86: Check for host supported fields in shadow vmcs kvm: Use pci_enable_msix_exact() instead of pci_enable_msix() ARM: KVM: disable KVM in Kconfig on big-endian systems	2014-05-02 09:26:09 -07:00
Rob Herring	1bac186994	x86: use FDT accessors for FDT blob header data Remove the direct accesses to FDT header data using accessor function instead. This makes the code more readable and makes the FDT blob structure more opaque to the arch code. This also prepares for removing struct boot_param_header completely. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Tested-by: Grant Likely <grant.likely@linaro.org>	2014-04-30 00:59:19 -05:00
Marcelo Tosatti	e4c9a5a175	KVM: x86: expose invariant tsc cpuid bit (v2) Invariant TSC is a property of TSC, no additional support code necessary. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-04-29 15:22:43 +02:00
Thomas Gleixner	62a08ae2a5	genirq: x86: Ensure that dynamic irq allocation does not conflict On x86 the allocation of irq descriptors may allocate interrupts which are in the range of the GSI interrupts. That's wrong as those interrupts are hardwired and we don't have the irq domain translation like PPC. So one of these interrupts can be hooked up later to one of the devices which are hard wired to it and the io_apic init code for that particular interrupt line happily reuses that descriptor with a completely different configuration so hell breaks lose. Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs, except for a few usage sites which have not yet blown up in our face for whatever reason. But for drivers which need an irq range, like the GPIO drivers, we have no limit in place and we don't want to expose such a detail to a driver. To cure this introduce a function which an architecture can implement to impose a lower bound on the dynamic interrupt allocations. Implement it for x86 and set the lower bound to nr_gsi_irqs, which is the end of the hardwired interrupt space, so all dynamic allocations happen above. That not only allows the GPIO driver to work sanely, it also protects the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and htirq code. They need to be cleaned up as well, but that's a separate issue. Reported-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Krogerus Heikki <heikki.krogerus@intel.com> Cc: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-04-28 12:20:00 +02:00
Bandan Das	fe2b201b3b	KVM: x86: Check for host supported fields in shadow vmcs We track shadow vmcs fields through two static lists, one for read only and another for r/w fields. However, with addition of new vmcs fields, not all fields may be supported on all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on unsupported hosts will result in a vmwrite error. For example, commit `36be0b9deb` introduced GUEST_BNDCFGS, which is not supported by all processors. Filter out host unsupported fields before letting guests use shadow vmcs Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-04-28 11:14:51 +02:00
Oren Twaig	39025ba382	x86/vsmp: Fix irq routing Correct IRQ routing in case a vSMP box is detected but the Interrupt Routing Comply (IRC) value is set to "comply", which leads to incorrect IRQ routing. Before the patch: When a vSMP box was detected and IRC was set to "comply", users (and the kernel) couldn't effectively set the destination of the IRQs. This is because the hook inside vsmp_64.c always setup all CPUs as the IRQ destination using cpumask_setall() as the return value for IRQ allocation mask. Later, this "overrided" mask caused the kernel to set the IRQ destination to the lowest online CPU in the mask (CPU0 usually). After the patch: When the IRC is set to "comply", users (and the kernel) can control the destination of the IRQs as we will not be changing the default "apic->vector_allocation_domain". Signed-off-by: Oren Twaig <oren@scalemp.com> Acked-by: Shai Fultheim <shai@scalemp.com> Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com [ Minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-28 09:27:34 +02:00
Greg Kroah-Hartman	f379a07109	Merge 3.15-rc3 into tty-next	2014-04-27 21:40:39 -07:00
Bjorn Helgaas	44c8bdbe32	x86/PCI: Mark ATI SBx00 HPET BAR as IORESOURCE_PCI_FIXED Bodo reported that on the Asrock M3A UCC, v3.12.6 hangs during boot unless he uses "pci=nocrs". This regression was caused by `7bc5e3f2be` ("x86/PCI: use host bridge _CRS info by default on 2008 and newer machines"), which appeared in v2.6.34. The reason is that the HPET address appears in a PCI device BAR, and this address is not contained in any of the host bridge windows. Linux moves the PCI BAR into a window, but the original address was published via the HPET table and an ACPI device, so changing the BAR is a bad idea. Here's the dmesg info: ACPI: HPET id: 0x43538301 base: 0xfed00000 pci_root PNP0A03:00: host bridge window [mem 0xd0000000-0xdfffffff] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfebfffff] pci 0000:00:14.0: [1002:4385] type 0 class 0x000c05 pci 0000:00:14.0: reg 14: [mem 0xfed00000-0xfed003ff] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0 pnp 00:06: Plug and Play ACPI device, IDs PNP0103 (active) pnp 00:06: [mem 0xfed00000-0xfed003ff] When we notice the BAR is not in a host bridge window, we try to move it, but that causes a hang shortly thereafter: pci 0000:00:14.0: no compatible bridge window for [mem 0xfed00000-0xfed003ff] pci 0000:00:14.0: BAR 1: assigned [mem 0xf0000000-0xf00003ff] This patch marks the BAR as IORESOURCE_PCI_FIXED to prevent Linux from moving it. This depends on a previous patch ("x86/PCI: Don't try to move IORESOURCE_PCI_FIXED resources") to check for this flag when pci_claim_resource() fails. Link: https://bugzilla.kernel.org/show_bug.cgi?id=68591 Reported-and-tested-by: Bodo Eggert <7eggert@gmx.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-04-25 11:09:04 -06:00
Bjorn Helgaas	4e4ba9441f	x86/PCI: Don't try to move IORESOURCE_PCI_FIXED resources Don't attempt to move resource marked IORESOURCE_PCI_FIXED, even if pci_claim_resource() fails. In some cases, these are legacy resources that cannot be moved. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-04-25 11:09:04 -06:00
Bjorn Helgaas	0b2d70764b	x86/PCI: Fix Broadcom CNB20LE unintended sign extension In the expression "word1 << 16", word1 starts as u16, but is promoted to a signed int, then sign-extended to resource_size_t, which is probably not what was intended. Cast to resource_size_t to avoid the sign extension. Found by Coverity (CID 138749, 138750). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-04-25 11:01:08 -06:00
Ingo Molnar	42ebd27bcb	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-25 10:04:22 +02:00
Rob Herring	d20642f0a3	x86: move FIX_EARLYCON_MEM kconfig into x86 In preparation to support FIX_EARLYCON_MEM on other arches, make the option per arch. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-04-24 16:32:26 -07:00
Ian Campbell	5e40704ed2	arm: xen: implement multicall hypercall support. As part of this make the usual change to xen_ulong_t in place of unsigned long. This change has no impact on x86. The Linux definition of struct multicall_entry.result differs from the Xen definition, I think for good reasons, and used a long rather than an unsigned long. Therefore introduce a xen_long_t, which is a long on x86 architectures and a signed 64-bit integer on ARM. Use uint32_t nr_calls on x86 for consistency with the ARM definition. Build tested on amd64 and i386 builds. Runtime tested on ARM. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-04-24 13:09:46 +01:00
Stephane Eranian	9f7ff8931e	perf/x86: Fix RAPL rdmsrl_safe() usage This patch fixes a bug introduced by: `2422365780` ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU") The rdmsrl_safe() function returns 0 on success. The current code was failing to detect the RAPL PMU on real hardware (missing /sys/devices/power) because the return value of rdmsrl_safe() was misinterpreted. Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Venkatesh Srinivas <venkateshs@google.com> Cc: peterz@infradead.org Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-24 08:12:41 +02:00
Xiao Guangrong	198c74f43f	KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Now we can flush all the TLBs out of the mmu lock without TLB corruption when write-proect the sptes, it is because: - we have marked large sptes readonly instead of dropping them that means we just change the spte from writable to readonly so that we only need to care the case of changing spte from present to present (changing the spte from present to nonpresent will flush all the TLBs immediately), in other words, the only case we need to care is mmu_spte_update() - in mmu_spte_update(), we haved checked SPTE_HOST_WRITEABLE \| PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK anymore Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:52 -03:00
Xiao Guangrong	7f31c9595e	KVM: MMU: flush tlb if the spte can be locklessly modified Relax the tlb flush condition since we will write-protect the spte out of mmu lock. Note lockless write-protection only marks the writable spte to readonly and the spte can be writable only if both SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE are set (that are tested by spte_is_locklessly_modifiable) This patch is used to avoid this kind of race: VCPU 0 VCPU 1 lockless wirte protection: set spte.w = 0 lock mmu-lock write protection the spte to sync shadow page, see spte.w = 0, then without flush tlb unlock mmu-lock !!! At this point, the shadow page can still be writable due to the corrupt tlb entry Flush all TLB Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:51 -03:00
Xiao Guangrong	c126d94f2c	KVM: MMU: lazily drop large spte Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them un-present, the page fault caused by read access can be avoided The idea is from Avi: \| As I mentioned before, write-protecting a large spte is a good idea, \| since it moves some work from protect-time to fault-time, so it reduces \| jitter. This removes the need for the return value. This version has fixed the issue reported in `6b73a9606`, the reason of that issue is that fast_page_fault() directly sets the readonly large spte to writable but only dirty the first page into the dirty-bitmap that means other pages are missed. Fixed it by only the normal sptes (on the PT_PAGE_TABLE_LEVEL level) can be fast fixed Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:50 -03:00
Xiao Guangrong	92a476cbfc	KVM: MMU: properly check last spte in fast_page_fault() Using sp->role.level instead of @level since @level is not got from the page table hierarchy There is no issue in current code since the fast page fault currently only fixes the fault caused by dirty-log that is always on the last level (level = 1) This patch makes the code more readable and avoids potential issue in the further development Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:49 -03:00
Xiao Guangrong	a086f6a1eb	Revert "KVM: Simplify kvm->tlbs_dirty handling" This reverts commit `5befdc385d`. Since we will allow flush tlb out of mmu-lock in the later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:48 -03:00
Nadav Amit	42bf549f3c	KVM: x86: Processor mode may be determined incorrectly If EFER.LMA is off, cs.l does not determine execution mode. Currently, the emulation engine assumes differently. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:47:00 -03:00
Nadav Amit	e6e39f0438	KVM: x86: IN instruction emulation should ignore REP-prefix The IN instruction is not be affected by REP-prefix as INS is. Therefore, the emulation should ignore the REP prefix as well. The current emulator implementation tries to perform writeback when IN instruction with REP-prefix is emulated. This causes it to perform wrong memory write or spurious #GP exception to be injected to the guest. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:46:59 -03:00
Nadav Amit	346874c950	KVM: x86: Fix CR3 reserved bits According to Intel specifications, PAE and non-PAE does not have any reserved bits. In long-mode, regardless to PCIDE, only the high bits (above the physical address) are reserved. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:46:57 -03:00
Nadav Amit	671bd9934a	KVM: x86: Fix wrong/stuck PMU when guest does not use PMI If a guest enables a performance counter but does not enable PMI, the hypervisor currently does not reprogram the performance counter once it overflows. As a result the host performance counter is kept with the original sampling period which was configured according to the value of the guest's counter when the counter was enabled. Such behaviour can cause very bad consequences. The most distrubing one can cause the guest not to make any progress at all, and keep exiting due to host PMI before any guest instructions is exeucted. This situation occurs when the performance counter holds a very high value when the guest enables the performance counter. As a result the host's sampling period is configured to be very short. The host then never reconfigures the sampling period and get stuck at entry->PMI->exit loop. We encountered such a scenario in our experiments. The solution is to reprogram the counter even if the guest does not use PMI. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:46:52 -03:00
Bandan Das	e0ba1a6ffc	KVM: nVMX: Advertise support for interrupt acknowledgement Some Type 1 hypervisors such as XEN won't enable VMX without it present Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-22 18:41:34 -03:00
Bandan Das	77b0f5d67f	KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to This feature emulates the "Acknowledge interrupt on exit" behavior. We can safely emulate it for L1 to run L2 even if L0 itself has it disabled (to run L1). Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-22 18:41:33 -03:00
Bandan Das	4b85507860	KVM: nVMX: Don't advertise single context invalidation for invept For single context invalidation, we fall through to global invalidation in handle_invept() except for one case - when the operand supplied by L1 is different from what we have in vmcs12. However, typically hypervisors will only call invept for the currently loaded eptp, so the condition will never be true. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-22 18:41:28 -03:00
Huw Davies	fd2a445a94	KVM: VMX: Advance rip to after an ICEBP instruction When entering an exception after an ICEBP, the saved instruction pointer should point to after the instruction. This fixes the bug here: https://bugs.launchpad.net/qemu/+bug/1119686 Signed-off-by: Huw Davies <huw@codeweavers.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-22 18:37:43 -03:00
Linus Torvalds	39bfe90706	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso fix from Peter Anvin: "This is a single build fix for building with gold as opposed to GNU ld. It got queued up separately and was expected to be pushed during the merge window, but it got left behind" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, vdso: Make the vdso linker script compatible with Gold	2014-04-22 09:09:06 -07:00

1 2 3 4 5 ...

19250 Commits