OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Gleb Natapov	05988d728d	KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped Quote Gleb's mail: \| why don't we check for sp->role.invalid in \| kvm_mmu_prepare_zap_page before calling kvm_reload_remote_mmus()? and \| Actually we can add check for is_obsolete_sp() there too since \| kvm_mmu_invalidate_all_pages() already calls kvm_reload_remote_mmus() \| after incrementing mmu_valid_gen. [ Xiao: add some comments and the check of is_obsolete_sp() ] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:34:02 +03:00
Xiao Guangrong	365c886860	KVM: MMU: reclaim the zapped-obsolete page first As Marcelo pointed out that \| "(retention of large number of pages while zapping) \| can be fatal, it can lead to OOM and host crash" We introduce a list, kvm->arch.zapped_obsolete_pages, to link all the pages which are deleted from the mmu cache but not actually freed. When page reclaiming is needed, we always zap this kind of pages first. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:33:33 +03:00
Xiao Guangrong	f34d251d66	KVM: MMU: collapse TLB flushes when zap all pages kvm_zap_obsolete_pages uses lock-break technique to zap pages, it will flush tlb every time when it does lock-break We can reload mmu on all vcpus after updating the generation number so that the obsolete pages are not used on any vcpus, after that we do not need to flush tlb when obsolete pages are zapped It will do kvm_mmu_prepare_zap_page many times and use one kvm_mmu_commit_zap_page to collapse tlb flush, the side-effects is that causes obsolete pages unlinked from active_list but leave on hash-list, so we add the comment around the hash list walker Note: kvm_mmu_commit_zap_page is still needed before free the pages since other vcpus may be doing locklessly shadow page walking Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:33:18 +03:00
Xiao Guangrong	e7d11c7a89	KVM: MMU: zap pages in batch Zap at lease 10 pages before releasing mmu-lock to reduce the overload caused by requiring lock After the patch, kvm_zap_obsolete_pages can forward progress anyway, so update the comments [ It improves the case 0.6% ~ 1% that do kernel building meanwhile read PCI ROM. ] Note: i am not sure that "10" is the best speculative value, i just guessed that '10' can make vcpu do not spend long time on kvm_zap_obsolete_pages and do not cause mmu-lock too hungry. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:33:10 +03:00
Xiao Guangrong	7f52af7412	KVM: MMU: do not reuse the obsolete page The obsolete page will be zapped soon, do not reuse it to reduce future page fault Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:33:04 +03:00
Xiao Guangrong	35006126f0	KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages It is good for debug and development Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:57 +03:00
Xiao Guangrong	2248b02321	KVM: MMU: show mmu_valid_gen in shadow page related tracepoints Show sp->mmu_valid_gen Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:49 +03:00
Xiao Guangrong	6ca18b6950	KVM: x86: use the fast way to invalidate all pages Replace kvm_mmu_zap_all by kvm_mmu_invalidate_zap_all_pages Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:42 +03:00
Xiao Guangrong	5304b8d37c	KVM: MMU: fast invalidate all pages The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to walk and zap all shadow pages one by one, also it need to zap all guest page's rmap and all shadow page's parent spte list. Particularly, things become worse if guest uses more memory or vcpus. It is not good for scalability In this patch, we introduce a faster way to invalidate all shadow pages. KVM maintains a global mmu invalid generation-number which is stored in kvm->arch.mmu_valid_gen and every shadow page stores the current global generation-number into sp->mmu_valid_gen when it is created When KVM need zap all shadow pages sptes, it just simply increase the global generation-number then reload root shadow pages on all vcpus. Vcpu will create a new shadow page table according to current kvm's generation-number. It ensures the old pages are not used any more. Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen) are zapped by using lock-break technique Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:33 +03:00
Xiao Guangrong	a2ae162265	KVM: MMU: drop unnecessary kvm_reload_remote_mmus It is the responsibility of kvm_mmu_zap_all that keeps the consistent of mmu and tlbs. And it is also unnecessary after zap all mmio sptes since no mmio spte exists on root shadow page and it can not be cached into tlb Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:24 +03:00
Xiao Guangrong	758ccc89b8	KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall Quote Gleb's mail: \| Back then kvm->lock protected memslot access so code like: \| \| mutex_lock(&vcpu->kvm->lock); \| kvm_mmu_zap_all(vcpu->kvm); \| mutex_unlock(&vcpu->kvm->lock); \| \| which is what `7aa81cc0` does was enough to guaranty that no vcpu will \| run while code is patched. This is no longer the case and \| mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago, \| so now kvm_mmu_zap_all() there is useless and the code is incorrect. So we drop it and it will be fixed later Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-05 12:32:00 +03:00
Linus Torvalds	8b35c35955	Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm bugfixes from Gleb Natapov: "The bulk of the fixes is in MIPS KVM kernel<->userspace ABI. MIPS KVM is new for 3.10 and some problems were found with current ABI. It is better to fix them now and do not have a kernel with broken one" * 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix race in apic->pending_events processing KVM: fix sil/dil/bpl/spl in the mod/rm fields KVM: Emulate multibyte NOP ARM: KVM: be more thorough when invalidating TLBs ARM: KVM: prevent NULL pointer dereferences with KVM VCPU ioctl mips/kvm: Use ENOIOCTLCMD to indicate unimplemented ioctls. mips/kvm: Fix ABI by moving manipulation of CP0 registers to KVM_{G,S}ET_ONE_REG mips/kvm: Use ARRAY_SIZE() instead of hardcoded constants in kvm_arch_vcpu_ioctl_{s,g}et_regs mips/kvm: Fix name of gpr field in struct kvm_regs. mips/kvm: Fix ABI for use of 64-bit registers. mips/kvm: Fix ABI for use of FPU.	2013-06-05 09:09:35 +09:00
Michael Wang	e8747f10ba	x86, cleanups: Remove extra tab in __flush_tlb_one() Remove the extra tab in __flush_tlb_one(). CC: Alex Shi <alex.shi@intel.com> CC: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-06-04 11:51:26 -07:00
Konrad Rzeszutek Wilk	466318a87f	xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU. The xen_play_dead is an undead function. When the vCPU is told to offline it ends up calling xen_play_dead wherin it calls the VCPUOP_down hypercall which offlines the vCPU. However, when the vCPU is onlined back, it resumes execution right after VCPUOP_down hypercall. That was OK (albeit the API for play_dead assumes that the CPU stays dead and never returns) but with commit `4b0c0f294` (tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe as said commit resets the ts->inidle which at the start of the cpu_idle loop was set. The net effect is that we get this warn: Broke affinity for irq 16 installing Xen timer for CPU 1 cpu 1 spinlock event irq 48 ------------[ cut here ]------------ WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1 Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009 ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0 ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0 Call Trace: [<ffffffff816707c8>] dump_stack+0x19/0x1b [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0 [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20 [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0 [<ffffffff810da755>] cpu_startup_entry+0x205/0x250 [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15 ---[ end trace 915c8c486004dda1 ]--- b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround to call tick_nohz_idle_enter before returning from xen_play_dead() - and that is what this patch does and fixes the issue. We also add the stable part b/c git commit `4b0c0f294` is on the stable tree. CC: stable@vger.kernel.org Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-06-04 09:03:18 -04:00
Stephen Rothwell	40b313608a	Finally eradicate CONFIG_HOTPLUG Ever since commit `45f035ab9b` ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it. Cc: Russell King <linux@arm.linux.org.uk> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-06-03 14:20:18 -07:00
Gleb Natapov	299018f44a	KVM: Fix race in apic->pending_events processing apic->pending_events processing has a race that may cause INIT and SIPI processing to be reordered: vpu0: vcpu1: set INIT test_and_clear_bit(KVM_APIC_INIT) process INIT set INIT set SIPI test_and_clear_bit(KVM_APIC_SIPI) process SIPI At the end INIT is left pending in pending_events. The following patch fixes this by latching pending event before processing them. Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-03 11:32:39 +03:00
Paolo Bonzini	8acb42070e	KVM: fix sil/dil/bpl/spl in the mod/rm fields The x86-64 extended low-byte registers were fetched correctly from reg, but not from mod/rm. This fixes another bug in the boot of RHEL5.9 64-bit, but it is still not enough. Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-03 11:27:12 +03:00
Paolo Bonzini	103f98ea64	KVM: Emulate multibyte NOP This is encountered when booting RHEL5.9 64-bit. There is another bug after this one that is not a simple emulation failure, but this one lets the boot proceed a bit. Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-03 11:20:53 +03:00
Jacob Shin	6b3389ac21	x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m Fix section mismatch warnings on microcode_amd_early. Compile error occurs when CONFIG_MICROCODE=m, change so that early loading depends on microcode_core. Reported-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-31 13:56:58 -07:00
Yinghai Lu	7de3d66b13	x86: Fix adjust_range_size_mask calling position Commit `8d57470d` x86, mm: setup page table in top-down causes a kernel panic while setting mem=2G. [mem 0x00000000-0x000fffff] page 4k [mem 0x7fe00000-0x7fffffff] page 1G [mem 0x7c000000-0x7fdfffff] page 1G [mem 0x00100000-0x001fffff] page 4k [mem 0x00200000-0x7bffffff] page 2M for last entry is not what we want, we should have [mem 0x00200000-0x3fffffff] page 2M [mem 0x40000000-0x7bffffff] page 1G Actually we merge the continuous ranges with same page size too early. in this case, before merging we have [mem 0x00200000-0x3fffffff] page 2M [mem 0x40000000-0x7bffffff] page 2M after merging them, will get [mem 0x00200000-0x7bffffff] page 2M even we can use 1G page to map [mem 0x40000000-0x7bffffff] that will cause problem, because we already map [mem 0x7fe00000-0x7fffffff] page 1G [mem 0x7c000000-0x7fdfffff] page 1G with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already. During phys_pud_init() for [0x40000000-0x7bffffff], it will not reuse existing that pud page, and allocate new one then try to use 2M page to map it instead, as page_size_mask does not include PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop in phys_pmd_init stop mapping at 0x7bffffff. That is right behavoir, it maps exact range with exact page size that we ask, and we should explicitly call it to map [7c000000-0x7fffffff] before or after mapping 0x40000000-0x7bffffff. Anyway we need to make sure ranges' page_size_mask correct and consistent after split_mem_range for each range. Fix that by calling adjust_range_size_mask before merging range with same page size. -v2: update change log. -v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and it causes panic. Bisected-by: "Xie, ChanglongX" <changlongx.xie@intel.com> Bisected-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-31 13:21:32 -07:00
Paul Bolle	71c69f7f4b	x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32. Remove a useless check for its macro, as it will now always evaluate to false. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-31 13:12:35 +02:00
Andrew Jones	b0bc225d0e	sched/x86: Construct all sibling maps if smt Commit `316ad24830` ("sched/x86: Rewrite set_cpu_sibling_map()") broke the construction of sibling maps, which also broke the booted_cores accounting. Before the rewrite, if smt was present, then each map was updated for each smt sibling. After the rewrite only cpu_sibling_mask gets updated, as the llc and core maps depend on 'has_mc = x86_max_cores > 1' instead. This leads to problems with topologies like the following (qemu -smp sockets=2,cores=1,threads=2) processor : 0 physical id : 0 siblings : 1 <= should be 2 core id : 0 cpu cores : 1 processor : 1 physical id : 0 siblings : 1 <= should be 2 core id : 0 cpu cores : 0 <= should be 1 processor : 2 physical id : 1 siblings : 1 <= should be 2 core id : 0 cpu cores : 1 processor : 3 physical id : 1 siblings : 1 <= should be 2 core id : 0 cpu cores : 0 <= should be 1 This patch restores the former construction by defining has_mc as (has_smt \|\| x86_max_cores > 1). This should be fine as there were no (has_smt && !has_mc) conditions in the context. Aso rename has_mc to has_mp now that it's not just for cores. Signed-off-by: Andrew Jones <drjones@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: a.p.zijlstra@chello.nl Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-31 13:10:38 +02:00
Jan Beulich	1d10f6ee60	x86: __force_order doesn't need to be an actual variable It being static causes over a dozen instances to be scattered across the kernel image, with non of them ever being referenced in any way. Making the variable extern without ever defining it works as well - all we need is to have the compiler think the variable is being accessed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-31 13:09:17 +02:00
Jan Beulich	cbdce7b251	ix86: Don't waste fixmap entries The vsyscall related pvclock entries can only ever be used on x86-64, and hence they shouldn't even get allocated for 32-bit kernels (the more that it is there where address space is relatively precious). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/51A60F1F02000078000D997C@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-31 13:08:18 +02:00
Jacob Shin	757885e94a	x86, microcode, amd: Early microcode patch loading support for AMD Add early microcode patch loading support for AMD. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Jacob Shin	a76096a657	x86, microcode, amd: Refactor functions to prepare for early loading In preparation work for early loading, refactor some common functions that will be shared, and move some struct defines to a common header file. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Jacob Shin	f2b3ee820a	x86, microcode: Vendor abstract out save_microcode_in_initrd() Currently save_microcode_in_initrd() is declared in vendor neutural microcode.h file, but defined in vendor specific microcode_intel_early.c file. Vendor abstract it out to microcode_core_early.c with a wrapper function. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Borislav Petkov	83b325f1b4	x86, microcode, intel: Correct typo in printk User-visible so correct it. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1369940959-2077-2-git-send-email-jacob.shin@amd.com Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>	2013-05-30 20:19:25 -07:00
Andy Lutomirski	d0d98eedee	Add arch_phys_wc_{add, del} to manipulate WC MTRRs if needed Several drivers currently use mtrr_add through various #ifdef guards and/or drm wrappers. The vast majority of them want to add WC MTRRs on x86 systems and don't actually need the MTRR if PAT (i.e. ioremap_wc, etc) are working. arch_phys_wc_add and arch_phys_wc_del are new functions, available on all architectures and configurations, that add WC MTRRs on x86 if needed (and handle errors) and do nothing at all otherwise. They're also easier to use than mtrr_add and mtrr_del, so the call sites can be simplified. As an added benefit, this will avoid wasting MTRRs and possibly warning pointlessly on PAT-supporting systems. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-05-31 13:02:52 +10:00
Linus Torvalds	484b002e28	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: - Three EFI-related fixes - Two early memory initialization fixes - build fix for older binutils - fix for an eager FPU performance regression -- currently we don't allow the use of the FPU at interrupt time at all in eager mode, which is clearly wrong. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Allow FPU to be used at interrupt time even with eagerfpu x86, crc32-pclmul: Fix build with older binutils x86-64, init: Fix a possible wraparound bug in switchover in head_64.S x86, range: fix missing merge during add range x86, efi: initial the local variable of DataSize to zero efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse efivarfs: Never return ENOENT from firmware again	2013-05-31 09:44:10 +09:00
Pekka Riikonen	5187b28ff0	x86: Allow FPU to be used at interrupt time even with eagerfpu With the addition of eagerfpu the irq_fpu_usable() now returns false negatives especially in the case of ksoftirqd and interrupted idle task, two common cases for FPU use for example in networking/crypto. With eagerfpu=off FPU use is possible in those contexts. This is because of the eagerfpu check in interrupted_kernel_fpu_idle(): ... * For now, with eagerfpu we will return interrupted kernel FPU * state as not-idle. TBD: Ideally we can change the return value * to something like __thread_has_fpu(current). But we need to * be careful of doing __thread_clear_has_fpu() before saving * the FPU etc for supporting nested uses etc. For now, take * the simple route! ... if (use_eager_fpu()) return 0; As eagerfpu is automatically "on" on those CPUs that also have the features like AES-NI this patch changes the eagerfpu check to return 1 in case the kernel_fpu_begin() has not been said yet. Once it has been the __thread_has_fpu() will start returning 0. Notice that with eagerfpu the __thread_has_fpu is always true initially. FPU use is thus always possible no matter what task is under us, unless the state has already been saved with kernel_fpu_begin(). [ hpa: this is a performance regression, not a correctness regression, but since it can be quite serious on CPUs which need encryption at interrupt time I am marking this for urgent/stable. ] Signed-off-by: Pekka Riikonen <priikone@iki.fi> Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org Cc: <stable@vger.kernel.org> v3.7+ Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-30 16:36:42 -07:00
Jan Beulich	2baad6121e	x86, crc32-pclmul: Fix build with older binutils binutils prior to 2.18 (e.g. the ones found on SLE10) don't support assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ in the same file should be used. This requires making the helper macros capable of recognizing 32-bit general purpose register operands. [ hpa: tagging for stable as it is a low risk build fix ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-30 16:36:23 -07:00
Linus Torvalds	3655b22de0	Fixes: - Use proper error paths - Clean up APIC IPI usage (incorrect arguments) - Delay XenBus frontend resume is backend (xenstored) is not running - Fix build error with various combinations of CONFIG_ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRp59pAAoJEFjIrFwIi8fJRIYIAKvRh2Dp/AB44ZN97MW/QhEN NUvrSTYr2HlqcUW7bv0ScrMLb0LlFeo+9s/bo0KI2+2F+zK822WPC+2KEZmzQIVs q261dNsA3/HoyBDOLwWjatjsSus+njBOEgDIwARPwhkoon4fRXBnRJVMy+0bZC3I fpd1nlUy0J7jW0QLO5ueKqd5ZN0Mkwn2H4+D8TOPVYHCnk3mT2W+qLCEJmkMxOuZ iFYy95K1ky5r0leUUwCTUIGLmgftoh0Qo/RweXSmzuLiZrY+5ilike3gxQSiAjsM lIjq+gKXNJJGz4M6wbOTfDzb/WQnKD+2PqlsbulrTD7E6RD6wIsqG/zvc1RqHqw= =9gi8 -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: - Use proper error paths - Clean up APIC IPI usage (incorrect arguments) - Delay XenBus frontend resume is backend (xenstored) is not running - Fix build error with various combinations of CONFIG_ * tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xenbus_client.c: correct exit path for xenbus_map_ring_valloc_hvm xen-pciback: more uses of cached MSI-X capability offset xen: Clean up apic ipi interface xenbus: save xenstore local status for later use xenbus: delay xenbus frontend resume if xenstored is not running xmem/tmem: fix 'undefined variable' build error.	2013-05-31 06:01:18 +09:00
Dimitri Sivanich	879d5ad0dc	x86/UV: Add GRU distributed mode mappings GRU hardware will support an optional distributed mode that will allow per-node address mapping of local GRU space, as opposed to mapping all GRU hardware to the same contiguous high space. If GRU distributed mode is selected, setup per-node page table mappings. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Russ Anderson <rja@sgi.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130529155609.GB22917@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-30 09:46:09 +02:00
John Stultz	ce0b098981	x86: Fix vrtc_get_time/set_mmss to use new timespec interface The patch "x86: Increase precision of x86_platform.get/set_wallclock" changed the x86 platform set_wallclock/get_wallclock interfaces to use nsec granular timespecs instead of a second granular interface. However, that patch missed converting the vrtc code, so this patch converts those functions to use timespecs. Many thanks to the kbuild test robot for finding this! Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2013-05-29 12:57:35 -07:00
Stefan Bader	1db01b4903	xen: Clean up apic ipi interface Commit `f447d56d36` introduced the implementation of the PV apic ipi interface. But there were some odd things (it seems none of which cause really any issue but maybe they should be cleaned up anyway): - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself) ignore the passed in vector and only use the CALL_FUNCTION_SINGLE vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector. - physflat_send_IPI_allbutself is declared unnecessarily. It is never used. This patch tries to clean up those things. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-29 09:04:21 -04:00
Zhang Yanfei	e9d0626ed4	x86-64, init: Fix a possible wraparound bug in switchover in head_64.S In head_64.S, a switchover has been used to handle kernel crossing 1G, 512G boundaries. And commit `8170e6bed4` x86, 64bit: Use a #PF handler to materialize early mappings on demand said: During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. But from the switchover code, when we set up the PUD table: 114 addq $4096, %rdx 115 movq %rdi, %rax 116 shrq $PUD_SHIFT, %rax 117 andl $(PTRS_PER_PUD-1), %eax 118 movq %rdx, (4096+0)(%rbx,%rax,8) 119 movq %rdx, (4096+8)(%rbx,%rax,8) It seems line 119 has a potential bug there. For example, if the kernel is loaded at physical address 511G+1008M, that is 000000000 111111111 111111000 000000000000000000000 and the kernel _end is 512G+2M, that is 000000001 000000000 000000001 000000000000000000000 So in this example, when using the 2nd page to setup PUD (line 114~119), rax is 511. In line 118, we put rdx which is the address of the PMD page (the 3rd page) into entry 511 of the PUD table. But in line 119, the entry we calculate from (4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line 119 should be wraparound into entry 0 of the PUD table. The patch fixes the bug. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-28 15:41:59 -07:00
David Vrabel	3565184ed0	x86: Increase precision of x86_platform.get/set_wallclock() All the virtualized platforms (KVM, lguest and Xen) have persistent wallclocks that have more than one second of precision. read_persistent_wallclock() and update_persistent_wallclock() allow for nanosecond precision but their implementation on x86 with x86_platform.get/set_wallclock() only allows for one second precision. This means guests may see a wallclock time that is off by up to 1 second. Make set_wallclock() and get_wallclock() take a struct timespec parameter (which allows for nanosecond precision) so KVM and Xen guests may start with a more accurate wallclock time and a Xen dom0 can maintain a more accurate wallclock for guests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2013-05-28 14:00:59 -07:00
Linus Torvalds	30a9e50143	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This push fixes a crash in the new sha256_ssse3 driver as well as a DMA setup/teardown bug in caam" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations crypto: caam - fix inconsistent assoc dma mapping direction	2013-05-28 10:09:38 -07:00
Yijing Wang	ea221e6414	x86/PCI: Increase info->res_num before checking pci_use_crs We should increase info->res_num before we checking pci_use_crs return when pci=nocrs set. No functional change, since we don't use res_num and res_offset[] in the "!pci_use_crs" case anyway, but this makes the code read better. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <liuj97@gmail.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Feng Tang <feng.tang@intel.com>	2013-05-28 11:09:16 -06:00
Borislav Petkov	46ff53874b	x86, platform, kvm, kconfig: Turn existing .config's into KVM-capable configs Add an config file snippet which enables additional options useful for running the kernel in a kvm guest. When you execute 'make kvmconfig' it merges those options with an already existing user config before you build the kernel. Based on an patch from the external lkvm tree. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: penberg@kernel.org Cc: levinsasha928@gmail.com Cc: mtosatti@redhat.com Cc: fengguang.wu@intel.com Link: http://lkml.kernel.org/r/20130522144638.GB15085@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 12:11:32 +02:00
Thorsten Glaser	8ef726cb23	update AMD powerflags comments from http://www.flounder.com/cpuid80000007amd.gif and http://support.amd.com/us/Embedded_TechDocs/25481.pdf Signed-off-by: Thorsten Glaser <t.glaser@tarent.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-05-28 12:02:10 +02:00
Zhang Yanfei	592a9b8cc8	x86/mm: Drop unneeded include <asm/pgtable, page_types.h> arch/x86/boot/compressed/head_64.S includes <asm/pgtable_types.h> and <asm/page_types.h> but it doesn't look like it needs them. So remove them. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191FAE2.4020403@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 11:47:23 +02:00
Zhang Yanfei	1e3b308181	x86_64: Correct phys_addr in cleanup_highmap comment For x86_64, we have phys_base, which means the delta between the the address kernel is actually running at and the address kernel is compiled to run at. Not phys_addr so correct it. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5192F9BF.2000802@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 11:41:24 +02:00
Jussi Kivilinna	a710f761fc	crypto: sha256_ssse3 - add sha224 support Add sha224 implementation to sha256_ssse3 module. This also fixes sha256_ssse3 module autoloading issue when 'sha224' is used before 'sha256'. Previously in such case, just sha256_generic was loaded and not sha256_ssse3 (since it did not provide sha224). Now if 'sha256' was used after 'sha224' usage, sha256_ssse3 would remain unloaded. Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-28 15:43:05 +08:00
Jussi Kivilinna	340991e30c	crypto: sha512_ssse3 - add sha384 support Add sha384 implementation to sha512_ssse3 module. This also fixes sha512_ssse3 module autoloading issue when 'sha384' is used before 'sha512'. Previously in such case, just sha512_generic was loaded and not sha512_ssse3 (since it did not provide sha384). Now if 'sha512' was used after 'sha384' usage, sha512_ssse3 would remain unloaded. For example, this happens with tcrypt testing module since it tests 'sha384' before 'sha512'. Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-28 15:43:05 +08:00
Michael S. Tsirkin	016be2e55d	x86: uaccess s/might_sleep/might_fault/ The only reason uaccess routines might sleep is if they fault. Make this explicit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1369577426-26721-9-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 09:41:10 +02:00
Peter Zijlstra	1b45adcd9a	perf/x86/amd: Rework AMD PMU init code Josh reported that his QEMU is a bad hardware emulator and trips a WARN in the AMD PMU init code. He requested the WARN be turned into a pr_err() or similar. While there, rework the code a little. Reported-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Jacob Shin <jacob.shin@amd.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 09:13:55 +02:00
Stephane Eranian	2b923c8f5d	perf/x86: Check branch sampling priv level in generic code This patch moves commit `7cc23cd` to the generic code: perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL The check is now implemented in generic code instead of x86 specific code. That way we do not have to repeat the test in each arch supporting branch sampling. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 09:13:54 +02:00
Dan Carpenter	13acac3075	perf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driver We're trying to use 64 bit masks but the shifts wrap so we can't use the high 32 bits. I've fixed this by changing several types to unsigned long long. This is a static checker fix. The one change which is clearly needed is "mask = 0xff << (idx * 8);" where the author obviously intended to use all 64 bits. The other changes are mostly to silence my static checker. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20130518183452.GA14587@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 09:13:52 +02:00
Jiri Olsa	ddd40da4cc	x86/signals: Merge EFLAGS bit clearing into a single statement Merging EFLAGS bit clearing into a single statement, to ensure EFLAGS bits are being cleared in a single instruction. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-4-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 08:46:53 +02:00
Jiri Olsa	24cda10996	x86/signals: Clear RF EFLAGS bit for signal handler Clearing RF EFLAGS bit for signal handler. The reason is that this flag is set by debug exception code to prevent the recursive exception entry. Leaving it set for signal handler might prevent debug exception of the signal handler itself. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-3-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 08:46:52 +02:00
Jiri Olsa	5e219b3c67	x86/signals: Propagate RF EFLAGS bit through the signal restore call While porting Vince's perf overflow tests I found perf event breakpoint overflow does not work properly. I found the x86 RF EFLAG bit not being set when returning from debug exception after triggering signal handler. Which is exactly what you get when you set perf breakpoint overflow SIGIO handler. This patch and the next two patches fix the underlying bugs. This patch adds the RF EFLAGS bit to be restored on return from signal from the original register context before the signal was entered. This will prevent the RF flag to disappear when returning from exception due to the signal handler being executed. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-2-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-28 08:46:50 +02:00
Jussi Kivilinna	de614e561b	crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations The _XFER stack element size was set too small, 8 bytes, when it needs to be 16 bytes. As _XFER is the last stack element used by these implementations, the 16 byte stores with 'movdqa' corrupt the stack where the value of register %r12 is temporarily stored. As these implementations align the stack pointer to 16 bytes, this corruption did not happen every time. Patch corrects this issue. Reported-by: Julian Wollrath <jwollrath@web.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Tested-by: Julian Wollrath <jwollrath@web.de> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-28 13:46:47 +08:00
David S. Miller	e6ff4c75f9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge net into net-next because some upcoming net-next changes build on top of bug fixes that went into net. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-24 16:48:28 -07:00
Tim Chen	0b95a7f857	crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform Glue code that plugs the PCLMULQDQ accelerated CRC T10 DIF hash into the crypto framework. The config CRYPTO_CRCT10DIF_PCLMUL should be turned on to enable the feature. The crc_t10dif crypto library function will use this faster algorithm when crct10dif_pclmul module is loaded. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-24 17:55:27 +08:00
Linus Torvalds	b91fd4d5aa	PCI updates for v3.10: Moorestown Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" Hotplug PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRnnUuAAoJEFmIoMA60/r8+JwQALrHdQvA8rGl/TNF0xkJjsU+ EfdkBj23+qYYsYNla7Z/CfY7k5gXTtymqauhzpa1VRoY/bkpSWMDBxK551CxgBae RW0JbZxfoswLlA9eVRUz6mlUl7hL0Ky0/E6d9/UBXSAC0wsN6BB11lqKgi/uxAr2 ACuZBZW1EZP9zoD615Id5gtRAZe8+9w47p8d6kQcyJntmCoZwo8Na/HRE/tR+wVY o3AHMxMOr9Ac+G3/rFl5ffzoHkLwTws4UEVNhvKBek3ogsAau3tgeIgjjvFRToFh HxsdpCmgOAfjCF/RC0AxRugtykHw4n3WlaND2WM6JWolLeIQTZr41GlwYy4F1CRG gSRk0dMNGBYs9OgDC81KJGYqstHwL0MTig2ul4BvlrLanLnySgVu1kbhaerR3WzA 11k0GwlJbkFMCbQLZDaq51KHAdDYnCsRVG/tOGmsM8puUPOTXu3pLbPk3SbgWVGc 7vbUkjJlHwlNoUbeyOOOElUMCRnIiCl/IetwjgZb+Qibq2dO4v6WF5GR+2DPBwua 8GNbzQXyNxOPzvYBHE4YgK5lcb7yVeLvlOtPgbM0UMqC1SnFISzJRiNpfbj21rAk ADT4lqL29vqkX37uIQ55KPeTTThA65EKUoI3+dNtGHLSPoKdt3QlJc9wwQNKxEzx EEHxGRkebYNcPB1GqwEl =M8/H -----END PGP SIGNATURE----- Merge tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Here are some more fixes for v3.10. The Moorestown update broke Intel Medfield devices, so I reverted it. The acpiphp change fixes a regression: we broke hotplug notifications to host bridges when we split acpiphp into the host-bridge related part and the endpoint-related part. Moorestown Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" Hotplug PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check" * tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check	2013-05-23 13:50:53 -07:00
H. Peter Anvin	3979fdce77	* Avoid confusing the user by returning -EIO instead of -ENOENT in efivarfs if an EFI variable gets deleted from under us and return EOF when reading from a zero-length file - Lingzhu Xiang * Fix an oops in efivar_update_sysfs_entries() caused by reusing (and therefore corrupting) a kzalloc() allocation - Seiji Aguchi * Initialise the DataSize argument to GetVariable() otherwise it will not be updated with the actual size of the variable on return. Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRlgJ5AAoJEC84WcCNIz1VcRgQAJWnQpuI+Y1RjPYuLb8ISZ92 jKnVSAQPaWDccf/KKx7qtk2iK6G1mExomOggp6PpV4/uHWqkQ64qoGL0HmRbmdH7 //gNXm+ITnSVojoTBBis2LYEYDUQ1aDp3vTevFT2OOMKzKRsgwFD5KbqLnPJCxnl kFHlxQSPa7ZYk6GXOVdBJ5ya7spU+Lk4ODiL4xM8EBkMBedUStedB5zNg5rEMLJq 5FZfs2Ryg7riOkhBnKWCvFjAj3PO/yHLVdrVrGZx+KpATHr/rbsbzDEp3+Pk8zEu 3aE1p/XLaB4Qk9WvwhWonWF7PSZP41SjrcEt77mTmYivk0cJHBORo/wJsom683BU 4elWR588Pi92nKEx4MSLREhkzP5RdVFIb4Yrg+1xKLEfI5Kfj1nEQrFVk53VM/rL L1m1bwhKLaNUVFJmxJYHaZ6VTjcB6lupz9/AMQr1N1UAavlmr4xEveNTfto+Ys7u /J3Z04wouhOGQonVDrZxxOzWFewq1kR/hRfwBAf/V0qemAeS61ZHGpfmmq00y5L2 jmXncQ5EspFIJfk7+KV2sPQfqC8U8b2jyfO36Ed9u3d/qQ6eiQ1rTZMLKGg1iSdW Z+94w7+huEdsNZShSUE8+HXNvzPCS6704PBa9YT292ZibWM9onk6AoBQy/snrEaO K+PKxkZyPBML77Vs4TSI =8+a3 -----END PGP SIGNATURE----- Merge tag 'efi-urgent' into x86/urgent * Avoid confusing the user by returning -EIO instead of -ENOENT in efivarfs if an EFI variable gets deleted from under us and return EOF when reading from a zero-length file - Lingzhu Xiang * Fix an oops in efivar_update_sysfs_entries() caused by reusing (and therefore corrupting) a kzalloc() allocation - Seiji Aguchi * Initialise the DataSize argument to GetVariable() otherwise it will not be updated with the actual size of the variable on return. Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-22 10:58:57 -07:00
Avi Kivity	e47a5f5fb7	KVM: x86 emulator: convert XADD to fastop Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:24 +03:00
Avi Kivity	203831e8e4	KVM: x86 emulator: drop unused old-style inline emulation Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:23 +03:00
Avi Kivity	b8c0b6ae49	KVM: x86 emulator: convert DIV/IDIV to fastop Since DIV and IDIV can generate exceptions, we need an additional output parameter indicating whether an execption has occured. To avoid increasing register pressure on i386, we use %rsi, which is already allocated for the fastop code pointer. Gleb: added comment about fop usage as exception indication. Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:21 +03:00
Avi Kivity	b9fa409b00	KVM: x86 emulator: convert single-operand MUL/IMUL to fastop Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:20 +03:00
Avi Kivity	017da7b604	KVM: x86 emulator: Switch fastop src operand to RDX This makes OpAccHi useful. Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:19 +03:00
Avi Kivity	ab2c5ce666	KVM: x86 emulator: switch MUL/DIV to DstXacc Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:17 +03:00
Avi Kivity	820207c8fc	KVM: x86 emulator: decode extended accumulator explicity Single-operand MUL and DIV access an extended accumulator: AX for byte instructions, and DX:AX, EDX:EAX, or RDX:RAX for larger-sized instructions. Add support for fetching the extended accumulator. In order not to change things too much, RDX is loaded into Src2, which is already loaded by fastop(). This avoids increasing register pressure on i386. Gleb: disable src writeback for ByteOp div/mul. Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:16 +03:00
Avi Kivity	fb32b1eda2	KVM: x86 emulator: add support for writing back the source operand Some instructions write back the source operand, not just the destination. Add support for doing this via the decode flags. Gleb: add BUG_ON() to prevent source to be memory operand. Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-21 15:43:14 +03:00
Linus Torvalds	5e427ec2d0	x86: Fix bit corruption at CPU resume time In commit `78d77df715` ("x86-64, init: Do not set NX bits on non-NX capable hardware") we added the early_pmd_flags that gets the NX bit set when a CPU supports NX. However, the new variable was marked __initdata, because the main _use_ of this is in an __init routine. However, the bit setting happens from secondary_startup_64(), which is called not only at bootup, but on every secondary CPU start. Including resuming from STR and at CPU hotplug time. So the value cannot be __initdata. Reported-bisected-and-tested-by: Michal Hocko <mhocko@suse.cz> Cc: stable@vger.kernel.org # v3.9 Acked-by: Peter Anvin <hpa@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-20 11:36:03 -07:00
Bjorn Helgaas	f3f011750a	Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" This reverts commit `dd72be99d1`. Andy Shevchenko <andy.shevchenko@gmail.com> reported that this commit broke Intel Medfield devices. Reference: https://lkml.kernel.org/r/CAHp75Vdf6gFZChS47=grUygHBDWcoOWDYPzw+Zj5bdVCWj85Jw@mail.gmail.com Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2013-05-20 10:20:21 -06:00
Tim Chen	31d939625a	crypto: crct10dif - Accelerated CRC T10 DIF computation with PCLMULQDQ instruction This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ instructions. Details discussing the implementation can be found in the paper: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-20 20:11:06 +08:00
Eric Dumazet	314beb9bca	x86: bpf_jit_comp: secure bpf jit against spraying attacks hpa bringed into my attention some security related issues with BPF JIT on x86. This patch makes sure the bpf generated code is marked read only, as other kernel text sections. It also splits the unused space (we vmalloc() and only use a fraction of the page) in two parts, so that the generated bpf code not starts at a known offset in the page, but a pseudo random one. Refs: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-19 23:55:41 -07:00
Daniel Baluta	6865b32a86	lguest: rename i386_head.S Since commit `9a163ed8e0` (i386: move kernel) kernel/i386_head.S was renamed to kernel/head_32.S. We do the same for lguest/i386_head.S. Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-05-20 12:09:40 +09:30
Marc Zyngier	535cf7b3b1	KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in Makefiles As requested by the KVM maintainers, remove the addprefix used to refer to the main KVM code from the arch code, and replace it with a KVM variable that does the same thing. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-19 15:14:00 +03:00
Eric Dumazet	650c849670	x86: bpf_jit_comp: can call module_free() from any context It looks like we can call module_free()/vfree() from softirq context, so no longer need a wrapper and a work_struct. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-17 17:37:19 -07:00
Gleb Natapov	35af577aac	KVM: MMU: clenaup locking in mmu_free_roots() Do locking around each case separately instead of having one lock and two unlocks. Move root_hpa assignment out of the lock. Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-16 11:55:51 +03:00
Linus Torvalds	ae3b29e67c	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Fix for a CPU hot-add deadlock in microcode update code - Fix for idle consolidation fallout - Documentation update for initial kernel direct mapping * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Add missing comments for initial kernel direct mapping x86/microcode: Add local mutex to fix physical CPU hot-add deadlock x86: Fix idle consolidation fallout	2013-05-15 14:07:53 -07:00
Linus Torvalds	cc51bf6e6d	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Cure for not using zalloc in the first place, which leads to random crashes with CPUMASK_OFF_STACK. - Revert a user space visible change which broke udev - Add a missing cpu_online early return introduced by the new full dyntick conversions - Plug a long standing race in the timer wheel cpu hotplug code. Sigh... - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu up. * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline tick: Cleanup NOHZ per cpu data on cpu down tick: Use zalloc_cpumask_var for allocating offstack cpumasks	2013-05-15 14:05:17 -07:00
Marcelo Tosatti	0061d53daf	KVM: x86: limit difference between kvmclock updates kvmclock updates which are isolated to a given vcpu, such as vcpu->cpu migration, should not allow system_timestamp from the rest of the vcpus to remain static. Otherwise ntp frequency correction applies to one vcpu's system_timestamp but not the others. So in those cases, request a kvmclock update for all vcpus. The worst case for a remote vcpu to update its kvmclock is then bounded by maximum nohz sleep latency. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-15 20:36:09 +03:00
John Stultz	b4f711ee03	time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config, which enables some minor compile time optimization to avoid uncessary code in mostly the suspend/resume path could cause problems for userland. In particular, the dependency for RTC_HCTOSYS on !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time twice and simplifies suspend/resume, has the side effect of causing the /sys/class/rtc/rtcN/hctosys flag to always be zero, and this flag is commonly used by udev to setup the /dev/rtc symlink to /dev/rtcN, which can cause pain for older applications. While the udev rules could use some work to be less fragile, breaking userland should strongly be avoided. Additionally the compile time optimizations are fairly minor, and the code being optimized is likely to be reworked in the future, so lets revert this change. Reported-by: Kay Sievers <kay@vrfy.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: stable <stable@vger.kernel.org> #3.9 Cc: Feng Tang <feng.tang@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-05-14 20:54:06 +02:00
Jan Kiszka	f1ed0450a5	KVM: x86: Remove support for reporting coalesced APIC IRQs Since the arrival of posted interrupt support we can no longer guarantee that coalesced IRQs are always reported to the IRQ source. Moreover, accumulated APIC timer events could cause a busy loop when a VCPU should rather be halted. The consensus is to remove coalesced tracking from the LAPIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-14 12:09:02 +03:00
Lee, Chun-Yi	eccaf52fee	x86, efi: initial the local variable of DataSize to zero That will be better initial the value of DataSize to zero for the input of GetVariable(), otherwise we will feed a random value. The debug log of input DataSize like this: ... [ 195.915612] EFI Variables Facility v0.08 2004-May-17 [ 195.915819] efi: size: 18446744071581821342 [ 195.915969] efi: size': 18446744071581821342 [ 195.916324] efi: size: 18446612150714306560 [ 195.916632] efi: size': 18446612150714306560 [ 195.917159] efi: size: 18446612150714306560 [ 195.917453] efi: size': 18446612150714306560 ... The size' is value that was returned by BIOS. After applied this patch: [ 82.442042] EFI Variables Facility v0.08 2004-May-17 [ 82.442202] efi: size: 0 [ 82.442360] efi: size': 1039 [ 82.443828] efi: size: 0 [ 82.444127] efi: size': 2616 [ 82.447057] efi: size: 0 [ 82.447356] efi: size': 5832 ... Found on Acer Aspire V3 BIOS, it will not return the size of data if we input a non-zero DataSize. Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-05-14 08:13:05 +01:00
Borislav Petkov	4d067d8e05	x86: Extend #DF debugging aid to 64-bit It is sometimes very helpful to be able to pinpoint the location which causes a double fault before it turns into a triple fault and the machine reboots. We have this for 32-bit already so extend it to 64-bit. On 64-bit we get the register snapshot at #DF time and not from the first exception which actually causes the #DF. It should be close enough, though. [ hpa: and definitely better than nothing, which is what we have now. ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-13 13:42:44 -07:00
Takuya Yoshikawa	e2858b4ab1	KVM: MMU: Use kvm_mmu_sync_roots() in kvm_mmu_load() No need to open-code this function. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-12 14:52:55 +03:00
Linus Torvalds	607eeb0b83	Bug-fixes: - More fixes in the vCPU PVHVM hotplug path. - Add more documentation. - Fix various ARM related issues in the Xen generic drivers. - Updates in the xen-pciback driver per Bjorn's updates. - Mask the x2APIC feature for PV guests. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRjPL5AAoJEFjIrFwIi8fJdlIIANXawH+B+aFbqsFSKOOh76XN smgICU78SVzKpW9WAPYK7YFqSdNN4AleGC2Mn2lSkiaqgciRyDb9Yt+OSMMts2Xn ZVbFkGhEKR+DtZfTKo9YgsGatul/McTiVEkuuli+aN5dql3WXDLAaA+/b9bO3ohh TCWtWNuSCGmlfDoJET2je+J6CgKvCErH3fvzKNxgYxytcGhAvxoVK/lC4d3pnq/m wQUAIcF8XYENqC2m1WDR0OGveAB0Me0j9g+UkQS+TzqA8GPmxC4aptjkroFYhOz6 8nZp+LanimmTI6olVioWEXkCr5+dxb058jQKwncQfonFpl58RS0qUrz5zoe3etU= =h9SX -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: - More fixes in the vCPU PVHVM hotplug path. - Add more documentation. - Fix various ARM related issues in the Xen generic drivers. - Updates in the xen-pciback driver per Bjorn's updates. - Mask the x2APIC feature for PV guests. * tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pci: Used cached MSI-X capability offset xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST xen: mask x2APIC feature in PV xen: SWIOTLB is only used on x86 xen/spinlock: Fix check from greater than to be also be greater or equal to. xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info xen/vcpu: Document the xen_vcpu_info and xen_vcpu xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.	2013-05-11 16:19:30 -07:00
Linus Torvalds	ac4e01093f	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull idle update from Len Brown: "Add support for new Haswell-ULT CPU idle power states" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: intel_idle: initial C8, C9, C10 support tools/power turbostat: display C8, C9, C10 residency	2013-05-11 15:23:17 -07:00
Linus Torvalds	3644bc2ec7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull stray syscall bits from Al Viro: "Several syscall-related commits that were missing from the original" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE unicore32: just use mmap_pgoff()... unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)	2013-05-10 09:21:05 -07:00
Linus Torvalds	c67723ebbb	Merge tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Gleb Natapov: "Most of the fixes are in the emulator since now we emulate more than we did before for correctness sake we see more bugs there, but there is also an OOPS fixed and corruption of xcr0 register." * tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: emulator: emulate SALC KVM: emulator: emulate XLAT KVM: emulator: emulate AAM KVM: VMX: fix halt emulation while emulating invalid guest sate KVM: Fix kvm_irqfd_init initialization KVM: x86: fix maintenance of guest/host xcr0 state	2013-05-10 09:08:21 -07:00
Bjorn Helgaas	7c86617dde	xen/pci: Used cached MSI-X capability offset We now cache the MSI-X capability offset in the struct pci_dev, so no need to find the capability again. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-10 09:10:38 -04:00
Bjorn Helgaas	4be6bfe2af	xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the Table Offset register, not the flags ("Message Control" per spec) register. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-10 09:10:20 -04:00
Zhang Yanfei	cf8b166d5c	x86/mm: Add missing comments for initial kernel direct mapping Two sets of comments were lost during patch-series shuffling: - comments for init_range_memory_mapping() - comments in init_mem_mapping that is helpful for reminding people that the pagetable is setup top-down The comments were written by Yinghai in his patch in: https://lkml.org/lkml/2012/11/28/620 This patch reintroduces them. Originally-From: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/518BC776.7010506@gmail.com [ Tidied it all up a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-10 12:00:35 +02:00
Al Viro	91c2e0bcae	unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-09 13:46:38 -04:00
Linus Torvalds	e15e611906	PCI updates for v3.10: MSI PCI: Set ->mask_pos correctly Hotplug PCI: Delay final fixups until resources are assigned Moorestown x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRi9CqAAoJEFmIoMA60/r82WIQAMJE7kwl88TpIw4QdTum4a+z E+eBMnGFSw/y7JNP81rOuLG8xZiZNQxGUvTEVV+WY6n7xhRSwcNZIiVqZ1xSBSjM PpmLOLrXI6I9JIouBLKIRiPOzbC7iB6O7RhKZCO68guQ8Y7epYoJjORaKELGZmN+ 09fVapvHHhcOcYYYAiUWvobjk6vCx4+6dSj6tRldI8JNl+WQtqMaK2P3QjsncZek 20OXcrfv4X3ApoSwXcn1NUUDSlrAgM+VcwL6RJK9boURDnPOU8IzaD78DQVOUCNx BfLULFYkBq62ExCpTkg4Xo8aRowLAEcThZ3Z8/XtMmFlWDSdNm5BaTkYnPCf7nGW +8Oaxjm0pNVa5QnQqoK1HWpWtU1JTA0hO1tmJ9WyU+84GPuqTN3qiJTigfC9NHOi mj8O98sgbzIENKszBRpaNctjKVxKNFrBzQ3kOdFGB7NBVXN0pC2jnGBoxuxUrU3B h/yMB0Ku/GvnHRSngEhDzlkeuQpWTxvlhdjvlL63F0gEDQO0k3UPaiqD4zMpruga bHZ10v73Kdqp0FVatijmHztXO/yYB3m7tH3ZUDD3yfhKdaeOuTceDgbyo/ZwP2Wq gzOwuOVYLuJK68Qj4vkHJIKy86jHfNP4HGawhh8dECZDmsciarymQ+vAzobalYIu GSurydL4zqJJ+MIH84dW =B2U8 -----END PGP SIGNATURE----- Merge tag 'pci-v3.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "MSI: PCI: Set ->mask_pos correctly Hotplug: PCI: Delay final fixups until resources are assigned Moorestown: x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" * tag 'pci-v3.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Delay final fixups until resources are assigned x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0 PCI: Set ->mask_pos correctly	2013-05-09 10:21:44 -07:00
Linus Torvalds	5647ac0ad4	Removal of GENERIC_GPIO for v3.10 GENERIC_GPIO now synonymous with GPIOLIB. There are no longer any valid cases for enableing GENERIC_GPIO without GPIOLIB, even though it is possible to do so which has been causing confusion and breakage. This branch does the work to completely eliminate GENERIC_GPIO. However, it is not trivial to just create a branch to remove it. Over the course of the v3.9 cycle more code referencing GENERIC_GPIO has been added to linux-next that conflicts with this branch. The following must be done to resolve the conflicts when merging this branch into mainline: * "git grep CONFIG_GENERIC_GPIO" should return 0 hits. Matches should be replaced with CONFIG_GPIOLIB * "git grep '\bGENERIC_GPIO\b'" should return 1 hit in the Chinese documentation. * Selectors of GENERIC_GPIO should be turned into selectors of GPIOLIB * definitions of the option in architecture Kconfig code should be deleted. Stephen has 3 merge fixup patches[1] that do the above. They are currently applicable on mainline as of May 2nd. [1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg428056.html -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRifUnAAoJEEFnBt12D9kBs2YP/0U6+ia+xYvkVaJc28PDVIzn OReZNcJOYU8D5voxz0voaRD0EdcPwjbMu9Kp9aXMHlk4VxevF+8jCc/us0bIjtO1 VcB5VmSCIhMhxdnBlum11Mk7Vr5MCweyl9NBsypnPt8cl4obMBZHf2yzoodFktNb wtyYlOb6FALtc6iDbOO6dG3w9F7FAOLvskUFzdv89m8mupTsBu9jw9NqFDbJHOex rxq0Sdd+kWF/nkJVcV5Y6jIdletRlhpipefMJ9diexreHvwqh+c4kJEYZaXgB5+m ha95cPbReK1d+RqzM3A8d4irzSVSmq4k7ijI6QkFOr48+AH7XsgKv5so885LKzMN IIXg2Phm9i0H8+ecEvhcc4oIYBHJiEKK54Y0qUD9dqbFoDGPTCSqMHdSSMbpAY+J bIIXlVzj1En3PPNUJLPt8q8Qz6WxCT9mDST3QSGYnD4o90HT+1R9j92RxGL6McOq rUOyJDwmzFvpBvKK4raGdOU435M+ps2NPKKNIRaIGQPPY9rM1kN4YqvhXukEsC9L 3a3+3cQLh7iKxBHncxeQsJfethP1CPkJnzvF9r+ZZLf2rcPH4pbQIE2uO0XnX/nd 5/DKi0nGgAJ//GMMzdo3RiOA5zGFjIZ/KMvfhQldpP6qFJRhqdGi6FPlAcwr1z1n YnCByPwwlvfC4LTXFOGL =xodc -----END PGP SIGNATURE----- Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux Pull removal of GENERIC_GPIO from Grant Likely: "GENERIC_GPIO now synonymous with GPIOLIB. There are no longer any valid cases for enableing GENERIC_GPIO without GPIOLIB, even though it is possible to do so which has been causing confusion and breakage. This branch does the work to completely eliminate GENERIC_GPIO." * tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux: gpio: update gpio Chinese documentation Remove GENERIC_GPIO config option Convert selectors of GENERIC_GPIO to GPIOLIB blackfin: force use of gpiolib m68k: coldfire: use gpiolib mips: pnx833x: remove requirement for GENERIC_GPIO openrisc: default GENERIC_GPIO to false avr32: default GENERIC_GPIO to false xtensa: remove explicit selection of GENERIC_GPIO sh: replace CONFIG_GENERIC_GPIO by CONFIG_GPIOLIB powerpc: remove redundant GENERIC_GPIO selection unicore32: default GENERIC_GPIO to false unicore32: remove unneeded select GENERIC_GPIO arm: plat-orion: use GPIO driver on CONFIG_GPIOLIB arm: remove redundant GENERIC_GPIO selection mips: alchemy: require gpiolib mips: txx9: change GENERIC_GPIO to GPIOLIB mips: loongson: use GPIO driver on CONFIG_GPIOLIB mips: remove redundant GENERIC_GPIO select	2013-05-09 09:59:16 -07:00
Paolo Bonzini	326f578f7e	KVM: emulator: emulate SALC This is an almost-undocumented instruction available in 32-bit mode. I say "almost" undocumented because AMD documents it in their opcode maps just to say that it is unavailable in 64-bit mode (sections "A.2.1 One-Byte Opcodes" and "B.3 Invalid and Reassigned Instructions in 64-Bit Mode"). It is roughly equivalent to "sbb %al, %al" except it does not set the flags. Use fastop to emulate it, but do not use the opcode directly because it would fail if the host is 64-bit! Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-09 13:15:08 +03:00
Paolo Bonzini	7fa57952d7	KVM: emulator: emulate XLAT This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1. It is just a MOV in disguise, with a funny source address. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-09 13:14:51 +03:00
Paolo Bonzini	a035d5c64d	KVM: emulator: emulate AAM This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1. AAM needs the source operand to be unsigned; do the same in AAD as well for consistency, even though it does not affect the result. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-09 13:12:58 +03:00
Konrad Rzeszutek Wilk	074d72ff57	x86/microcode: Add local mutex to fix physical CPU hot-add deadlock This can easily be triggered if a new CPU is added (via ACPI hotplug mechanism) and from user-space you do: echo 1 > /sys/devices/system/cpu/cpu3/online (or wait for UDEV to do it) on a newly appeared physical CPU. The deadlock is that the "store_online" in drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early" which also takes the cpu_hotplug_driver_lock() lock. And here is that lockdep thinks of it: smpboot: Stack at about ffff880075c39f44 smpboot: CPU3: has booted. microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25 ============================================= [ INFO: possible recursive locking detected ] 3.9.0upstream-10129-g167af0e #1 Not tainted --------------------------------------------- sh/2487 is trying to acquire lock: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 but task is already holding lock: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(x86_cpu_hotplug_driver_mutex); lock(x86_cpu_hotplug_driver_mutex); * DEADLOCK * May be due to missing lock nesting notation 6 locks held by sh/2487: #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190 #1: (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160 #2: (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160 #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20 #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60 Suggested-and-Acked-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: fenghua.yu@intel.com Cc: xen-devel@lists.xensource.com Cc: stable@vger.kernel.org # for v3.9 Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-05-09 11:10:00 +02:00
Gleb Natapov	8d76c49e9f	KVM: VMX: fix halt emulation while emulating invalid guest sate The invalid guest state emulation loop does not check halt_request which causes 100% cpu loop while guest is in halt and in invalid state, but more serious issue is that this leaves halt_request set, so random instruction emulated by vm86 #GP exit can be interpreted as halt which causes guest hang. Fix both problems by handling halt_request in emulation loop. Reported-by: Tomas Papan <tomas.papan@gmail.com> Tested-by: Tomas Papan <tomas.papan@gmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-09 09:04:56 +03:00
Zhenzhong Duan	4ea9b9aca9	xen: mask x2APIC feature in PV On x2apic enabled pvm, doing sysrq+l, got NULL pointer dereference as below. SysRq : Show backtrace of all active CPUs BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120 Call Trace: [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160 [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20 [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0 [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50 [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10 [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140 [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140 [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60 [<ffffffff811ca996>] proc_reg_write+0x86/0xc0 [<ffffffff8116dd8e>] vfs_write+0xce/0x190 [<ffffffff8116e3e5>] sys_write+0x55/0x90 [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b That's because apic points to apic_x2apic_cluster or apic_x2apic_phys but the basic element like cpumask isn't initialized. Mask x2APIC feature in pvm to avoid overwrite of apic pointer, update commit message per Konrad's suggestion. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Tested-by: Tamon Shiose <tamon.shiose@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-08 08:38:11 -04:00
Konrad Rzeszutek Wilk	cb91f8f44c	xen/spinlock: Fix check from greater than to be also be greater or equal to. During review of git commit `cb9c6f15f3` ("xen/spinlock: Check against default value of -1 for IRQ line.") Stefano pointed out a bug in the patch. Unfortunatly due to vacation timing the fix was not applied and this patch fixes it up. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-08 08:38:09 -04:00
Konrad Rzeszutek Wilk	d5b17dbff8	xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info As it will point to some data, but not event channel data (the shared_info has an array limited to 32). This means that for PVHVM guests with more than 32 VCPUs without the usage of VCPUOP_register_info any interrupts to VCPUs larger than 32 would have gone unnoticed during early bootup. That is OK, as during early bootup, in smp_init we end up calling the hotplug mechanism (xen_hvm_cpu_notify) which makes the VCPUOP_register_vcpu_info call for all VCPUs and we can receive interrupts on VCPUs 33 and further. This is just a cleanup. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-08 08:38:08 -04:00
Marcelo Tosatti	42bdf991f4	KVM: x86: fix maintenance of guest/host xcr0 state Emulation of xcr0 writes zero guest_xcr0_loaded variable so that subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value. However, this is incorrect because guest_xcr0_loaded variable is read to decide whether to reload hosts xcr0. In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0 assignment, and scheduler decides to preload FPU: switch_to { __switch_to __math_state_restore restore_fpu_checking fpu_restore_checking if (use_xsave()) fpu_xrstor_checking xrstor64 with CPU's xcr0 == guests xcr0 Fix by properly restoring hosts xcr0 during emulation of xcr0 writes. Analyzed-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-08 12:47:43 +03:00
Linus Torvalds	c8de2fa4dc	Merge branch 'rwsem-optimizations' Merge rwsem optimizations from Michel Lespinasse: "These patches extend Alex Shi's work (which added write lock stealing on the rwsem slow path) in order to provide rwsem write lock stealing on the fast path (that is, without taking the rwsem's wait_lock). I have unfortunately been unable to push this through -next before due to Ingo Molnar / David Howells / Peter Zijlstra being busy with other things. However, this has gotten some attention from Rik van Riel and Davidlohr Bueso who both commented that they felt this was ready for v3.10, and Ingo Molnar has said that he was OK with me pushing directly to you. So, here goes :) Davidlohr got the following test results from pgbench running on a quad-core laptop: \| db_size \| clients \| tps-vanilla \| tps-rwsem \| +---------+----------+----------------+--------------+ \| 160 MB \| 1 \| 5803 \| 6906 \| + 19.0% \| 160 MB \| 2 \| 13092 \| 15931 \| \| 160 MB \| 4 \| 29412 \| 33021 \| \| 160 MB \| 8 \| 32448 \| 34626 \| \| 160 MB \| 16 \| 32758 \| 33098 \| \| 160 MB \| 20 \| 26940 \| 31343 \| + 16.3% \| 160 MB \| 30 \| 25147 \| 28961 \| \| 160 MB \| 40 \| 25484 \| 26902 \| \| 160 MB \| 50 \| 24528 \| 25760 \| ------------------------------------------------------ \| 1.6 GB \| 1 \| 5733 \| 7729 \| + 34.8% \| 1.6 GB \| 2 \| 9411 \| 19009 \| + 101.9% \| 1.6 GB \| 4 \| 31818 \| 33185 \| \| 1.6 GB \| 8 \| 33700 \| 34550 \| \| 1.6 GB \| 16 \| 32751 \| 33079 \| \| 1.6 GB \| 20 \| 30919 \| 31494 \| \| 1.6 GB \| 30 \| 28540 \| 28535 \| \| 1.6 GB \| 40 \| 26380 \| 27054 \| \| 1.6 GB \| 50 \| 25241 \| 25591 \| ------------------------------------------------------ \| 7.6 GB \| 1 \| 5779 \| 6224 \| \| 7.6 GB \| 2 \| 10897 \| 13611 \| + 24.9% \| 7.6 GB \| 4 \| 32683 \| 33108 \| \| 7.6 GB \| 8 \| 33968 \| 34712 \| \| 7.6 GB \| 16 \| 32287 \| 32895 \| \| 7.6 GB \| 20 \| 27770 \| 31689 \| + 14.1% \| 7.6 GB \| 30 \| 26739 \| 29003 \| \| 7.6 GB \| 40 \| 24901 \| 26683 \| \| 7.6 GB \| 50 \| 17115 \| 25925 \| + 51.5% ------------------------------------------------------ (Davidlohr also has one additional patch which further improves throughput, though I will ask him to send it directly to you as I have suggested some minor changes)." * emailed patches from Michel Lespinasse <walken@google.com>: rwsem: no need for explicit signed longs x86 rwsem: avoid taking slow path when stealing write lock rwsem: do not block readers at head of queue if other readers are active rwsem: implement support for write lock stealing on the fastpath rwsem: simplify __rwsem_do_wake rwsem: skip initial trylock in rwsem_down_write_failed rwsem: avoid taking wait_lock in rwsem_down_write_failed rwsem: use cmpxchg for trying to steal write lock rwsem: more agressive lock stealing in rwsem_down_write_failed rwsem: simplify rwsem_down_write_failed rwsem: simplify rwsem_down_read_failed rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed rwsem: shorter spinlocked section in rwsem_down_failed_common() rwsem: make the waiter type an enumeration rather than a bitmask	2013-05-07 09:22:03 -07:00
Thomas Gleixner	97a5b81fa4	x86: Fix idle consolidation fallout The core code expects the arch idle code to return with interrupts enabled. The conversion missed two x86 cases which fail to do that. Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305021557030.3972@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-05-07 16:24:03 +02:00
Michel Lespinasse	a31a369b07	x86 rwsem: avoid taking slow path when stealing write lock modify __down_write[_nested] and __down_write_trylock to grab the write lock whenever the active count is 0, even if there are queued waiters (they must be writers pending wakeup, since the active count is 0). Note that this is an optimization only; architectures without this optimization will still work fine: - __down_write() would take the slow path which would take the wait_lock and then try stealing the lock (as in the spinlocked rwsem implementation) - __down_write_trylock() would fail, but callers must be ready to deal with that - since there are some writers pending wakeup, they could have raced with us and obtained the lock before we steal it. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-07 07:20:17 -07:00
Konrad Rzeszutek Wilk	a520996ae2	xen/vcpu: Document the xen_vcpu_info and xen_vcpu They are important structures and it is not clear at first look what they are for. The xen_vcpu is a pointer. By default it points to the shared_info structure (at the CPU offset location). However if the VCPUOP_register_vcpu_info hypercall is implemented we can make the xen_vcpu pointer point to a per-CPU location. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Added comments from Ian Campbell] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-07 10:05:34 -04:00
Linus Torvalds	99737982ca	IOMMU Updates for Linux v3.10 The updates are mostly about the x86 IOMMUs this time. Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a PPC platform) and an extension to the IOMMU group interface. On the x86 side this includes a workaround for VT-d to disable interrupt remapping on broken chipsets. On the AMD-Vi side the most important new feature is a kernel command-line interface to override broken information in IVRS ACPI tables and get interrupt remapping working this way. Besides that there are small fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRh2vAAAoJECvwRC2XARrjbjkP/jzKzeffybUpQsIJF8rs/IEt hSwqpGLr6WR5FdneEH9fiBIp4pyMDXmuAb/2ZNgB+DgPN3xgqmWVo4WLk7pMo3BS /xIz/lu7hIX3AtKt807pL9+rPdhGYEJ43Vmr4bW9x0l1kuNXy6fmMLcN5FaPKjV4 p4hY4jOstEgtYQw4wi39/9b4FsYoipZizkOUSdtCzWwTv7jOHH7/Wra8iZyzL6Je 1VlF/efp0ytTcwLdHOfGwPCIlZrQRtQCM4SqdAUG9bOL3ARR9Yu/0iW1295nbLzo CQX5CfKePvo/fGxki1jcBi+UCyxYKPosB5kCxmh4MAxCg/VzzMsaME/A73tLJa6W Y29bbjwPoBPMq03HX8S9R5QWY8HpFujUUp+J4TXcKuTgYEV28WfLu1uaeKD716nM LoXUojov7Cj8ZQZnhyu5l+XNaephBZLfw/8bM6bAxhlKXwAjmLiS5Z+srPl1GJee 5GCV+L94JifHLZaREWh3JFsh9O3W7Wno2++c4JU32uCWJHXH7tMgs2P8n5AY9rnT Km1a9y6w2MF3Gg9j4y6u75m0XnFTNzYjeJMUtqVlwVhNHhgaXfuIWY63xOQCLJs1 ThTHOjoh0VqONGobR/ywn+0ouo9X07DnWpluyFd+zY3XK0UE0NOu9XMr4i6TWxOf mlzWoEKxtw36XGHB/FtQ =MVc/ -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "The updates are mostly about the x86 IOMMUs this time. Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a PPC platform) and an extension to the IOMMU group interface. On the x86 side this includes a workaround for VT-d to disable interrupt remapping on broken chipsets. On the AMD-Vi side the most important new feature is a kernel command-line interface to override broken information in IVRS ACPI tables and get interrupt remapping working this way. Besides that there are small fixes all over the place." * tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits) iommu/tegra: Fix printk formats for dma_addr_t iommu: Add a function to find an iommu group by id iommu/vt-d: Remove warning for HPET scope type iommu: Move swap_pci_ref function to drivers/iommu/pci.h. iommu/vt-d: Disable translation if already enabled iommu/amd: fix error return code in early_amd_iommu_init() iommu/AMD: Per-thread IOMMU Interrupt Handling iommu: Include linux/err.h iommu/amd: Workaround for ERBT1312 iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides iommu/amd: Add ioapic and hpet ivrs override iommu/amd: Add early maps for ioapic and hpet iommu/amd: Extend IVRS special device data structure iommu/amd: Move add_special_device() to __init iommu: Fix compile warnings with forward declarations iommu/amd: Properly initialize irq-table lock iommu/amd: Use AMD specific data structure for irq remapping iommu/amd: Remove map_sg_no_iommu() iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets ...	2013-05-06 14:59:13 -07:00
Konrad Rzeszutek Wilk	7f1fc268c4	xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. If a user did: echo 0 > /sys/devices/system/cpu/cpu1/online echo 1 > /sys/devices/system/cpu/cpu1/online we would (this a build with DEBUG enabled) get to: smpboot: ++++++++++++++++++++=_---CPU UP 1 .. snip.. smpboot: Stack at about ffff880074c0ff44 smpboot: CPU1: has booted. and hang. The RCU mechanism would kick in an try to IPI the CPU1 but the IPIs (and all other interrupts) would never arrive at the CPU1. At first glance at least. A bit digging in the hypervisor trace shows that (using xenanalyze): [vla] d4v1 vec 243 injecting 0.043163027 --\|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043163639 --\|x d4v1 vmentry cycles 1468 ] 0.043164913 --\|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043164913 --\|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting 0.043164913 --\|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043165526 --\|x d4v1 vmentry cycles 1472 ] 0.043166800 --\|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043166800 --\|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting there is a pending event (subsequent debugging shows it is the IPI from the VCPU0 when smpboot.c on VCPU1 has done "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is interrupted with the callback IPI (0xf3 aka 243) which ends up calling __xen_evtchn_do_upcall. The __xen_evtchn_do_upcall seems to do something but not acknowledge the pending events. And the moment the guest does a 'cli' (that is the ffffffff81673254 in the log above) the hypervisor is invoked again to inject the IPI (0xf3) to tell the guest it has pending interrupts. This repeats itself forever. The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup we set each per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info to register per-CPU structures (xen_vcpu_setup). This is used to allow events for more than 32 VCPUs and for performance optimizations reasons. When the user performs the VCPU hotplug we end up calling the the xen_vcpu_setup once more. We make the hypercall which returns -EINVAL as it does not allow multiple registration calls (and already has re-assigned where the events are being set). We pick the fallback case and set per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] (which is a good fallback during bootup). However the hypervisor is still setting events in the register per-cpu structure (per_cpu(xen_vcpu_info, cpu)). As such when the events are set by the hypervisor (such as timer one), and when we iterate in __xen_evtchn_do_upcall we end up reading stale events from the shared_info->vcpu_info[vcpu] instead of the per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the events that the hypervisor has set and the hypervisor keeps on reminding us to ack the events which we never do. The fix is simple. Don't on the second time when xen_vcpu_setup is called over-write the per_cpu(xen_vcpu, cpu) if it points to per_cpu(xen_vcpu_info). Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-05-06 11:29:17 -04:00
Linus Torvalds	01227a889e	Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...	2013-05-05 14:47:31 -07:00
Linus Torvalds	64049d1973	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes plus a small hw-enablement patch for Intel IB model 58 uncore events" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL perf/x86/intel/lbr: Fix LBR filter perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge perf: Fix vmalloc ring buffer pages handling perf/x86/intel: Fix unintended variable name reuse perf/x86/intel: Add support for IvyBridge model 58 Uncore perf/x86/intel: Fix typo in perf_event_intel_uncore.c x86: Eliminate irq_mis_count counted in arch_irq_stat	2013-05-05 11:37:16 -07:00
Peter Zijlstra	7cc23cd6c0	perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL We should always have proper privileges when requesting kernel data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl [ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org	2013-05-05 10:58:11 +02:00
Peter Zijlstra	6e15eb3ba6	perf/x86/intel/lbr: Fix LBR filter The LBR 'from' adddress is under full userspace control; ensure we validate it before reading from it. Note: is_module_text_address() can potentially be quite expensive; for those running into that with high overhead in modules optimize it using an RCU backed rb-tree. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20130503121256.158211806@chello.nl Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/n/tip-mk8i82ffzax01cnqo829iy1q@git.kernel.org	2013-05-04 08:37:47 +02:00
Peter Zijlstra	741a698f42	perf/x86: Blacklist all MEM__RETIRED events for Ivy Bridge Errata BV98 states that all MEM__RETIRED events corrupt the counter value of the SMT sibling's counters. Blacklist these events Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20130503121256.083340271@chello.nl Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/n/tip-jwra43mujrv1oq9xk6mfe57v@git.kernel.org	2013-05-04 08:37:46 +02:00
Jan Kiszka	03b28f8133	KVM: x86: Account for failing enable_irq_window for NMI window request With VMX, enable_irq_window can now return -EBUSY, in which case an immediate exit shall be requested before entering the guest. Account for this also in enable_nmi_window which uses enable_irq_window in absence of vnmi support, e.g. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-05-02 22:17:38 -03:00
Alexander van Heukelum	5522ddb3fc	x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) Commit `49cb25e929` x86: 'get rid of pt_regs argument in vm86/vm86old' got rid of the pt_regs stub for sys_vm86old and sys_vm86. The functions were, however, not changed to use the calling convention for syscalls. [AV: killed asmlinkage_protect() - it's done automatically now] Reported-and-tested-by: Hans de Bruin <jmdebruin@xmsnet.nl> Cc: stable@vger.kernel.org Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-02 20:36:32 -04:00
Linus Torvalds	797994f81a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - XTS mode optimisation for twofish/cast6/camellia/aes on x86 - AVX2/x86_64 implementation for blowfish/twofish/serpent/camellia - SSSE3/AVX/AVX2 optimisations for sha256/sha512 - Added driver for SAHARA2 crypto accelerator - Fix for GMAC when used in non-IPsec secnarios - Added generic CMAC implementation (including IPsec glue) - IP update for crypto/atmel - Support for more than one device in hwrng/timeriomem - Added Broadcom BCM2835 RNG driver - Misc fixes * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (59 commits) crypto: caam - fix job ring cleanup code crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher crypto: tcrypt - add async cipher speed tests for blowfish crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2 crypto: aesni_intel - fix Kconfig problem with CRYPTO_GLUE_HELPER_X86 crypto: aesni_intel - add more optimized XTS mode for x86-64 crypto: x86/camellia-aesni-avx - add more optimized XTS code crypto: cast6-avx: use new optimized XTS code crypto: x86/twofish-avx - use optimized XTS code crypto: x86 - add more optimized XTS-mode for serpent-avx xfrm: add rfc4494 AES-CMAC-96 support crypto: add CMAC support to CryptoAPI crypto: testmgr - add empty test vectors for null ciphers crypto: testmgr - add AES GMAC test vectors crypto: gcm - fix rfc4543 to handle async crypto correctly crypto: gcm - make GMAC work when dst and src are different hwrng: timeriomem - added devicetree hooks ...	2013-05-02 14:53:12 -07:00
Linus Torvalds	a8bdf74512	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Two regression fixes: 1. On 64 bits, we would set NX on non-NX-capable hardware (very rare in 64-bit land, but a nonzero subset.) 2. Fix suspend/resume across kernel versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, init: Do not set NX bits on non-NX capable hardware x86, gdt, hibernate: Store/load GDT for hibernate path.	2013-05-02 14:43:58 -07:00
Linus Torvalds	736a2dd257	Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now... Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRga7lAAoJENkgDmzRrbjx/yIQAKpqIBtxOJeYH3SY+Uoe7Cfp toNYcpJEldvb0UcWN8M2cSZpHoxl1SUoq9djwcM29tcKa7EZAjHaGtb/Q1qMTDgv +B3WAfiGU2pmXFxLAkbrlLNGnysy24JspqJQ5hcYV84EiBxQdZp+nCYgOphd+GMK ww16vo9ya8jFjzt3GeRp/Heb3vEzV4Cp6BC3i0m8A3WNpEpbRb66pqXNk5o8ggJO SxQOKSXmUM+0m+jKSul5xn3e2Ls2LOrZZ8/DIHA+gW66N4Zab7n2/j1Q9VRxb4lh FqnR7KwgBX8OCh9IsBDqQYS7MohvMYge6eUdLtFrq84jvMleMEhrC8q9v2tucFUb 5t18CLwvyK7Gdg6UCKiZ7YSPcuURAILO16al9bh5IseeBDsuX+43VsvQoBmFn9k6 cLOVTZ6BlOmahK5PyRYFSvLa9Rxzr/05Mr7oYq9UgshD9io78dnqczFYIORF53rW zD7C4HuTZfYJFfNd0wAJ0RfVXnf8QvDlMdo7zPC26DSXNWqj8OexCY0qqSWUB+2F vcfJP6NkV4fZB8aawWIFUVwc64yqtt2uPVLa7ATZWqk16PgKrchGewmw3tiEwOgu 1l7xgffTRRUIJsqaCZoXdgw3yezcKRjuUBcOxL09lDAAhc+NxWNvzZBsKp66DwDk yZQKn0OdXnuf0CeEOfFf =1tYL -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio & lguest updates from Rusty Russell: "Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now..." Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename. Hopefully correctly resolved. * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits) caif_virtio: Remove bouncing email addresses lguest: improve code readability in lg_cpu_start. virtio-net: fill only rx queues which are being used lguest: map Switcher below fixmap. lguest: cache last cpu we ran on. lguest: map Switcher text whenever we allocate a new pagetable. lguest: don't share Switcher PTE pages between guests. lguest: expost switcher_pages array (as lg_switcher_pages). lguest: extract shadow PTE walking / allocating. lguest: make check_gpte et. al return bool. lguest: assume Switcher text is a single page. lguest: rename switcher_page to switcher_pages. lguest: remove RESERVE_MEM constant. lguest: check vaddr not pgd for Switcher protection. lguest: prepare to make SWITCHER_ADDR a variable. virtio: console: replace EMFILE with EBUSY for already-open port virtio-scsi: reset virtqueue affinity when doing cpu hotplug virtio-scsi: introduce multiqueue support virtio-scsi: push vq lock/unlock into virtscsi_vq_done virtio-scsi: pass struct virtio_scsi to virtqueue completion function ...	2013-05-02 14:14:04 -07:00
H. Peter Anvin	78d77df715	x86-64, init: Do not set NX bits on non-NX capable hardware During early init, we would incorrectly set the NX bit even if the NX feature was not supported. Instead, only set this bit if NX is actually available and enabled. We already do very early detection of the NX bit to enable it in EFER, this simply extends this detection to the early page table mask. Reported-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1367476850.5660.2.camel@nexus Cc: <stable@vger.kernel.org> v3.9	2013-05-02 11:27:35 -07:00
Konrad Rzeszutek Wilk	cc456c4e7c	x86, gdt, hibernate: Store/load GDT for hibernate path. The git commite7a5cd063c7b4c58417f674821d63f5eb6747e37 ("x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.") assumes that for the hibernate path the booting kernel and the resuming kernel MUST be the same. That is certainly the case for a 32-bit kernel (see check_image_kernel and CONFIG_ARCH_HIBERNATION_HEADER config option). However for 64-bit kernels it is OK to have a different kernel version (and size of the image) of the booting and resuming kernels. Hence the above mentioned git commit introduces an regression. This patch fixes it by introducing a 'struct desc_ptr gdt_desc' back in the 'struct saved_context'. However instead of having in the 'save_processor_state' and 'restore_processor_state' the store/load_gdt calls, we are only saving the GDT in the save_processor_state. For the restore path the lgdt operation is done in hibernate_asm_[32\|64].S in the 'restore_registers' path. The apt reader of this description will recognize that only 64-bit kernels need this treatment, not 32-bit. This patch adds the logic in the 32-bit path to be more similar to 64-bit so that in the future the unification process can take advantage of this. [ hpa: this also reverts an inadvertent on-disk format change ] Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-02 11:27:35 -07:00
Joerg Roedel	0c4513be3d	Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and 'arm/tegra' into next	2013-05-02 12:10:19 +02:00
Linus Torvalds	20b4fb4852	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...	2013-05-01 17:51:54 -07:00
Linus Torvalds	b9394d8a65	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/efi changes from Peter Anvin: "The bulk of these changes are cleaning up the efivars handling and breaking it up into a tree of files. There are a number of fixes as well. The entire changeset is pretty big, but most of it is code movement. Several of these commits are quite new; the history got very messed up due to a mismerge with the urgent changes for rc8 which completely broke IA64, and so Ingo requested that we rebase it to straighten it out." * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: remove "kfree(NULL)" efi: locking fix in efivar_entry_set_safe() efi, pstore: Read data from variable store before memcpy() efi, pstore: Remove entry from list when erasing efi, pstore: Initialise 'entry' before iterating efi: split efisubsystem from efivars efivarfs: Move to fs/efivarfs efivars: Move pstore code into the new EFI directory efivars: efivar_entry API efivars: Keep a private global pointer to efivars efi: move utf16 string functions to efi.h x86, efi: Make efi_memblock_x86_reserve_range more readable efivarfs: convert to use simple_open()	2013-05-01 15:51:46 -07:00
Linus Torvalds	73287a43cc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configurating various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to mdoule_platform_driver(), from Sachin Kamat. 17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...	2013-05-01 14:08:52 -07:00
Linus Torvalds	08d7676083	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull compat cleanup from Al Viro: "Mostly about syscall wrappers this time; there will be another pile with patches in the same general area from various people, but I'd rather push those after both that and vfs.git pile are in." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: syscalls.h: slightly reduce the jungles of macros get rid of union semop in sys_semctl(2) arguments make do_mremap() static sparc: no need to sign-extend in sync_file_range() wrapper ppc compat wrappers for add_key(2) and request_key(2) are pointless x86: trim sys_ia32.h x86: sys32_kill and sys32_mprotect are pointless get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC merge compat sys_ipc instances consolidate compat lookup_dcookie() convert vmsplice to COMPAT_SYSCALL_DEFINE switch getrusage() to COMPAT_SYSCALL_DEFINE switch epoll_pwait to COMPAT_SYSCALL_DEFINE convert sendfile{,64} to COMPAT_SYSCALL_DEFINE switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect make HAVE_SYSCALL_WRAPPERS unconditional consolidate cond_syscall and SYSCALL_ALIAS declarations teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long get rid of duplicate logics in __SC_....[1-6] definitions	2013-05-01 07:21:43 -07:00
Linus Torvalds	5f56886521	Merge branch 'akpm' (incoming from Andrew) Merge third batch of fixes from Andrew Morton: "Most of the rest. I still have two large patchsets against AIO and IPC, but they're a bit stuck behind other trees and I'm about to vanish for six days. - random fixlets - inotify - more of the MM queue - show_stack() cleanups - DMI update - kthread/workqueue things - compat cleanups - epoll udpates - binfmt updates - nilfs2 - hfs - hfsplus - ptrace - kmod - coredump - kexec - rbtree - pids - pidns - pps - semaphore tweaks - some w1 patches - relay updates - core Kconfig changes - sysrq tweaks" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) Documentation/sysrq: fix inconstistent help message of sysrq key ethernet/emac/sysrq: fix inconstistent help message of sysrq key sparc/sysrq: fix inconstistent help message of sysrq key powerpc/xmon/sysrq: fix inconstistent help message of sysrq key ARM/etm/sysrq: fix inconstistent help message of sysrq key power/sysrq: fix inconstistent help message of sysrq key kgdb/sysrq: fix inconstistent help message of sysrq key lib/decompress.c: fix initconst notifier-error-inject: fix module names in Kconfig kernel/sys.c: make prctl(PR_SET_MM) generally available UAPI: remove empty Kbuild files menuconfig: print more info for symbol without prompts init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display kconfig menu: move Virtualization drivers near other virtualization options Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS relay: use macro PAGE_ALIGN instead of FIX_SIZE kernel/relay.c: move FIX_SIZE macro into relay.c kernel/relay.c: remove unused function argument actor drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave() drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave() ...	2013-04-30 17:37:43 -07:00
Stephen Boyd	446f24d119	Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS The help text for this config is duplicated across the x86, parisc, and s390 Kconfig.debug files. Arnd Bergman noted that the help text was slightly misleading and should be fixed to state that enabling this option isn't a problem when using pre 4.4 gcc. To simplify the rewording, consolidate the text into lib/Kconfig.debug and modify it there to be more explicit about when you should say N to this config. Also, make the text a bit more generic by stating that this option enables compile time checks so we can cover architectures which emit warnings vs. ones which emit errors. The details of how an architecture decided to implement the checks isn't as important as the concept of compile time checking of copy_from_user() calls. While we're doing this, remove all the copy_from_user_overflow() code that's duplicated many times and place it into lib/ so that any architecture supporting this option can get the function for free. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:09 -07:00
Oleg Nesterov	079148b919	coredump: factor out the setting of PF_DUMPCORE Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into zap_threads() called by do_coredump(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Mandeep Singh Baines <msb@chromium.org> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:06 -07:00
Tejun Heo	a43cb95d54	dump_stack: unify debug information printed by show_regs() show_regs() is inherently arch-dependent but it does make sense to print generic debug information and some archs already do albeit in slightly different forms. This patch introduces a generic function to print debug information from show_regs() so that different archs print out the same information and it's much easier to modify what's printed. show_regs_print_info() prints out the same debug info as dump_stack() does plus task and thread_info pointers. * Archs which didn't print debug info now do. alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r, metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc, um, xtensa * Already prints debug info. Replaced with show_regs_print_info(). The printed information is superset of what used to be there. arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86 * s390 is special in that it used to print arch-specific information along with generic debug info. Heiko and Martin think that the arch-specific extra isn't worth keeping s390 specfic implementation. Converted to use the generic version. Note that now all archs print the debug info before actual register dumps. An example BUG() dump follows. kernel BUG at /work/os/work/kernel/workqueue.c:4841! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000 RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6 RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650 0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760 Call Trace: [<ffffffff81000312>] do_one_initcall+0x122/0x170 [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8 [<ffffffff81c47760>] ? rest_init+0x140/0x140 [<ffffffff81c4776e>] kernel_init+0xe/0xf0 [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0 [<ffffffff81c47760>] ? rest_init+0x140/0x140 ... v2: Typo fix in x86-32. v3: CPU number dropped from show_regs_print_info() as dump_stack_print_info() has been updated to print it. s390 specific implementation dropped as requested by s390 maintainers. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	98e5e1bf72	dump_stack: implement arch-specific hardware description in task dumps x86 and ia64 can acquire extra hardware identification information from DMI and print it along with task dumps; however, the usage isn't consistent. * x86 show_regs() collects vendor, product and board strings and print them out with PID, comm and utsname. Some of the information is printed again later in the same dump. * warn_slowpath_common() explicitly accesses the DMI board and prints it out with "Hardware name:" label. This applies to both x86 and ia64 but is irrelevant on all other archs. * ia64 doesn't show DMI information on other non-WARN dumps. This patch introduces arch-specific hardware description used by dump_stack(). It can be set by calling dump_stack_set_arch_desc() during boot and, if exists, printed out in a separate line with "Hardware name:" label. dmi_set_dump_stack_arch_desc() is added which sets arch-specific description from DMI data. It uses dmi_ids_string[] which is set from dmi_present() used for DMI debug message. It is superset of the information x86 show_regs() is using. The function is called from x86 and ia64 boot code right after dmi_scan_machine(). This makes the explicit DMI handling in warn_slowpath_common() unnecessary. Removed. show_regs() isn't yet converted to use generic debug information printing and this patch doesn't remove the duplicate DMI handling in x86 show_regs(). The next patch will unify show_regs() handling and remove the duplication. An example WARN dump follows. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a0c3>] init_workqueues+0x35/0x505 ... v2: Use the same string as the debug message from dmi_present() which also contains BIOS information. Move hardware name into its own line as warn_slowpath_common() did. This change was suggested by Bjorn Helgaas. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	196779b9b4	dump_stack: consolidate dump_stack() implementations and unify their behaviors Both dump_stack() and show_stack() are currently implemented by each architecture. show_stack(NULL, NULL) dumps the backtrace for the current task as does dump_stack(). On some archs, dump_stack() prints extra information - pid, utsname and so on - in addition to the backtrace while the two are identical on other archs. The usages in arch-independent code of the two functions indicate show_stack(NULL, NULL) should print out bare backtrace while dump_stack() is used for debugging purposes when something went wrong, so it does make sense to print additional information on the task which triggered dump_stack(). There's no reason to require archs to implement two separate but mostly identical functions. It leads to unnecessary subtle information. This patch expands the dummy fallback dump_stack() implementation in lib/dump_stack.c such that it prints out debug information (taken from x86) and invokes show_stack(NULL, NULL) and drops arch-specific dump_stack() implementations in all archs except blackfin. Blackfin's dump_stack() does something wonky that I don't understand. Debug information can be printed separately by calling dump_stack_print_info() so that arch-specific dump_stack() implementation can still emit the same debug information. This is used in blackfin. This patch brings the following behavior changes. * On some archs, an extra level in backtrace for show_stack() could be printed. This is because the top frame was determined in dump_stack() on those archs while generic dump_stack() can't do that reliably. It can be compensated by inlining dump_stack() but not sure whether that'd be necessary. * Most archs didn't use to print debug info on dump_stack(). They do now. An example WARN dump follows. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Hardware name: empty Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a071>] init_workqueues+0x35/0x505 ... v2: CPU number added to the generic debug info as requested by s390 folks and dropped the s390 specific dump_stack(). This loses %ksp from the debug message which the maintainers think isn't important enough to keep the s390-specific dump_stack() implementation. dump_stack_print_info() is moved to kernel/printk.c from lib/dump_stack.c. Because linkage is per objecct file, dump_stack_print_info() living in the same lib file as generic dump_stack() means that archs which implement custom dump_stack() - at this point, only blackfin - can't use dump_stack_print_info() as that will bring in the generic version of dump_stack() too. v1 The v1 patch broke build on blackfin due to this issue. The build breakage was reported by Fengguang Wu. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	a77f2a4e6f	x86: don't show trace beyond show_stack(NULL, NULL) There are multiple ways a task can be dumped - explicit call to dump_stack(), triggering WARN() or BUG(), through sysrq-t and so on. Most of what gets printed is upto each architecture and the current state is not particularly pretty. Different pieces of information are presented differently depending on which path the dump takes and which architecture it's running on. This is messy for no good reason and makes it exceedingly difficult to add or modify debug information to task dumps. In all archs except for s390, there's nothing arch-specific about the printed debug information. This patchset updates all those archs to use the same helpers to consistently print out the same debug information. An example WARN dump after this patchset. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a0c3>] init_workqueues+0x35/0x505 ... And BUG dump. kernel BUG at kernel/workqueue.c:4841! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000 RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6 RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650 0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760 Call Trace: [<ffffffff81000312>] do_one_initcall+0x122/0x170 [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8 [<ffffffff81c47760>] ? rest_init+0x140/0x140 [<ffffffff81c4776e>] kernel_init+0xe/0xf0 [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0 [<ffffffff81c47760>] ? rest_init+0x140/0x140 ... This patchset contains the following seven patches. 0001-x86-don-t-show-trace-beyond-show_stack-NULL-NULL.patch 0002-sparc32-make-show_stack-acquire-fp-if-_ksp-is-not-sp.patch 0003-dump_stack-consolidate-dump_stack-implementations-an.patch 0004-dmi-morph-dmi_dump_ids-into-dmi_format_ids-which-for.patch 0005-dump_stack-implement-arch-specific-hardware-descript.patch 0006-dump_stack-unify-debug-information-printed-by-show_r.patch 0007-arc-print-fatal-signals-reduce-duplicated-informatio.patch 0001-0002 update stack dumping functions in x86 and sparc32 in preparation. 0003 makes all arches except blackfin use generic dump_stack(). blackfin still uses the generic helper to print the same info. 0004-0005 properly abstract DMI identifier printing in WARN() and show_regs() so that all dumps print out the information. This enables show_regs() to use the same debug info message. 0006 updates show_regs() of all arches to use a common generic helper to print debug info. 0007 removes somem duplicate information from arc dumps. While this patchset changes how debug info is printed on some archs, the printed information is always superset of what used to be there. This patchset makes task dump debug messages consistent and enables adding more information. Workqueue is scheduled to add worker information including the workqueue in use and work item specific description. While this patch touches a lot of archs, it isn't too likely to cause non-trivial conflicts with arch-specfic changes and would probably be best to route together either through -mm. x86 is tested but other archs are either only compile tested or not tested at all. Changes to most archs are generally trivial. This patch: show_stack(current or NULL, NULL) is used to print the backtrace of the current task. As trace beyond the function itself isn't of much interest to anyone, don't show it by determining sp and bp in show_stack()'s frame and passing them to show_stack_log_lvl(). This brings show_stack(NULL, NULL)'s behavior in line with dump_stack(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:01 -07:00
Linus Torvalds	3ed1c478ef	Power management and ACPI updates for 3.10-rc1 - ARM big.LITTLE cpufreq driver from Viresh Kumar. - exynos5440 cpufreq driver from Amit Daniel Kachhap. - cpufreq core cleanup and code consolidation from Viresh Kumar and Stratos Karafotis. - cpufreq scalability improvement from Nathan Zimmer. - AMD "frequency sensitivity feedback" powersave bias for the ondemand cpufreq governor from Jacob Shin. - cpuidle code consolidation and cleanups from Daniel Lezcano. - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano. - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto. - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle, Yasuaki Ishimatsu, and Rafael J. Wysocki. - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from Rafael J. Wysocki and Andy Shevchenko. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRf8M8AAoJEKhOf7ml8uNsud4P/3cabXP5lDipzibRrpOiONse puuvIdhtNdMRMc3t1oSDjNH/w/JA51Gc+ICGFAORiyVmqxBc85mpT6J5ibqV7hNd pCqbKJceoB5PajHZSx22e4wG9O7YN1k3r80p38IfFzA+Ct0KNSuE0ixMEfHKYjiq p5pXswk6TY3gtBReH9agrafHqDtXw4IMTE0asMuJ+BorPW7vQeiNlrkuA+0qmDuu 26O0Pm2TVkx1ryfTjdM9zSZ9X2G4JuM8rm1/VFZWQJTExwlv3bA2Za1nvQNJlJ99 6JZ0JXfAehcEW2Ye0sqsZ8HSEabDVHM29QvvOszJ5RpBXERiOCHOkhvFleCoTpn0 Xq0rtXPrLMH1G28Ej+cxmsAjfzOLV2Byg30CAoI/GCLuQ+xh+VMCpuNYQuld25CG 9rtYd0fWESeYsAebhDcX0E3xyzJtbrHtOb9PyGwNkbAJ8YQfhVSMCOPi2SX2wa+Q qXLXi2VaHvjBSUKcAv5BmM+Ya57Be+88D0LxbgXbUeOnYefUK1ljldKDDshkMjgG P4LPdm4JpoB5ncXSOO1Dz9w9QnNcFexSUySd/TtKLNMha1vEHV8ISzNPYY+9IdXf XN0VZbFnUDzdj+Fwna7zyFb1cGihDYJKAtpXvRd8Y6RGUxKx9uGLAFJZw/xZB/cR KZKuML5O8MgJuef37F38 =H/se -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael J Wysocki: - ARM big.LITTLE cpufreq driver from Viresh Kumar. - exynos5440 cpufreq driver from Amit Daniel Kachhap. - cpufreq core cleanup and code consolidation from Viresh Kumar and Stratos Karafotis. - cpufreq scalability improvement from Nathan Zimmer. - AMD "frequency sensitivity feedback" powersave bias for the ondemand cpufreq governor from Jacob Shin. - cpuidle code consolidation and cleanups from Daniel Lezcano. - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano. - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto. - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle, Yasuaki Ishimatsu, and Rafael J Wysocki. - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from Rafael J Wysocki and Andy Shevchenko. * tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (192 commits) cpufreq: Revert incorrect commit `5800043` cpufreq: MAINTAINERS: Add co-maintainer cpuidle: add maintainer entry ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points ARM: s3c64xx: cpuidle: use init/exit common routine cpufreq: pxa2xx: initialize variables ACPI: video: correct acpi_video_bus_add error processing SH: cpuidle: use init/exit common routine ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y ACPI: Fix wrong parameter passed to memblock_reserve cpuidle: fix comment format pnp: use %*phC to dump small buffers isapnp: remove debug leftovers ARM: imx: cpuidle: use init/exit common routine ARM: davinci: cpuidle: use init/exit common routine ARM: kirkwood: cpuidle: use init/exit common routine ARM: calxeda: cpuidle: use init/exit common routine ARM: tegra: cpuidle: use init/exit common routine for tegra3 ARM: tegra: cpuidle: use init/exit common routine for tegra2 ARM: OMAP4: cpuidle: use init/exit common routine ...	2013-04-30 15:21:02 -07:00
Bin Gao	dd72be99d1	x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0 For real PCI devices 00:00.0, 00:02.0 and 00:03.0, there is either no PCI shim, or no guarantee of data correctness of offset 256-4k. So for whatever reason, Linux kernel should not do MMCFG PCI config access to those devices. Instead, always use configuration mechanism 1 for those devices. The 00:00.0, 00:02.0 and 00:03.0 devices are built-in single-function devices and are not PCI-PCI bridges, so this set of devices should be complete. Signed-off-by: Bin Gao <bin.gao@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2013-04-30 09:55:39 -07:00
Linus Torvalds	5d434fcb25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...	2013-04-30 09:36:50 -07:00
Linus Torvalds	5a5a1bf099	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - Add an Intel CMCI hotplug fix - Add AMD family 16h EDAC support - Make the AMD MCE banks code more flexible for virtual environments * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: amd64_edac: Add Family 16h support x86/mce: Rework cmci_rediscover() to play well with CPU hotplug x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper	2013-04-30 08:42:45 -07:00
Linus Torvalds	74c7d2f520	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "Small fixes and cleanups all over the map" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Drop unneeded include <asm/dmi.h> x86/olpc/xo1/sci: Don't call input_free_device() after input_unregister_device() x86/platform/intel/mrst: Remove cast for kmalloc() return value x86/platform/uv: Replace kmalloc() & memset with kzalloc()	2013-04-30 08:42:11 -07:00
Linus Torvalds	1e2f5b598a	Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt update from Ingo Molnar: "Various paravirtualization related changes - the biggest one makes guest support optional via CONFIG_HYPERVISOR_GUEST" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, wakeup, sleep: Use pvops functions for changing GDT entries x86, xen, gdt: Remove the pvops variant of store_gdt. x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. x86: Make Linux guest support optional x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu	2013-04-30 08:41:21 -07:00
Linus Torvalds	f9b3bcfbc4	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc smaller changes all over the map" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/iommu/dmar: Remove warning for HPET scope type x86/mm/gart: Drop unnecessary check x86/mm/hotplug: Put kernel_physical_mapping_remove() declaration in CONFIG_MEMORY_HOTREMOVE x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER x86/mm/numa: Simplify some bit mangling x86/mm: Re-enable DEBUG_TLBFLUSH for X86_32 x86/mm/cpa: Cleanup split_large_page() and its callee x86: Drop always empty .text..page_aligned section	2013-04-30 08:40:35 -07:00
Linus Torvalds	01c7cd0ef5	Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perparatory x86 kasrl changes from Ingo Molnar: "This contains changes from the ongoing KASLR work, by Kees Cook. The main changes are the use of a read-only IDT on x86 (which decouples the userspace visible virtual IDT address from the physical address), and a rework of ELF relocation support, in preparation of random, boot-time kernel image relocation." * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF x86, relocs: Build separate 32/64-bit tools x86, relocs: Add 64-bit ELF support to relocs tool x86, relocs: Consolidate processing logic x86, relocs: Generalize ELF structure names x86: Use a read-only IDT alias on all CPUs	2013-04-30 08:37:24 -07:00
Linus Torvalds	39b2f8656e	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug update from Ingo Molnar: "Two small changes: a documentation update and a constification" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, early-printk: Update earlyprintk documentation (and kill x86 copy) x86: Constify a few items	2013-04-30 08:35:20 -07:00
Linus Torvalds	df8edfa9af	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid changes from Ingo Molnar: "The biggest change is x86 CPU bug handling refactoring and cleanups, by Borislav Petkov" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, CPU, AMD: Drop useless label x86, AMD: Correct {rd,wr}msr_amd_safe warnings x86: Fold-in trivial check_config function x86, cpu: Convert AMD Erratum 400 x86, cpu: Convert AMD Erratum 383 x86, cpu: Convert Cyrix coma bug detection x86, cpu: Convert FDIV bug detection x86, cpu: Convert F00F bug detection x86, cpu: Expand cpufeature facility to include cpu bugs	2013-04-30 08:34:38 -07:00
Linus Torvalds	874f6d1be7	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc smaller cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/lib: Fix spelling, put space between a numeral and its units x86/lib: Fix spelling in the comments x86, quirks: Shut-up a long-standing gcc warning x86, msr: Unify variable names x86-64, docs, mm: Add vsyscall range to virtual address space layout x86: Drop KERNEL_IMAGE_START x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety	2013-04-30 08:34:07 -07:00
Linus Torvalds	ab86e974f0	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer updates from Ingo Molnar: "The main changes in this cycle's merge are: - Implement shadow timekeeper to shorten in kernel reader side blocking, by Thomas Gleixner. - Posix timers enhancements by Pavel Emelyanov: - allocate timer ID per process, so that exact timer ID allocations can be re-created be checkpoint/restore code. - debuggability and tooling (/proc/PID/timers, etc.) improvements. - suspend/resume enhancements by Feng Tang: on certain new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, so the TSC value won't be reset to 0 after resume. This can be taken advantage of by the generic via the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to recover/approximate sleep time, the main (and precise) clocksource can be used. - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many CPUs the file goes beyond 4MB of size and thus the current simplistic seqfile approach fails. Convert /proc/timer_list to a proper seq_file with its own iterator. - Cleanups and refactorings of the core timekeeping code by John Stultz. - International Atomic Clock time is managed by the NTP code internally currently but not exposed externally. Separate the TAI code out and add CLOCK_TAI support and TAI support to the hrtimer and posix-timer code, by John Stultz. - Add deep idle support enhacement to the broadcast clockevents core timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ clockevents feature (which will be utilized by future clockevents driver updates), which allows the use of IRQ affinities to avoid spurious wakeups of idle CPUs - the right CPU with an expiring timer will be woken. - Add new ARM bcm281xx clocksource driver, by Christian Daudt - ... various other fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) clockevents: Set dummy handler on CPU_DEAD shutdown timekeeping: Update tk->cycle_last in resume posix-timers: Remove unused variable clockevents: Switch into oneshot mode even if broadcast registered late timer_list: Convert timer list to be a proper seq_file timer_list: Split timer_list_show_tickdevices posix-timers: Show sigevent info in proc file posix-timers: Introduce /proc/PID/timers file posix timers: Allocate timer id per process (v2) timekeeping: Make sure to notify hrtimers when TAI offset changes hrtimer: Fix ktime_add_ns() overflow on 32bit architectures hrtimer: Add expiry time overflow check in hrtimer_interrupt timekeeping: Shorten seq_count region timekeeping: Implement a shadow timekeeper timekeeping: Delay update of clock->cycle_last timekeeping: Store cycle_last value in timekeeper struct as well ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state timekeeping: Simplify tai updating from do_adjtimex timekeeping: Hold timekeepering locks in do_adjtimex and hardpps timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex() ...	2013-04-30 08:15:40 -07:00
Linus Torvalds	8700c95adb	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug changes from Ingo Molnar: "This is a pretty large, multi-arch series unifying and generalizing the various disjunct pieces of idle routines that architectures have historically copied from each other and have grown in random, wildly inconsistent and sometimes buggy directions: 101 files changed, 455 insertions(+), 1328 deletions(-) this went through a number of review and test iterations before it was committed, it was tested on various architectures, was exposed to linux-next for quite some time - nevertheless it might cause problems on architectures that don't read the mailing lists and don't regularly test linux-next. This cat herding excercise was motivated by the -rt kernel, and was brought to you by Thomas "the Whip" Gleixner." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) idle: Remove GENERIC_IDLE_LOOP config switch um: Use generic idle loop ia64: Make sure interrupts enabled when we "safe_halt()" sparc: Use generic idle loop idle: Remove unused ARCH_HAS_DEFAULT_IDLE bfin: Fix typo in arch_cpu_idle() xtensa: Use generic idle loop x86: Use generic idle loop unicore: Use generic idle loop tile: Use generic idle loop tile: Enter idle with preemption disabled sh: Use generic idle loop score: Use generic idle loop s390: Use generic idle loop powerpc: Use generic idle loop parisc: Use generic idle loop openrisc: Use generic idle loop mn10300: Use generic idle loop mips: Use generic idle loop microblaze: Use generic idle loop ...	2013-04-30 07:50:17 -07:00
Linus Torvalds	16fa94b532	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this development cycle were: - full dynticks preparatory work by Frederic Weisbecker - factor out the cpu time accounting code better, by Li Zefan - multi-CPU load balancer cleanups and improvements by Joonsoo Kim - various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) sched: Fix init NOHZ_IDLE flag sched: Prevent to re-select dst-cpu in load_balance() sched: Rename load_balance_tmpmask to load_balance_mask sched: Move up affinity check to mitigate useless redoing overhead sched: Don't consider other cpus in our group in case of NEWLY_IDLE sched: Explicitly cpu_idle_type checking in rebalance_domains() sched: Change position of resched_cpu() in load_balance() sched: Fix wrong rq's runnable_avg update with rt tasks sched: Document task_struct::personality field sched/cpuacct/UML: Fix header file dependency bug on the UML build cgroup: Kill subsys.active flag sched/cpuacct: No need to check subsys active state sched/cpuacct: Initialize cpuacct subsystem earlier sched/cpuacct: Initialize root cpuacct earlier sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically sched/cpuacct: Clean up cpuacct.h sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field() sched/cpuacct: Remove redundant NULL checks in cpuacct_charge() sched/cpuacct: Add cpuacct_acount_field() sched/cpuacct: Add cpuacct_init() ...	2013-04-30 07:43:28 -07:00
Linus Torvalds	e0972916e8	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Features: - Add "uretprobes" - an optimization to uprobes, like kretprobes are an optimization to kprobes. "perf probe -x file sym%return" now works like kretprobes. By Oleg Nesterov. - Introduce per core aggregation in 'perf stat', from Stephane Eranian. - Add memory profiling via PEBS, from Stephane Eranian. - Event group view for 'annotate' in --stdio, --tui and --gtk, from Namhyung Kim. - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin. - Add Ivy Bridge-EP uncore support, by Zheng Yan - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter. - Add perf test entries for checking breakpoint overflow signal handler issues, from Jiri Olsa. - Add perf test entry for for checking number of EXIT events, from Namhyung Kim. - Add perf test entries for checking --cpu in record and stat, from Jiri Olsa. - Introduce perf stat --repeat forever, from Frederik Deweerdt. - Add --no-demangle to report/top, from Namhyung Kim. - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes, by Oleg Nesterov. Various fixes and refactorings: - Fix dependency of the python binding wrt libtraceevent, from Naohiro Aota. - Simplify some perf_evlist methods and to allow 'stat' to share code with 'record' and 'trace', by Arnaldo Carvalho de Melo. - Remove dead code in related to libtraceevent integration, from Namhyung Kim. - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf sched lat' back working, by Arnaldo Carvalho de Melo - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho de Melo. - Kill a bunch of die() calls, from Namhyung Kim. - Fix build on non-glibc systems due to libio.h absence, from Cody P Schafer. - Remove some perf_session and tracing dead code, from David Ahern. - Honor parallel jobs, fix from Borislav Petkov - Introduce tools/lib/lk library, initially just removing duplication among tools/perf and tools/vm. from Borislav Petkov ... and many more I missed to list, see the shortlog and git log for more details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits) perf/x86/intel/P4: Robistify P4 PMU types perf/x86/amd: Fix AMD NB and L2I "uncore" support perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c perf/x86: Check all MSRs before passing hw check perf/x86/amd: Add support for AMD NB and L2I "uncore" counters perf/x86/intel: Add Ivy Bridge-EP uncore support perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management perf/x86: Avoid kfree() in CPU_{STARTING,DYING} uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit() uprobes/tracing: Change create_trace_uprobe() to support uretprobes uprobes/tracing: Make seq_printf() code uretprobe-friendly uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher() uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers uprobes/tracing: Generalize struct uprobe_trace_entry_head uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls uprobes/tracing: Kill the pointless seq_print_ip_sym() call uprobes/tracing: Kill the pointless task_pt_regs() calls ...	2013-04-30 07:41:01 -07:00
Jan-Simon Möller	1b0dac2ac6	perf/x86/intel: Fix unintended variable name reuse The variable name events_group is already in used and led to a compilation error when using clang to build the Linux Kernel . The fix is just to rename the var. No functional change. Please apply. Fix suggested in discussion by PaX Team <pageexec@freemail.hu> Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Cc: rostedt@goodmis.org Cc: a.p.zijlstra@chello.nl Cc: paulus@samba.org Cc: acme@ghostprotocols.net Link: http://lkml.kernel.org/r/1367316153-14808-1-git-send-email-dl9pf@gmx.de Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-30 13:12:47 +02:00
Matt Fleming	a614e1923d	Linux 3.9 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRfcB+AAoJEHm+PkMAQRiGmLAH/0bIpdOYylJhRDmVOztXpANP jRYYH00UiSIBz8XO463dbbtevT2pB8pIw5TCxBWBi/V5rnJS9X5pvAHyNZBDUvYd 3BQCQ2cnQ+6stFpi4o6NciZzQShDGMmUxAOD6ejZM35/P2l+ZKrNqBwy3R4oeMuZ /WUYZTCfFF3G7qgkHoOwIjM6c34v0tpqLfx4R5CdTnKe0Ow0OGb5ko5+lefD6i9m 6cd2GFlWeIUvw0FSMLyB+HN6Tkf3JnwrklP+vuLNV+uOq5BLwggGc6A1eS51IuVJ e/ZkGTtirz+mZiG5lvqSXHaVEObPsbm32XfVVHp1SiE+TIugDb2uhtEQEv+a43w= =UOGY -----END PGP SIGNATURE----- Merge tag 'v3.9' into efi-for-tip2 Resolve conflicts for Ingo. Conflicts: drivers/firmware/Kconfig drivers/firmware/efivars.c Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-04-30 11:42:13 +01:00
Vince Weaver	9a6bc14350	perf/x86/intel: Add support for IvyBridge model 58 Uncore According to Intel Vol3b 18.9, the IvyBridge model 58 uncore is the same as that of SandyBridge. I've done some simple tests and with this patch things seem to work on my mac-mini. Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@gmail.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1304291549320.15827@vincent-weaver-1.um.maine.edu Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-30 10:56:45 +02:00
Vince Weaver	80e217e9ca	perf/x86/intel: Fix typo in perf_event_intel_uncore.c Sandy Bridge was misspelled. Either that or the Intel marketing names are getting even more obscure. Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1304291546590.15827@vincent-weaver-1.um.maine.edu [ Haha ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-30 10:56:44 +02:00
Li Fei	f7b0e10555	x86: Eliminate irq_mis_count counted in arch_irq_stat With the current implementation, kstat_cpu(cpu).irqs_sum is also increased in case of irq_mis_count increment. So there is no need to count irq_mis_count in arch_irq_stat, otherwise irq_mis_count will be counted twice in the sum of /proc/stat. Reported-by: Liu Chuansheng <chuansheng.liu@intel.com> Signed-off-by: Li Fei <fei.li@intel.com> Acked-by: Liu Chuansheng <chuansheng.liu@intel.com> Cc: tomoki.sekiyama.qu@hitachi.com Cc: joe@perches.com Link: http://lkml.kernel.org/r/1366980611.32469.7.camel@fli24-HP-Compaq-8100-Elite-CMT-PC Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-30 10:56:37 +02:00
Alex Williamson	4cee4b72f1	kvm: KVM_CAP_IOMMU only available with device assignment Fix build with CONFIG_PCI unset by linking KVM_CAP_IOMMU to device assignment config option. It has no purpose otherwise. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-04-29 23:08:50 -03:00
Akinobu Mita	56a1ba5fd1	x86: rename random32() to prandom_u32() Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:42 -07:00
Akinobu Mita	4a7b4d2360	x86: pageattr-test: remove srandom32 call pageattr-test calls srandom32() once every test iteration. But calling srandom32() after late_initcalls is not meaningfull. Because the random states for random32() is mixed by good random numbers in late_initcall prandom_reseed(). So this removes the call to srandom32(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:42 -07:00
Thomas Gleixner	d0380e6c3c	early_printk: consolidate random copies of identical code The early console implementations are the same all over the place. Move the print function to kernel/printk and get rid of the copies. [akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list] [paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Mike Frysinger <vapier@gentoo.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
Cody P Schafer	d2ad351e36	x86/mm/numa: use setup_nr_node_ids() instead of opencoding. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:36 -07:00
Johannes Weiner	8e2cdbcb86	x86-64: fall back to regular page vmemmap on allocation failure Memory hotplug can happen on a machine under load, memory shortness and fragmentation, so huge page allocations for the vmemmap are not guaranteed to succeed. Try to fall back to regular pages before failing the hotplug event completely. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Johannes Weiner	e8216da5c7	x86-64: use vmemmap_populate_basepages() for !pse setups We already have generic code to allocate vmemmap with regular pages, use it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Johannes Weiner	6c7a2ca4c1	x86-64: remove dead debugging code for !pse setups No need to maintain addr_end and p_end when they are never actually read anywhere on !pse setups. Remove the dead code. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Johannes Weiner	0aad818b2d	sparse-vmemmap: specify vmemmap population range in bytes The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: David S. Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Joonsoo Kim	ef93247325	mm, vmalloc: change iterating a vmlist to find_vm_area() This patchset removes vm_struct list management after initializing vmalloc. Adding and removing an entry to vmlist is linear time complexity, so it is inefficient. If we maintain this list, overall time complexity of adding and removing area to vmalloc space is O(N), although we use rbtree for finding vacant place and it's time complexity is just O(logN). And vmlist and vmlist_lock is used many places of outside of vmalloc.c. It is preferable that we hide this raw data structure and provide well-defined function for supporting them, because it makes that they cannot mistake when manipulating theses structure and it makes us easily maintain vmalloc layer. For kexec and makedumpfile, I export vmap_area_list, instead of vmlist. This comes from Atsushi's recommendation. For more information, please refer below link. https://lkml.org/lkml/2012/12/6/184 This patch: The purpose of iterating a vmlist is finding vm area with specific virtual address. find_vm_area() is provided for this purpose and more efficient, because it uses a rbtree. So change it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Gerald Schaefer	106c992a5e	mm/hugetlb: add more arch-defined huge_pte functions Commit `abf09bed3c` ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Jiang Liu	5e7ccf8635	mm/x86: use free_highmem_page() to free highmem pages into buddy system Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cong Wang <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Attilio Rao <attilio.rao@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:32 -07:00
Jiang Liu	bced0e32f6	mm/x86: use common help functions to free reserved pages Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:31 -07:00
Rob Landley	0c0de199ce	mkcapflags.pl: convert to mkcapflags.sh Generate asm-x86/cpufeature.h with posix-2008 commands instead of perl. Signed-off-by: Rob Landley <rob@landley.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Boyer <jwboyer@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowell@redhat.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:27 -07:00
David Howells	2f96b8c1d5	proc: Split kcore bits from linux/procfs.h into linux/kcore.h Split kcore bits from linux/procfs.h into linux/kcore.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> cc: linux-mips@linux-mips.org cc: sparclinux@vger.kernel.org cc: x86@kernel.org cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:42:02 -04:00
David Howells	0d01ff2583	Include missing linux/slab.h inclusions Include missing linux/slab.h inclusions where the source file is currently expecting to get kmalloc() and co. through linux/proc_fs.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-s390@vger.kernel.org cc: sparclinux@vger.kernel.org cc: linux-efi@vger.kernel.org cc: linux-mtd@lists.infradead.org cc: devel@driverdev.osuosl.org cc: x86@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:42:01 -04:00
Al Viro	3dc20cb282	new helper: read_code() switch binfmts that use ->read() to that (and to kernel_read() in several cases in binfmt_flat - sure, it's nommu, but still, doing ->read() into kmalloc'ed buffer...) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:40:23 -04:00
Linus Torvalds	96a3e8af5a	PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRfhSWAAoJEFmIoMA60/r8GrYQAIHDsyZIuJSf6g+8Td1h+PIC YD3wQhbyrDqQDuKU4+9cz+JsbHmnozUGA4UmlwmOGBxEa/Uauspb6yX1P1+x9Ok1 WD7Ar3BlA5OuYI/1L1mgCiA428MTujwoR4fPnC0+KFy8xk1tBpmhzzeOFohbKyFF hMBO/Xt9tCzPATJ1LhjIH4xAykfDkbnPNHNcUKRoAkRo0CO0lS8gcTk0shXXSNng p9kQ6c4cYZvlRIJTwlawWV09nr7mDsBYa3JClqXYZufUWfEwvIuhisJxCJ57sWi9 t+Ev8dm7VM6Cr5dV+ORArlboBFrq4f/W5U9j9GPFrRplwf+WbNT6tNGSpSDq8XhU Q7JjNgPWVdWXe1vIsMwaO49zi45/bNehuCSFLZiyPZwedMk764tys+iYw+tMRtv1 tBR7lwESSXfagmvWyQAuQOTy6Rj26BPd2T8e2lMsvsuQO9mCyTK6Ey3YyKuqKQK/ l5Gns4vv4eaCjGXqqDGiydUjSes+r/v1bu43XiRnwPQJUKb5kr5SjN5/zSMBuUgm TLT/bnv8qvdFxCpVQJFv4k/uzULARMdbvLtTy8osB14vNHX9jPn+xORjLaZNiO6O 7fFispMU8Om56hNkD6C451r3icRjjGlD7OA8KOlbZ8f876sLzGV9i6P9gwCoRdEB wclDPsN7kAzw/V2sEE60 =bj8i -----END PGP SIGNATURE----- Merge tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)" * tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits) vfio-pci: Use cached MSI/MSI-X capabilities vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Remove "extern" from function declarations PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h PCI: Use msix_table_size() directly, drop multi_msix_capable() PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros PCI: Drop is_64bit_address() and is_mask_bit_support() macros PCI: Drop msi_data_reg() macro PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc PCI: Clean up MSI/MSI-X capability #defines PCI: Use cached MSI-X cap while enabling MSI-X PCI: Use cached MSI cap while enabling MSI interrupts PCI: Remove MSI/MSI-X cap check in pci_msi_check_device() PCI: Cache MSI/MSI-X capability offsets in struct pci_dev PCI: Use u8, not int, for PM capability offset [SCSI] megaraid_sas: Use correct #define for MSI-X capability PCI: Remove "extern" from function declarations ...	2013-04-29 09:30:25 -07:00
Linus Torvalds	9d2da7af90	Features: - Populate the boot_params with EDD data. - Cleanups in the IRQ code. Bug-fixes: - CPU hotplug offline/online in PVHVM mode. - Re-upload processor PM data after ACPI S3 suspend/resume cycle. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRcWFDAAoJEFjIrFwIi8fJQScIAJmnsPzoqd/USahqgbOvc0Nb fJgzvno0Lfuvi5NipMpMDNCpCxRuikoJe6ocoF6E0+poucckRGDFSC3R2KVgLR2O pXkO7bzggkUYkLehW7gXmM4IlvhLxnsfEofDWxG2VJy6RfWxFk+84v/uimQSIB0D jI9FB3oUhfA+IT8/3Iofyv2OiPH39zNLlzidAnVXmJ8SoJFDLt3l1IymYqVdq6Lt AKL0IXa31f+yfj9Vli1mkBsxuEIvIo5tVQNn250B4cMlR8mcWsQBnxIeA2M+8Jhl d2ereTBwhjohCrFR/TWlXYrMlL+XVucI7J7agf3tF4kmDUqjg+c9hUkszSOPjEc= =yJWr -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: - Populate the boot_params with EDD data. - Cleanups in the IRQ code. Bug-fixes: - CPU hotplug offline/online in PVHVM mode. - Re-upload processor PM data after ACPI S3 suspend/resume cycle." And Konrad gets a gold star for sending the pull request early when he thought he'd be away for the first week of the merge window (but because of 3.9 dragging out to -rc8 he then re-sent the reminder on the first day of the merge window anyway) * tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: resolve section mismatch warnings in xen-acpi-processor xen: Re-upload processor PM data to hypervisor after S3 resume (v2) xen/smp: Unifiy some of the PVs and PVHVM offline CPU path xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one. xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM xen/spinlock: Check against default value of -1 for IRQ line. xen/time: Add default value of -1 for IRQ and check for that. xen/events: Check that IRQ value passed in is valid. xen/time: Fix kasprintf splat when allocating timer%d IRQ line. xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. xen kconfig: fix select INPUT_XEN_KBDDEV_FRONTEND xen: drop tracking of IRQ vector x86/xen: populate boot_params with EDD data	2013-04-29 08:16:51 -07:00
Jan Kiszka	5a2892ce72	KVM: nVMX: Skip PF interception check when queuing during nested run While a nested run is pending, vmx_queue_exception is only called to requeue exceptions that were previously picked up via vmx_cancel_injection. Therefore, we must not check for PF interception by L1, possibly causing a bogus nested vmexit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 13:34:39 +03:00
Chegu Vinod	cbf6435858	KVM: x86: Increase the "hard" max VCPU limit KVM guests today use 8bit APIC ids allowing for 256 ID's. Reserving one ID for Broadcast interrupts should leave 255 ID's. In case of KVM there is no need for reserving another ID for IO-APIC so the hard max limit for VCPUS can be increased from 254 to 255. (This was confirmed by Gleb Natapov http://article.gmane.org/gmane.comp.emulators.kvm.devel/99713 ) Signed-off-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 13:17:48 +03:00
Alex Williamson	2a5bab1004	kvm: Allow build-time configuration of KVM device assignment We hope to at some point deprecate KVM legacy device assignment in favor of VFIO-based assignment. Towards that end, allow legacy device assignment to be deconfigured. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 12:58:56 +03:00
Gleb Natapov	064d1afaa5	Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue	2013-04-28 12:50:07 +03:00
Jan Kiszka	730dca42c1	KVM: x86: Rework request for immediate exit The VMX implementation of enable_irq_window raised KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This caused infinite loops on vmentry. Fix it by letting enable_irq_window signal the need for an immediate exit via its return value and drop KVM_REQ_IMMEDIATE_EXIT. This issue only affects nested VMX scenarios. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 12:44:18 +03:00
Borislav Petkov	6614c7d042	kvm, svm: Fix typo in printk message It is "exit_int_info". It is actually EXITINTINFO in the official docs but we don't like screaming docs. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 12:42:41 +03:00
Jan Kiszka	cb0c8cda13	KVM: VMX: remove unprintable characters from comment Slipped in while copy&pasting from the SDM. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-28 08:55:59 +03:00
Rafael J. Wysocki	885f925eef	Merge branch 'pm-cpufreq' * pm-cpufreq: (57 commits) cpufreq: MAINTAINERS: Add co-maintainer cpufreq: pxa2xx: initialize variables ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y cpufreq: cpu0: Put cpu parent node after using it cpufreq: ARM big LITTLE: Adapt to latest cpufreq updates cpufreq: ARM big LITTLE: put DT nodes after using them cpufreq: Don't call __cpufreq_governor() for drivers without target() cpufreq: exynos5440: Protect OPP search calls with RCU lock cpufreq: dbx500: Round to closest available freq cpufreq: Call __cpufreq_governor() with correct policy->cpus mask cpufreq / intel_pstate: Optimize intel_pstate_set_policy cpufreq: OMAP: instantiate omap-cpufreq as a platform_driver arm: exynos: Enable OPP library support for exynos5440 cpufreq: exynos: Remove error return even if no soc is found cpufreq: exynos: Add cpufreq driver for exynos5440 cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor cpufreq: ondemand: allow custom powersave_bias_target handler to be registered cpufreq: convert cpufreq_driver to using RCU cpufreq: powerpc/platforms/cell: move cpufreq driver to drivers/cpufreq cpufreq: sparc: move cpufreq driver to drivers/cpufreq ... Conflicts: MAINTAINERS (with commit `a8e39c3` from pm-cpuidle) drivers/cpufreq/cpufreq_governor.h (with commit `beb0ff3`)	2013-04-28 02:10:46 +02:00
Rafael J. Wysocki	e4f5a3adc4	Merge branch 'pm-cpuidle' * pm-cpuidle: (51 commits) cpuidle: add maintainer entry ARM: s3c64xx: cpuidle: use init/exit common routine SH: cpuidle: use init/exit common routine cpuidle: fix comment format ARM: imx: cpuidle: use init/exit common routine ARM: davinci: cpuidle: use init/exit common routine ARM: kirkwood: cpuidle: use init/exit common routine ARM: calxeda: cpuidle: use init/exit common routine ARM: tegra: cpuidle: use init/exit common routine for tegra3 ARM: tegra: cpuidle: use init/exit common routine for tegra2 ARM: OMAP4: cpuidle: use init/exit common routine ARM: shmobile: cpuidle: use init/exit common routine ARM: tegra: cpuidle: use init/exit common routine ARM: OMAP3: cpuidle: use init/exit common routine ARM: at91: cpuidle: use init/exit common routine ARM: ux500: cpuidle: use init/exit common routine cpuidle: make a single register function for all ARM: ux500: cpuidle: replace for_each_online_cpu by for_each_possible_cpu cpuidle: remove en_core_tk_irqen flag ARM: OMAP3: remove cpuidle_wrap_enter ...	2013-04-28 01:54:49 +02:00
Alexander Graf	7df35f5496	KVM: Move irqfd resample cap handling to generic code Now that we have most irqfd code completely platform agnostic, let's move irqfd's resample capability return to generic code as well. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2013-04-26 20:27:19 +02:00
Alexander Graf	1c9f8520bd	KVM: Extract generic irqchip logic into irqchip.c The current irq_comm.c file contains pieces of code that are generic across different irqchip implementations, as well as code that is fully IOAPIC specific. Split the generic bits out into irqchip.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2013-04-26 20:27:17 +02:00
Alexander Graf	a725d56a02	KVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTING Quite a bit of code in KVM has been conditionalized on availability of IOAPIC emulation. However, most of it is generically applicable to platforms that don't have an IOPIC, but a different type of irq chip. Make code that only relies on IRQ routing, not an APIC itself, on CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2013-04-26 20:27:14 +02:00
Alexander Graf	8175e5b79c	KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINS The concept of routing interrupt lines to an irqchip is nothing that is IOAPIC specific. Every irqchip has a maximum number of pins that can be linked to irq lines. So let's add a new define that allows us to reuse generic code for non-IOAPIC platforms. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2013-04-26 20:27:13 +02:00
Ingo Molnar	5ac2b5c272	perf/x86/intel/P4: Robistify P4 PMU types Linus found, while extending integer type extension checks in the sparse static code checker, various fragile patterns of mixed signed/unsigned 64-bit/32-bit integer use in perf_events_p4.c. The relevant hardware register ABI is 64 bit wide on 32-bit kernels as well, so clean it all up a bit, remove unnecessary casts, and make sure we use 64-bit unsigned integers in these places. [ Unfortunately this patch was not tested on real P4 hardware, those are pretty rare already. If this patch causes any problems on P4 hardware then please holler ... ] Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-26 09:31:41 +02:00
H. Peter Anvin	697dfd8844	* The EFI variable anti-bricking algorithm merged in -rc8 broke booting on some Apple machines because they implement EFI spec 1.10, which doesn't provide a QueryVariableInfo() runtime function and the logic used to check for the existence of that function was insufficient. Fix from Josh Boyer. * The anti-bricking algorithm also introduced a compiler warning on 32-bit. Fix from Borislav Petkov. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJReOtLAAoJEC84WcCNIz1VFZgP/Aws1NdPo/RdyI6/oGkI7ZV4 +5O79pLcaJt7ESuWjx2/9pto/qTzsWMri40HZivGbgxw+ViEdprGjJUFqSTn1LyJ QrYamP40jBdLFfh1oDHvsub8HiC72sjB/ILSoDvooHEniDmajrL6zZK7C66gP+na Q4ZN/Jp3x3XAW0s1mVJC4VnL60489Q/ndR3SH01hr2gqMSvmjwnhfiio6n9gYvdd egmoalTIst94+X0nW1VHA4HT3SRM7cuwCA/kDxtG6qitbsQMUKUoa+DOpMNfE8mD QdzmzZL115O+7ORj8Ki/JNS2CSyI83IRSQ3kcM1J5026mWIBMiM3h9Vlu5NwAyFA bapZSaYr7S5u9BU/vICGnpyYnSsLfjuB3CnAuJFyM0YVFjR6n7moUpnP1LNifGHX E/Qr1HDyIwwxE8K0f/n86a7BfstoMjzE74an6wOVXKDUY/RnH+FdWG/HDBPd8iG4 Avei1bK2zLLcXK4Kqmx8EkXTK7VSFx6StCPjAVlpgYOAMpRmQEmNpd/3lF7Y70gp yXIBTSTKaPZ+/5SaeOPL2sgW37Uo9fFMphww2mLXGIdgO3L0BHD5hIq9pZQ7g0VK noDN7f6ViCuNYuZIrTAtLo9Oc+KKgqOXa0TovUhORkJ8Gk93moL4fgYyFVPvsYnD rQuTRJ3pZEEHlCmyZzBl =l/fT -----END PGP SIGNATURE----- Merge tag 'efi-urgent' into x86/urgent * The EFI variable anti-bricking algorithm merged in -rc8 broke booting on some Apple machines because they implement EFI spec 1.10, which doesn't provide a QueryVariableInfo() runtime function and the logic used to check for the existence of that function was insufficient. Fix from Josh Boyer. * The anti-bricking algorithm also introduced a compiler warning on 32-bit. Fix from Borislav Petkov. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-25 14:00:22 -07:00
Jussi Kivilinna	f3f935a76a	crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher Patch adds AVX2/AES-NI/x86-64 implementation of Camellia cipher, requiring 32 parallel blocks for input (512 bytes). Compared to AVX implementation, this version is extended to use the 256-bit wide YMM registers. For AES-NI instructions data is split to two 128-bit registers and merged afterwards. Even with this additional handling, performance should be higher compared to the AES-NI/AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:07 +08:00
Jussi Kivilinna	56d76c96a9	crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel blocks for input (256 bytes). Implementation is based on the AVX implementation and extends to use the 256-bit wide YMM registers. Since serpent does not use table look-ups, this implementation should be close to two times faster than the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:07 +08:00
Jussi Kivilinna	cf1521a1a5	crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Implementation also uses 256-bit wide YMM registers, which should give additional speed up compared to the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:05 +08:00
Jussi Kivilinna	6048801070	crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:04 +08:00
Jussi Kivilinna	c456a9cd1a	crypto: aesni_intel - add more optimized XTS mode for x86-64 Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and boost for speed. tcrypt results, with Intel i5-2450M: 256-bit key enc dec 16B 0.98x 0.99x 64B 0.64x 0.63x 256B 1.29x 1.32x 1024B 1.54x 1.58x 8192B 1.57x 1.60x 512-bit key enc dec 16B 0.98x 0.99x 64B 0.60x 0.59x 256B 1.24x 1.25x 1024B 1.39x 1.42x 8192B 1.38x 1.42x I chose not to optimize smaller than block size of 256 bytes, since XTS is practically always used with data blocks of size 512 bytes. This is why performance is reduced in tcrypt for 64 byte long blocks. Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:53 +08:00
Jussi Kivilinna	b5c5b072dc	crypto: x86/camellia-aesni-avx - add more optimized XTS code Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage and small boost for speed. tcrypt results, with Intel i5-2450M: enc dec 16B 1.10x 1.01x 64B 0.82x 0.77x 256B 1.14x 1.10x 1024B 1.17x 1.16x 8192B 1.10x 1.11x Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of camellia-2way for block sized smaller than 256 bytes. This causes slower result in tcrypt for 64 bytes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:52 +08:00
Jussi Kivilinna	70177286e1	crypto: cast6-avx: use new optimized XTS code Change cast6-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.01x 1.01x 64B 1.01x 1.00x 256B 1.09x 1.02x 1024B 1.08x 1.06x 8192B 1.08x 1.07x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:52 +08:00
Jussi Kivilinna	18be45270a	crypto: x86/twofish-avx - use optimized XTS code Change twofish-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.03x 1.02x 64B 0.91x 0.91x 256B 1.10x 1.09x 1024B 1.12x 1.11x 8192B 1.12x 1.11x Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of twofish-3way for block sized smaller than 128 bytes. This causes slower result in tcrypt for 64 bytes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:51 +08:00
Jussi Kivilinna	a05248ed2d	crypto: x86 - add more optimized XTS-mode for serpent-avx This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided. tcrypt results, with Intel i5-2450M: enc dec 16B 1.00x 1.00x 64B 1.00x 1.00x 256B 1.04x 1.06x 1024B 1.09x 1.09x 8192B 1.10x 1.09x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:51 +08:00
Sandy Wu	57ae1b0532	crypto: crc32-pclmul - Use gas macro for pclmulqdq Occurs when CONFIG_CRYPTO_CRC32C_INTEL=y and CONFIG_CRYPTO_CRC32C_INTEL=y. Older versions of bintuils do not support the pclmulqdq instruction. The PCLMULQDQ gas macro is used instead. Signed-off-by: Sandy Wu <sandyw@twitter.com> Cc: stable@vger.kernel.org # 3.8+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:44 +08:00
Tim Chen	87de4579f9	crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:42 +08:00
Tim Chen	5663535b69	crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX2 RORX instruction. Provides SHA512 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:58 +08:00
Tim Chen	e01d69cb01	crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions. Provides SHA512 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:58 +08:00
Tim Chen	bf215cee23	crypto: sha512 - Optimized SHA512 x86_64 assembly routine using Supplemental SSE3 instructions. Provides SHA512 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:58 +08:00
Tim Chen	8275d1aa64	crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:57 +08:00
Jean Delvare	06d219dc22	x86/setup: Drop unneeded include <asm/dmi.h> arch/x86/kernel/setup.c includes <asm/dmi.h> but it doesn't look like it needs it, <linux/dmi.h> is sufficient. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/1366881845.4186.65.camel@chaos.site Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com>	2013-04-25 11:32:51 +02:00
Li Zhong	7f5281ae8a	x86 cmpxchg.h: fix wrong comment Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-04-25 10:39:04 +02:00
Gleb Natapov	660696d1d1	KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions Source operand for one byte mov[zs]x is decoded incorrectly if it is in high byte register. Fix that. Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-25 10:03:50 +03:00
Thomas Gleixner	6402c7dc2a	Merge branch 'linus' into timers/core Reason: Get upstream fixes before adding conflicting code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-04-24 20:33:54 +02:00
Josh Boyer	f697036b93	efi: Check EFI revision in setup_efi_vars We need to check the runtime sys_table for the EFI version the firmware specifies instead of just checking for a NULL QueryVariableInfo. Older implementations of EFI don't have QueryVariableInfo but the runtime is a smaller structure, so the pointer to it may be pointing off into garbage. This is apparently the case with several Apple firmwares that support EFI 1.10, and the current check causes them to no longer boot. Fix based on a suggestion from Matthew Garrett. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-04-24 16:19:01 +01:00
Borislav Petkov	51f8fbba64	x86, efi: Fix a build warning Fix this: arch/x86/boot/compressed/eboot.c: In function ‘setup_efi_vars’: arch/x86/boot/compressed/eboot.c:269:2: warning: passing argument 1 of ‘efi_call_phys’ makes pointer from integer without a cast [enabled by default] In file included from arch/x86/boot/compressed/eboot.c:12:0: /w/kernel/linux/arch/x86/include/asm/efi.h:8:33: note: expected ‘void *’ but argument is of type ‘long unsigned int’ after `cc5a080c5d` ("efi: Pass boot services variable info to runtime code"). Reported-by: Paul Bolle <pebolle@tiscali.nl> Cc: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-04-24 11:57:15 +01:00
Wei Yongjun	3482e664dc	x86/olpc/xo1/sci: Don't call input_free_device() after input_unregister_device() input_free_device() should only be used if input_register_device() was not called yet or if it failed. Once device was unregistered use input_unregister_device() and memory will be freed once last reference to the device is dropped. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: dsd@laptop.org Cc: pgf@laptop.org Cc: gregkh@linuxfoundation.org Link: http://lkml.kernel.org/r/CAPgLHd84cboeucog%2BYNdHvGqTfTROujDKZgSkh3o0B-Q93ee2A@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-24 08:44:47 +02:00
Daniel Lezcano	554c06ba3e	cpuidle: remove en_core_tk_irqen flag The en_core_tk_irqen flag is set in all the cpuidle driver which means it is not necessary to specify this flag. Remove the flag and the code related to it. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> # for mach-omap2/* Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-23 13:45:22 +02:00
David S. Miller	6e0895c2ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-22 20:32:51 -04:00
Jan Kiszka	d1fa0352a1	KVM: nVMX: VM_ENTRY/EXIT_LOAD_IA32_EFER overrides EFER.LMA settings If we load the complete EFER MSR on entry or exit, EFER.LMA (and LME) loading is skipped. Their consistency is already checked now before starting the transition. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 12:53:52 +03:00
Jan Kiszka	384bb78327	KVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER As we may emulate the loading of EFER on VM-entry and VM-exit, implement the checks that VMX performs on the guest and host values on vmlaunch/ vmresume. Factor out kvm_valid_efer for this purpose which checks for set reserved bits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 12:53:42 +03:00
Jacob Shin	94f4db3590	perf/x86/amd: Fix AMD NB and L2I "uncore" support Borislav Petkov reported a lockdep splat warning about kzalloc() done in an IPI (hardirq) handler. This is a real bug, do not call kzalloc() in a smp_call_function_single() handler because it can schedule and crash. Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: <eranian@google.com> Cc: <a.p.zijlstra@chello.nl> Cc: <acme@ghostprotocols.net> Cc: <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20130421180627.GA21049@jshin-Toonie Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-22 10:10:55 +02:00
Jan Kiszka	ea8ceb8354	KVM: nVMX: Fix conditions for NMI injection The logic for checking if interrupts can be injected has to be applied also on NMIs. The difference is that if NMI interception is on these events are consumed and blocked by the VM exit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 11:10:49 +03:00
Jan Kiszka	2505dc9fad	KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask vmx_set_nmi_mask will soon be used by vmx_nmi_allowed. No functional changes. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 11:10:49 +03:00
Andrew Honig	27469d29b3	KVM: x86: Fix memory leak in vmx.c If userspace creates and destroys multiple VMs within the same process we leak 20k of memory in the userspace process context per VM. This patch frees the memory in kvm_arch_destroy_vm. If the process exits without closing the VM file descriptor or the file descriptor has been shared with another process then we don't free the memory. It's still possible for a user space process to leak memory if the last process to close the fd for the VM is not the process that created it. However, this is an unexpected case that's only caused by a user space process that's misbehaving. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 11:02:38 +03:00
Wei Yongjun	f179735921	KVM: x86: fix error return code in kvm_arch_vcpu_init() Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:56:44 +03:00
Abel Gordon	8a1b9dd000	KVM: nVMX: Enable and disable shadow vmcs functionality Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12 shadowed fields to the shadow vmcs. When we release the VMCS12, we also disable shadow-vmcs capability. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:55 +03:00
Abel Gordon	012f83cb2f	KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs Synchronize between the VMCS12 software controlled structure and the processor-specific shadow vmcs Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:45 +03:00
Abel Gordon	c3114420d1	KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Introduce a function used to copy fields from the software controlled VMCS12 to the processor-specific shadow vmcs Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:37 +03:00
Abel Gordon	16f5b9034b	KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Introduce a function used to copy fields from the processor-specific shadow vmcs to the software controlled VMCS12 Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:24 +03:00
Abel Gordon	e7953d7fab	KVM: nVMX: Release shadow vmcs Unmap vmcs12 and release the corresponding shadow vmcs Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:17 +03:00
Abel Gordon	8de4883370	KVM: nVMX: Allocate shadow vmcs Allocate a shadow vmcs used by the processor to shadow part of the fields stored in the software defined VMCS12 (let L1 access fields without causing exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12 becomes non-current, its shadow vmcs is released. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:10 +03:00
Abel Gordon	145c28dd19	KVM: nVMX: Fix VMXON emulation handle_vmon doesn't check if L1 is already in root mode (VMXON was previously called). This patch adds this missing check and calls nested_vmx_failValid if VMX is already ON. We need this check because L0 will allocate the shadow vmcs when L1 executes VMXON and we want to avoid host leaks (due to shadow vmcs allocation) if L1 executes VMXON repeatedly. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:52:01 +03:00
Abel Gordon	20b97feaf6	KVM: nVMX: Refactor handle_vmwrite Refactor existent code so we re-use vmcs12_write_any to copy fields from the shadow vmcs specified by the link pointer (used by the processor, implementation-specific) to the VMCS12 software format used by L0 to hold the fields in L1 memory address space. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:51:44 +03:00
Abel Gordon	4607c2d7a2	KVM: nVMX: Introduce vmread and vmwrite bitmaps Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields. These lists are intended to specifiy most frequent accessed fields so we can minimize the number of fields that are copied from/to the software controlled VMCS12 format to/from to processor-specific shadow vmcs. The lists were built measuring the VMCS fields access rate after L2 Ubuntu 12.04 booted when it was running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were additional fields which were frequently modified but they were not added to these lists because after boot these fields were not longer accessed by L1. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:51:34 +03:00
Abel Gordon	abc4fc58c5	KVM: nVMX: Detect shadow-vmcs capability Add logic required to detect if shadow-vmcs is supported by the processor. Introduce a new kernel module parameter to specify if L0 should use shadow vmcs (or not) to run L1. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:51:21 +03:00
Abel Gordon	89662e566c	KVM: nVMX: Shadow-vmcs control fields/bits Add definitions for all the vmcs control fields/bits required to enable vmcs-shadowing Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-04-22 10:51:09 +03:00
Rusty Russell	6b39271746	lguest: map Switcher below fixmap. Now we've adjusted all the code, we can simply set switcher_addr to wherever it needs to go below the fixmaps, rather than asserting that it should be so. With large NR_CPUS and PAE, people were hitting the "mapping switcher would thwack fixmap" message. Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-04-22 15:45:03 +09:30
Rusty Russell	93a2cdff98	lguest: assume Switcher text is a single page. ie. SHARED_SWITCHER_PAGES == 1. It is well under a page, and it's a minor simplification: it's nice to have one simplification in a patch series! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-04-22 15:31:36 +09:30
Rusty Russell	406a590ba1	lguest: prepare to make SWITCHER_ADDR a variable. We currently use the whole top PGD entry for the switcher, but that's hitting the fixmap in some configurations (mainly, large NR_CPUS). Introduce a variable, currently set to the constant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-04-22 15:31:33 +09:30
Linus Torvalds	3125929454	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix offcore_rsp valid mask for SNB/IVB perf: Treat attr.config as u64 in perf_swevent_init()	2013-04-21 10:25:42 -07:00
Jacob Shin	0cf5f4323b	perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c Support for NB counters, MSRs 0xc0010240 ~ 0xc0010247, got moved to perf_event_amd_uncore.c in the following commit: `c43ca5091a` perf/x86/amd: Add support for AMD NB and L2I "uncore" counters AMD Family 10h NB events (events 0xe0 ~ 0xff, on MSRs 0xc001000 ~ 0xc001007) will still continue to be handled by perf_event_amd.c Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jacob Shin <jacob.shin@amd.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1366046483-1765-2-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 17:21:59 +02:00
George Dunlap	a5ebe0ba3d	perf/x86: Check all MSRs before passing hw check check_hw_exists() has a number of checks which go to two exit paths: msr_fail and bios_fail. Checks classified as msr_fail will cause check_hw_exists() to return false, causing the PMU not to be used; bios_fail checks will only cause a warning to be printed, but will return true. The problem is that if there are both msr failures and bios failures, and the routine hits a bios_fail check first, it will exit early and return true, not finishing the rest of the msr checks. If those msrs are in fact broken, it will cause them to be used erroneously. In the case of a Xen PV VM, the guest OS has read access to all the MSRs, but write access is white-listed to supported features. Writes to unsupported MSRs have no effect. The PMU MSRs are not (typically) supported, because they are expensive to save and restore on a VM context switch. One of the "msr_fail" checks is supposed to detect this circumstance (ether for Xen or KVM) and disable the harware PMU. However, on one of my AMD boxen, there is (apparently) a broken BIOS which triggers one of the bios_fail checks. In particular, MSR_K7_EVNTSEL0 has the ARCH_PERFMON_EVENTSEL_ENABLE bit set. The guest kernel detects this because it has read access to all MSRs, and causes it to skip the rest of the checks and try to use the non-existent hardware PMU. This minimally causes a lot of useless instruction emulation and Xen console spam; it may cause other issues with the watchdog as well. This changset causes check_hw_exists() to go through all of the msr checks, failing and returning false if any of them fail. This makes sure that a guest running under Xen without a virtual PMU will detect that there is no functioning PMU and not attempt to use it. This problem affects kernels as far back as 3.2, and should thus be considered for backport. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Link: http://lkml.kernel.org/r/1365000388-32448-1-git-send-email-george.dunlap@eu.citrix.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 11:16:29 +02:00
Jacob Shin	c43ca5091a	perf/x86/amd: Add support for AMD NB and L2I "uncore" counters Add support for AMD Family 15h [and above] northbridge performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared across all cores that share a common northbridge. Add support for AMD Family 16h L2 performance counters. MSRs 0xc0010230 ~ 0xc0010237 are shared across all cores that share a common L2 cache. We do not enable counter overflow interrupts. Sampling mode and per-thread events are not supported. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 11:01:24 +02:00
Yan, Zheng	e850f9c33c	perf/x86/intel: Add Ivy Bridge-EP uncore support The uncore subsystem in Ivy Bridge-EP is similar to Sandy Bridge-EP. There are some differences in config register encoding and pci device IDs. The Ivy Bridge-EP uncore also supports a few new events. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1366113067-3262-4-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 11:01:24 +02:00
Yan, Zheng	46bdd90598	perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management The existing code assumes all Cbox and PCU events are using filter, but actually the filter is event specific. Furthermore the filter is sub-divided into multiple fields which are used by different events. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: peterz@infradead.org Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1366113067-3262-3-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Reported-by: Stephane Eranian <eranian@google.com>	2013-04-21 11:01:23 +02:00
Yan, Zheng	22cc4ccf63	perf/x86: Avoid kfree() in CPU_{STARTING,DYING} On -rt kfree() can schedule, but CPU_{STARTING,DYING} should be atomic. So use a list to defer kfree until CPU_{ONLINE,DEAD}. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1366113067-3262-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 10:59:01 +02:00
Ingo Molnar	73e21ce28d	Merge branch 'perf/urgent' into perf/core Conflicts: arch/x86/kernel/cpu/perf_event_intel.c Merge in the latest fixes before applying new patches, resolve the conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-21 10:57:33 +02:00
Linus Torvalds	830ac8524f	Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kdump fixes from Peter Anvin: "The kexec/kdump people have found several problems with the support for loading over 4 GiB that was introduced in this merge cycle. This is partly due to a number of design problems inherent in the way the various pieces of kdump fit together (it is pretty horrifically manual in many places.) After a lot of iterations this is the patchset that was agreed upon, but of course it is now very late in the cycle. However, because it changes both the syntax and semantics of the crashkernel option, it would be desirable to avoid a stable release with the broken interfaces." I'm not happy with the timing, since originally the plan was to release the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead... * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kexec: use Crash kernel for Crash kernel low x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low x86, kdump: Retore crashkernel= to allocate under 896M x86, kdump: Set crashkernel_low automatically	2013-04-20 18:40:36 -07:00
Linus Torvalds	db93f8b420	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Three groups of fixes: 1. Make sure we don't execute the early microcode patching if family < 6, since it would touch MSRs which don't exist on those families, causing crashes. 2. The Xen partial emulation of HyperV can be dealt with more gracefully than just disabling the driver. 3. More EFI variable space magic. In particular, variables hidden from runtime code need to be taken into account too." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Verify the family before dispatching microcode patching x86, hyperv: Handle Xen emulation of Hyper-V more gracefully x86,efi: Implement efi_no_storage_paranoia parameter efi: Export efi_query_variable_store() for efivars.ko x86/Kconfig: Make EFI select UCS2_STRING efi: Distinguish between "remaining space" and actually used space efi: Pass boot services variable info to runtime code Move utf16 functions to kernel core and rename x86,efi: Check max_size only if it is non-zero. x86, efivars: firmware bug workarounds should be in platform code	2013-04-20 18:38:48 -07:00
H. Peter Anvin	f53f292eea	Merge remote-tracking branch 'efi/chainsaw' into x86/efi Resolved Conflicts: drivers/firmware/efivars.c fs/efivarsfs/file.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-20 09:16:44 -07:00
H. Peter Anvin	c0a9f451e4	Merge remote-tracking branch 'efi/urgent' into x86/urgent Matt Fleming (1): x86, efivars: firmware bug workarounds should be in platform code Matthew Garrett (3): Move utf16 functions to kernel core and rename efi: Pass boot services variable info to runtime code efi: Distinguish between "remaining space" and actually used space Richard Weinberger (2): x86,efi: Check max_size only if it is non-zero. x86,efi: Implement efi_no_storage_paranoia parameter Sergey Vlasov (2): x86/Kconfig: Make EFI select UCS2_STRING efi: Export efi_query_variable_store() for efivars.ko Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-19 17:09:03 -07:00
H. Peter Anvin	74c3e3fcf3	x86, microcode: Verify the family before dispatching microcode patching For each CPU vendor that implements CPU microcode patching, there will be a minimum family for which this is implemented. Verify this minimum level of support. This can be done in the dispatch function or early in the application functions. Doing the latter turned out to be somewhat awkward because of the ineviable split between the BSP and the AP paths, and rather than pushing deep into the application functions, do this in the dispatch function. Reported-by: "Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie> Suggested-by: Borislav Petkov <bp@alien8.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie	2013-04-19 16:36:03 -07:00
Joerg Roedel	35d3d814cb	iommu: Fix compile warnings with forward declarations The irq_remapping.h file for x86 does not include all necessary forward declarations for the data structures used. This causes compile warnings, so fix it. Signed-off-by: Joerg Roedel <joro@8bytes.org>	2013-04-19 20:34:55 +02:00
Ingo Molnar	5379f8c0d7	Add required support for AMD F16h to amd64_edac. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRcSI0AAoJEBLB8Bhh3lVKWjIP/3BdMQljadBzIm5xSU5Vt4lW 1R22XrfNAwqdbjuwHGV7TPMJrjnV8zGQBsghALdqCT7xtRnl71ce7XVGbd1MblGc llTcOBlCp1v5xHbNaL1xmunvI7JuMushX69YRD/KOFoEaXo2iQDCjSpA+73K3d/g OHjFQGrZkuuNkdTEWiOo6cplDMwXKNzLlS1ccN7XzzKPQ3aA0Q/XwlUftjHdBfGP WCxahtsz60xCxJt1x4hy5/ocZT2WfLp/shSS/udsGcTl5g75Yjhz8RkTuOmNySzo KDwsSQaT7HBz+b3SirG1IHEzgsF2DKZqDziDlBt8qx2xcHnkp7DfF/rZajCOzoLy k6tdFI2FEj3JZqyCDVxFAP/xmct39S/arqFlLLAUIAQHbsVkrmIH6ZC9NGiSPq9M kEdOOl/G0CbfnPFGd1hYCQUdRzJc7m5hKT05WFRRexRdsLVoAluO8rPWHtbqlmRS lvYkrvVglnOTlN0jVN6xw9LaTkX/q3+kUFzu4ZPiDQYqVzw+XmKYYh+L9OTgnDqA rKZqljdDgwb5GoiexyKtvuJUt7dJ2KlDZoEZ7INLQvA4Im9EQnYW/RH48G+8qouJ XfSjxUET0fmapYzulQI8Ikhs/dbPt7FItUn/ipX7I9fCxMDJOyoXTqZb2S0FbyYa kdZIbxzNDja0cCCcZfOs =5BNU -----END PGP SIGNATURE----- Merge tag 'edac_amd_f16h' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull AMD F16h support for amd64_edac from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-19 13:03:08 +02:00
Aravind Gopalakrishnan	94c1acf2c8	amd64_edac: Add Family 16h support Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-04-19 12:46:50 +02:00
K. Y. Srinivasan	7eff7ded02	x86, hyperv: Handle Xen emulation of Hyper-V more gracefully Install the Hyper-V specific interrupt handler only when needed. This would permit us to get rid of the Xen check. Note that when the vmbus drivers invokes the call to register its handler, we are sure to be running on Hyper-V. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1366299886-6399-1-git-send-email-kys@microsoft.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-04-18 08:59:20 -07:00
Neil Horman	03bbcb2e7e	iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. For details of those that reported the problem, please see: https://bugzilla.redhat.com/show_bug.cgi?id=887006 [ Joerg: Removed CONFIG_IRQ_REMAP ifdef from early-quirks.c ] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Prarit Bhargava <prarit@redhat.com> CC: Don Zickus <dzickus@redhat.com> CC: Don Dutile <ddutile@redhat.com> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Asit Mallick <asit.k.mallick@intel.com> CC: David Woodhouse <dwmw2@infradead.org> CC: linux-pci@vger.kernel.org CC: Joerg Roedel <joro@8bytes.org> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Joerg Roedel <joro@8bytes.org>	2013-04-18 17:00:47 +02:00
Zhang, Yang Z	6ffbbbbab3	KVM: x86: Fix posted interrupt with CONFIG_SMP=n ->send_IPI_mask is not defined on UP. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-04-17 23:11:54 -03:00
Kristen Carlson Accardi	ca58710f3a	tools/power turbostat: display C8, C9, C10 residency Display residency in the new C-states, C8, C9, C10. C8, C9, C10 are present on some: "Fourth Generation Intel(R) Core(TM) Processors", which are based on Intel(R) microarchitecture code name Haswell. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2013-04-17 19:23:26 -04:00

... 3 4 5 6 7 ...

17721 Commits