OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jan Beulich	72da0b07b1	x86: constify PCI raw ops structures As with any other such change, the goal is to prevent inadvertent writes to these structures (assuming DEBUG_RODATA is enabled), and to separate data (possibly frequently) written to from such never getting modified. Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2011-10-14 09:05:28 -07:00
Andi Kleen	30963c0ac7	x86, intel: Use c->microcode for Atom errata check Now that the cpu update level is available the Atom PSE errata check can use it directly without reading the MSR again. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1318466795-7393-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-14 13:16:38 +02:00
Andi Kleen	506ed6b53e	x86, intel: Output microcode revision in /proc/cpuinfo I got a request to make it easier to determine the microcode update level on Intel CPUs. This patch adds a new "microcode" field to /proc/cpuinfo. The microcode level is also outputed on fatal machine checks together with the other CPUID model information. I removed the respective code from the microcode update driver, it just reads the field from cpu_data. Also when the microcode is updated it fills in the new values too. I had to add a memory barrier to native_cpuid to prevent it being optimized away when the result is not used. This turns out to clean up further code which already got this information manually. This is done in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1318466795-7393-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-14 13:16:35 +02:00
Linus Torvalds	2ad53110d6	Merge branch 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86: Default to vsyscall=native for now	2011-10-14 16:54:56 +12:00
Mika Westerberg	153b19a3b9	x86, mrst: use a temporary variable for SFI irq SFI tables reside in RAM and should not be modified once they are written. Current code went to set pentry->irq to zero which causes subsequent reads to fail with invalid SFI table checksum. This will break kexec as the second kernel fails to validate SFI tables. To fix this we use temporary variable for irq number. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-14 16:53:27 +12:00
Srivatsa S. Bhat	70989449da	x86, microcode: Don't request microcode from userspace unnecessarily Requesting the microcode from userspace every time when onlining CPUs (during a CPU hotplug operation) is unnecessary. Thus, ensure that once the kernel gets the microcode after booting, it is not freed nor invalidated when a CPU goes offline, so that it can be reused when that CPU comes back online, without requesting userspace for it again. As a result, the CPU hotplug operations become faster as well. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/4E91F908.5010006@linux.vnet.ibm.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-10-13 16:20:35 +02:00
Yinghai Lu	141d55e6cc	x86/irq: Standardize on CONFIG_SPARSE_IRQ=y Sparseirq got introduced in v2.6.28 and Thomas did a huge cleanup around v2.6.38 that eliminated basically all disadvantages of it. So we can remove non-sparseirq support now and simplify our IRQ degrees of freedom a bit. Suggested-and-acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4E95E21D.6090200@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-13 12:12:12 +02:00
Ingo Molnar	910e94dd0c	Merge branch 'tip/perf/core' of git://github.com/rostedt/linux into perf/core	2011-10-12 17:14:47 +02:00
Yinghai Lu	6f50d45fae	x86, ioapic: Clean up ioapic/apic_id usage While looking at the code, apic_id sometime is referred to index of ioapic, but sometime is used for phys apic id. and some even use apic for real apic id. It is very confusing. So try to limit apic_id or ioapic_id to be real apic id for ioapic, and use ioapic_idx for ioapic index in the array. -v2: Suggested by Ingo, use ioapic_idx consistently, instead of ioapic Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542DC.3090509@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:28 +02:00
Yinghai Lu	cda417dd87	x86, ioapic: Factor out print_IO_APIC() to only print one io apic It is getting too big after the interrupt remaping entries debug print out was added. Original print_IO_APIC() becomes print_IO_APICs(). New print_IO_APIC() will only print one ioapic's registers As a side-effect this clean-up also made checkpatch.pl happier. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542D3.5000008@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:25 +02:00
Yinghai Lu	3a61d7feca	x86, ioapic: Print out irte with right ioapic index While checking irte dump in dmesg, the print out is confusing ioapic index with real io apic id: IOAPIC[0]: Set routing entry (1-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:31 Dest:00000001 SID:00FF SQ:0 SVT:1) IOAPIC[0]: Set routing entry (1-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:30 Dest:00000001 SID:00FF SQ:0 SVT:1) The system's first ioapic id is 1. This commit: \| commit `3040db92ee` \| Author: Naga Chumbalkar <nagananda.chumbalkar@hp.com> \| Date: Tue Jul 12 21:17:41 2011 +0000 \| \| x86, ioapic: Print IRTE when IR is enabled Confused apic_id with the ioapic ID - fix it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542C8.8040209@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:21 +02:00
Yinghai Lu	c5b4712c3f	x86, ioapic: Split up setup_ioapic_entry() Ingo pointed out that setup_ioapic_entry() is way too big now. Split the intr-remap code out into setup_ir_ioapic_entry(). Also pass struct io_apic_irq_attr * instead of 5 parameters in those two functions. At last in setup_ir_ioapic_entry() we don't need to panic. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542BB.4070807@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:18 +02:00
Yinghai Lu	e4aff81182	x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq() Do not expand that struct, and just pass pointer to reduce the number of parameters in related functions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542B1.7050800@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:15 +02:00
Adrian Bunk	2b666859ec	x86: Default to vsyscall=native for now This UML breakage: linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790 linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790 Is caused by commit `3ae36655` ("x86-64: Rework vsyscall emulation and add vsyscall= parameter") - the vsyscall emulation code is not fully cooked yet as UML relies on some rather fragile SIGSEGV semantics. Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to vsyscall=native for now, this patch implements that. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Andrew Lutomirski <luto@mit.edu> Cc: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-11 08:23:34 +02:00
Masami Hiramatsu	53a019a951	x86: Fix insn decoder for longer instruction Fix x86 insn decoder for hardening against invalid length instructions. This adds length checkings for each byte-read site and if it exceeds MAX_INSN_SIZE, returns immediately. This can happen when decoding user-space binary. Caller can check whether it happened by checking insn.*.got member is set or not. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: acme@redhat.com Cc: ming.m.lin@intel.com Cc: robert.richter@amd.com Cc: ravitillo@lbl.gov Cc: yrl.pp-manager.tt@hitachi.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20111007133155.10933.58577.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 09:05:51 +02:00
Ingo Molnar	d48b0e1737	x86, nmi, drivers: Fix nmi splitup build bug nmi.c needs an #include <linux/mca.h>: arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’: arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function) arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in Another one is the hpwdt driver: drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:21 +02:00
Robert Richter	b716916679	perf, x86: Implement IBS initialization This patch implements IBS feature detection and initialzation. The code is shared between perf and oprofile. If IBS is available on the system for perf, a pmu is setup. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1316597423-25723-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:16 +02:00
Robert Richter	ee5789dbcc	perf, x86: Share IBS macros between perf and oprofile Moving IBS macros from oprofile to <asm/perf_event.h> to make it available to perf. No additional changes. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:11 +02:00
Don Zickus	efc3aac5f3	x86, nmi: Track NMI usage stats Now that the NMI handler are broken into lists, increment the appropriate stats for each list. This allows us to see what is going on when they get printed out in the next patch. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-6-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:06 +02:00
Don Zickus	b227e23399	x86, nmi: Add in logic to handle multiple events and unknown NMIs Previous patches allow the NMI subsystem to process multipe NMI events in one NMI. As previously discussed this can cause issues when an event triggered another NMI but is processed in the current NMI. This causes the next NMI to go unprocessed and become an 'unknown' NMI. To handle this, we first have to flag whether or not the NMI handler handled more than one event or not. If it did, then there exists a chance that the next NMI might be already processed. Once the NMI is flagged as a candidate to be swallowed, we next look for a back-to-back NMI condition. This is determined by looking at the %rip from pt_regs. If it is the same as the previous NMI, it is assumed the cpu did not have a chance to jump back into a non-NMI context and execute code and instead handled another NMI. If both of those conditions are true then we will swallow any unknown NMI. There still exists a chance that we accidentally swallow a real unknown NMI, but for now things seem better. An optimization has also been added to the nmi notifier rountine. Because x86 can latch up to one NMI while currently processing an NMI, we don't have to worry about executing _all_ the handlers in a standalone NMI. The idea is if multiple NMIs come in, the second NMI will represent them. For those back-to-back NMI cases, we have the potentail to drop NMIs. Therefore only execute all the handlers in the second half of a detected back-to-back NMI. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:01 +02:00
Don Zickus	9c48f1c629	x86, nmi: Wire up NMI handlers to new routines Just convert all the files that have an nmi handler to the new routines. Most of it is straight forward conversion. A couple of places needed some tweaking like kgdb which separates the debug notifier from the nmi handler and mce removes a call to notify_die. [Thanks to Ying for finding out the history behind that mce call https://lkml.org/lkml/2010/5/27/114 And Boris responding that he would like to remove that call because of it https://lkml.org/lkml/2011/9/21/163] The things that get converted are the registeration/unregistration routines and the nmi handler itself has its args changed along with code removal to check which list it is on (most are on one NMI list except for kgdb which has both an NMI routine and an NMI Unknown routine). Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Corey Minyard <minyard@acm.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Corey Minyard <minyard@acm.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:57 +02:00
Don Zickus	c9126b2ee8	x86, nmi: Create new NMI handler routines The NMI handlers used to rely on the notifier infrastructure. This worked great until we wanted to support handling multiple events better. One of the key ideas to the nmi handling is to process _all_ the handlers for each NMI. The reason behind this switch is because NMIs are edge triggered. If enough NMIs are triggered, then they could be lost because the cpu can only latch at most one NMI (besides the one currently being processed). In order to deal with this we have decided to process all the NMI handlers for each NMI. This allows the handlers to determine if they recieved an event or not (the ones that can not determine this will be left to fend for themselves on the unknown NMI list). As a result of this change it is now possible to have an extra NMI that was destined to be received for an already processed event. Because the event was processed in the previous NMI, this NMI gets dropped and becomes an 'unknown' NMI. This of course will cause printks that scare people. However, we prefer to have extra NMIs as opposed to losing NMIs and as such are have developed a basic mechanism to catch most of them. That will be a later patch. To accomplish this idea, I unhooked the nmi handlers from the notifier routines and created a new mechanism loosely based on doIRQ. The reason for this is the notifier routines have a couple of shortcomings. One we could't guarantee all future NMI handlers used NOTIFY_OK instead of NOTIFY_STOP. Second, we couldn't keep track of the number of events being handled in each routine (most only handle one, perf can handle more than one). Third, I wanted to eventually display which nmi handlers are registered in the system in /proc/interrupts to help see who is generating NMIs. The patch below just implements the new infrastructure but doesn't wire it up yet (that is the next patch). Its design is based on doIRQ structs and the atomic notifier routines. So the rcu stuff in the patch isn't entirely untested (as the notifier routines have soaked it) but it should be double checked in case I copied the code wrong. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:52 +02:00
Don Zickus	1d48922c14	x86, nmi: Split out nmi from traps.c The nmi stuff is changing a lot and adding more functionality. Split it out from the traps.c file so it doesn't continue to pollute that file. This makes it easier to find and expand all the future nmi related work. No real functional changes here. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:47 +02:00
Gleb Natapov	144d31e6f1	perf, intel: Use GO/HO bits in perf-ctr Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:42 +02:00
Paul Menzel	29cf7a30f8	x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE In summary, this DMI quirk uses the _CRS info by default for the ASUS M2V-MX SE by turning on `pci=use_crs` and is similar to the quirk added by commit `2491762cfb` ("x86/PCI: use host bridge _CRS info on ASRock ALiveSATA2-GLAN") whose commit message should be read for further information. Since commit `3e3da00c01` ("x86/pci: AMD one chain system to use pci read out res") Linux gives the following oops: parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE] HDA Intel 0000:20:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 HDA Intel 0000:20:01.0: setting latency timer to 64 BUG: unable to handle kernel paging request at ffffc90011c08000 IP: [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel] PGD 13781a067 PUD 13781b067 PMD 1300ba067 PTE 800000fd00000173 Oops: 0009 [#1] SMP last sysfs file: /sys/module/snd_pcm/initstate CPU 0 Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event tpm_tis tpm snd_seq tpm_bios psmouse parport_pc snd_timer snd_seq_device parport processor evdev snd i2c_viapro thermal_sys amd64_edac_mod k8temp i2c_core soundcore shpchp pcspkr serio_raw asus_atk0110 pci_hotplug edac_core button snd_page_alloc edac_mce_amd ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod raid1 md_mod usbhid hid sg sd_mod crc_t10dif sr_mod cdrom ata_generic uhci_hcd sata_via pata_via libata ehci_hcd usbcore scsi_mod via_rhine mii nls_base [last unloaded: scsi_wait_scan] Pid: 1153, comm: work_for_cpu Not tainted 2.6.37-1-amd64 #1 M2V-MX SE/System Product Name RIP: 0010:[<ffffffffa0578402>] [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel] RSP: 0018:ffff88013153fe50 EFLAGS: 00010286 RAX: ffffc90011c08000 RBX: ffff88013029ec00 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffff88013341d000 R08: 0000000000000000 R09: 0000000000000040 R10: 0000000000000286 R11: 0000000000003731 R12: ffff88013029c400 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88013341d090 FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:00000000f7610ab0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc90011c08000 CR3: 0000000132f57000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process work_for_cpu (pid: 1153, threadinfo ffff88013153e000, task ffff8801303c86c0) Stack: 0000000000000005 ffffffff8123ad65 00000000000136c0 ffff88013029c400 ffff8801303c8998 ffff88013341d000 ffff88013341d090 ffff8801322d9dc8 ffff88013341d208 0000000000000000 0000000000000000 ffffffff811ad232 Call Trace: [<ffffffff8123ad65>] ? __pm_runtime_set_status+0x162/0x186 [<ffffffff811ad232>] ? local_pci_probe+0x49/0x92 [<ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b [<ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b [<ffffffff8105afd0>] ? do_work_for_cpu+0xb/0x1b [<ffffffff8105fd3f>] ? kthread+0x7a/0x82 [<ffffffff8100a824>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8105fcc5>] ? kthread+0x0/0x82 [<ffffffff8100a820>] ? kernel_thread_helper+0x0/0x10 Code: f4 01 00 00 ef 31 f6 48 89 df e8 29 dd ff ff 85 c0 0f 88 2b 03 00 00 48 89 ef e8 b4 39 c3 e0 8b 7b 40 e8 fc 9d b1 e0 48 8b 43 38 <66> 8b 10 66 89 14 24 8b 43 14 83 e8 03 83 f8 01 77 32 31 d2 be RIP [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel] RSP <ffff88013153fe50> CR2: ffffc90011c08000 ---[ end trace 8d1f3ebc136437fd ]--- Trusting the ACPI _CRS information (`pci=use_crs`) fixes this problem. $ dmesg \| grep -i crs # with the quirk PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug The match has to be against the DMI board entries though since the vendor entries are not populated. DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304 10/30/2007 This quirk should be removed when `pci=use_crs` is enabled for machines from 2006 or earlier or some other solution is implemented. Using coreboot [1] with this board the problem does not exist but this quirk also does not affect it either. To be safe though the check is tightened to only take effect when the BIOS from American Megatrends is used. 15:13 < ruik> but coreboot does not need that 15:13 < ruik> because i have there only one root bus 15:13 < ruik> the audio is behind a bridge $ sudo dmidecode BIOS Information Vendor: American Megatrends Inc. Version: 0304 Release Date: 10/30/2007 [1] http://www.coreboot.org/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=30552 Cc: stable@kernel.org (2.6.34) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Paul Menzel <paulepanter@users.sourceforge.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-06 16:10:37 -07:00
Joerg Roedel	011af85784	perf, amd: Use GO/HO bits in perf-ctr The AMD perf-counters support counting in guest or host-mode only. Make use of that feature when user-space specified guest/host-mode only counting. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-3-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 13:00:31 +02:00
Ingo Molnar	7b4f86ac05	Merge branch 'ras' of git://amd64.org/linux/bp into perf/core	2011-10-06 12:54:36 +02:00
Ingo Molnar	9d01402023	Merge commit 'v3.1-rc9' into perf/core Merge reason: pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:49:21 +02:00
Liu, Jinsong	a3e06bbe84	KVM: emulate lapic tsc deadline timer for guest This patch emulate lapic tsc deadline timer for guest: Enumerate tsc deadline timer capability by CPUID; Enable tsc deadline timer mode by lapic MMIO; Start tsc deadline timer by WRMSR; [jan: use do_div()] [avi: fix for !irqchip_in_kernel()] [marcelo: another fix for !irqchip_in_kernel()] Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-10-05 15:34:56 +02:00
Linus Torvalds	f72a209a3e	Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: irq: Fix check for already initialized irq_domain in irq_domain_add irq: Add declaration of irq_domain_simple_ops to irqdomain.h * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86/rtc: Don't recursively acquire rtc_lock * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: posix-cpu-timers: Cure SMP wobbles sched: Fix up wchan borkage sched/rt: Migrate equal priority tasks to available CPUs	2011-10-01 08:37:25 -07:00
David Vrabel	4dcaebbf65	xen: use generic functions instead of xen_{alloc, free}_vm_area() Replace calls to the Xen-specific xen_alloc_vm_area() and xen_free_vm_area() functions with the generic equivalent (alloc_vm_area() and free_vm_area()). On x86, these were identical already. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 15:02:18 -04:00
Ingo Molnar	4167ab90ee	Merge branch 'core' of git://amd64.org/linux/rric into perf/core	2011-09-29 17:35:29 +02:00
David Vrabel	f3f436e33b	xen: release all pages within 1-1 p2m mappings In xen_memory_setup() all reserved regions and gaps are set to an identity (1-1) p2m mapping. If an available page has a PFN within one of these 1-1 mappings it will become inaccessible (as it MFN is lost) so release them before setting up the mapping. This can make an additional 256 MiB or more of RAM available (depending on the size of the reserved regions in the memory map) if the initial pages overlap with reserved regions. The 1:1 p2m mappings are also extended to cover partial pages. This fixes an issue with (for example) systems with a BIOS that puts the DMI tables in a reserved region that begins on a non-page boundary. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 11:12:15 -04:00
David Vrabel	dc91c728fd	xen: allow extra memory to be in multiple regions Allow the extra memory (used by the balloon driver) to be in multiple regions (typically two regions, one for low memory and one for high memory). This allows the balloon driver to increase the number of available low pages (if the initial number if pages is small). As a side effect, the algorithm for building the e820 memory map is simpler and more obviously correct as the map supplied by the hypervisor is (almost) used as is (in particular, all reserved regions and gaps are preserved). Only RAM regions are altered and RAM regions above max_pfn + extra_pages are marked as unused (the region is split in two if necessary). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 11:12:10 -04:00
David Vrabel	8b5d44a5ac	xen: allow balloon driver to use more than one memory region Allow the xen balloon driver to populate its list of extra pages from more than one region of memory. This will allow platforms to provide (for example) a region of low memory and a region of high memory. The maximum possible number of extra regions is 128 (== E820MAX) which is quite large so xen_extra_mem is placed in __initdata. This is safe as both xen_memory_setup() and balloon_init() are in __init. The balloon regions themselves are not altered (i.e., there is still only the one region). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 11:12:10 -04:00
David Vrabel	aa24411b67	xen/balloon: account for pages released during memory setup In xen_memory_setup() pages that occur in gaps in the memory map are released back to Xen. This reduces the domain's current page count in the hypervisor. The Xen balloon driver does not correctly decrease its initial current_pages count to reflect this. If 'delta' pages are released and the target is adjusted the resulting reservation is always 'delta' less than the requested target. This affects dom0 if the initial allocation of pages overlaps the PCI memory region but won't affect most domU guests that have been setup with pseudo-physical memory maps that don't have gaps. Fix this by accouting for the released pages when starting the balloon driver. If the domain's targets are managed by xapi, the domain may eventually run out of memory and die because xapi currently gets its target calculations wrong and whenever it is restarted it always reduces the target by 'delta'. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 11:12:09 -04:00
Stefano Stabellini	b17d0b5c08	xen: XEN_PVHVM depends on PCI Xen PV on HVM guests require PCI support because they need the xen-platform-pci driver in order to initialize xenbus. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 10:52:16 -04:00
Stefano Stabellini	0930bba674	xen: modify kernel mappings corresponding to granted pages If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 10:32:58 -04:00
Jan Beulich	eab9e6137f	x86-64: Fix CFI data for interrupt frames The patch titled "x86: Don't use frame pointer to save old stack on irq entry" did not properly adjust CFI directives, so this patch is a follow-up to that one. With the old stack pointer no longer stored in a callee-saved register (plus some offset), we now have to use a CFA expression to describe the memory location where it is being found. This requires the use of .cfi_escape (allowing arbitrary byte streams to be emitted into .eh_frame), as there is no .cfi_def_cfa_expression (which also cannot reasonably be expected, as it would require a full expression parser). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/4E8360200200007800058467@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 19:04:52 +02:00
Jan Beulich	e05139f256	x86-64: Don't apply destructive erratum workaround on unaffected CPUs Erratum 93 applies to AMD K8 CPUs only, and its workaround (forcing the upper 32 bits of %rip to all get set under certain conditions) is actually getting in the way of analyzing page faults occurring during EFI physical mode runtime calls (in particular the page table walk shown is completely unrelated to the actual fault). This is because typically EFI runtime code lives in the space between 2G and 4G, which - modulo the above manipulation - is likely to overlap with the kernel or modules area. While even for the other errata workarounds their taking effect could be limited to just the affected CPUs, none of them appears to be destructive, and they're generally getting called only outside of performance critical paths, so they're being left untouched. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4E835FE30200007800058464@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 19:04:48 +02:00
Jan Beulich	838312be46	apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches These warnings (generally one per CPU) are a result of initializing x86_cpu_to_logical_apicid while apic_default is still in use, but the check in setup_local_APIC() being done when apic_bigsmp was already used as an override in default_setup_apic_routing(): Overriding APIC driver with bigsmp Enabling APIC mode: Physflat. Using 5 I/O APICs ------------[ cut here ]------------ WARNING: at .../arch/x86/kernel/apic/apic.c:1239 ... CPU 1 irqstacks, hard=f1c9a000 soft=f1c9c000 Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 9e000 Initializing CPU#1 ------------[ cut here ]------------ WARNING: at .../arch/x86/kernel/apic/apic.c:1239 setup_local_APIC+0x137/0x46b() Hardware name: ... CPU1 logical APIC ID: 2 != 8 ... Fix this (for the time being, i.e. until x86_32_early_logical_apicid() will get removed again, as Tejun says ought to be possible) by overriding the previously stored values at the point where the APIC driver gets overridden. v2: Move this and the pre-existing override logic into arch/x86/kernel/apic/bigsmp_32.c. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> (2.6.39 and onwards) Link: http://lkml.kernel.org/r/4E835D16020000780005844C@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 19:01:53 +02:00
Stephen Rothwell	910b2c5122	x86, amd: Include linux/elf.h since we use stuff from asm/elf.h After merging the moduleh tree, today's linux-next build (x86_64 allmodconfig) failed like this: arch/x86/kernel/sys_x86_64.c:28:10: warning: 'enum align_flags' declared inside parameter list arch/x86/kernel/sys_x86_64.c:28:10: warning: its scope is only this definition or declaration, which is probably not what you want arch/x86/kernel/sys_x86_64.c:28:22: error: parameter 3 ('flags') has incomplete type [...] Presumably caused by the module.h split interacting with a new commit `dfb09f9b7a` ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") from the x8 tree. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/20110928174214.17a58be15d84d67c185930e1@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 10:34:31 +02:00
Ingo Molnar	695d16f787	Merge branch 'upstream/ticketlock-cleanup' of git://github.com/jsgf/linux-xen into x86/spinlocks	2011-09-28 08:57:10 +02:00
Jeremy Fitzhardinge	4a7f340c6a	x86, ticketlock: remove obsolete comment The note about partial registers is not really relevent now that we rely on gcc to generate all the assembler. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-09-27 23:37:20 -07:00
Randy Dunlap	d6eed550a9	x86: Perf_event_amd.c needs <asm/apicdef.h> Fix (rare) build error by adding <asm/apicdef.h> header file: arch/x86/kernel/cpu/perf_event_amd.c:350:2: error: 'BAD_APICID' undeclared (first use in this function) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Robert Richter <robert.richter@amd.com> Cc: Andre Przywara <andre.przywara@amd.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/4E820138.90301@xenotime.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-27 19:55:09 +02:00
Paul Bolle	395cf9691d	doc: fix broken references There are numerous broken references to Documentation files (in other Documentation files, in comments, etc.). These broken references are caused by typo's in the references, and by renames or removals of the Documentation files. Some broken references are simply odd. Fix these broken references, sometimes by dropping the irrelevant text they were part of. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-27 18:08:04 +02:00
Jeremy Fitzhardinge	fdb9eb9f15	xen/dom0: set wallclock time in Xen Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-09-26 11:04:39 -07:00
Jeremy Fitzhardinge	eec07a9ecf	xen: add dom0_op hypercall Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-09-26 11:04:39 -07:00
Yu Ke	3e0996798a	xen/acpi: Domain0 acpi parser related platform hypercall This patches implements the xen_platform_op hypercall, to pass the parsed ACPI info to hypervisor. Signed-off-by: Yu Ke <ke.yu@intel.com> Signed-off-by: Tian Kevin <kevin.tian@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [v1: Added DEFINE_GUEST.. in appropiate headers] [v2: Ripped out typedefs] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-26 11:04:39 -07:00
Kevin Winchester	de0428a7ad	x86, perf: Clean up perf_event cpu code The CPU support for perf events on x86 was implemented via included C files with #ifdefs. Clean this up by creating a new header file and compiling the vendor-specific files as needed. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-26 12:58:00 +02:00
Ingo Molnar	ed3982cf37	Merge commit 'v3.1-rc7' into perf/core Merge reason: Pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-26 12:54:28 +02:00
Liu, Jinsong	b90dfb0419	x86: TSC deadline definitions This pre-defination is preparing for KVM tsc deadline timer emulation, but theirself are not kvm specific. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:53:00 +03:00
Avi Kivity	7460fb4a34	KVM: Fix simultaneous NMIs If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first. Fix by using a counter for pending NMIs instead of a bool; since the counter limit depends on whether the processor is currently in an NMI handler, which can only be checked in vcpu context (via the NMI mask), we add a new KVM_REQ_NMI to request recalculation of the counter. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:59 +03:00
Avi Kivity	1cd196ea42	KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:58 +03:00
Avi Kivity	d4b4325fdb	KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:57 +03:00
Avi Kivity	c191a7a0f4	KVM: x86 emulator: streamline decode of segment registers The opcodes push %seg pop %seg l%seg, %mem, %reg (e.g. lds/les/lss/lfs/lgs) all have an segment register encoded in the instruction. To allow reuse, decode the segment number into src2 during the decode stage instead of the execution stage. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:56 +03:00
Avi Kivity	41ddf9784c	KVM: x86 emulator: simplify OpMem64 decode Use the same technique as the other OpMem variants, and goto mem_common. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:55 +03:00
Avi Kivity	0fe5912884	KVM: x86 emulator: switch src decode to decode_operand() Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:54 +03:00
Avi Kivity	5217973ef8	KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack OpReg decoding has a hack that inhibits byte registers for movsx and movzx instructions. It should be replaced by something better, but meanwhile, qualify that the hack is only active for the destination operand. Note these instructions only use OpReg for the destination, but better to be explicit about it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:53 +03:00
Avi Kivity	608aabe316	KVM: x86 emulator: switch OpImmUByte decode to decode_imm() Similar to SrcImmUByte. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:52 +03:00
Avi Kivity	20c29ff205	KVM: x86 emulator: free up some flag bits near src, dst Op fields are going to grow by a bit, we need two free bits. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:51 +03:00
Avi Kivity	4dd6a57df7	KVM: x86 emulator: switch src2 to generic decode_operand() Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:50 +03:00
Avi Kivity	b1ea50b2b6	KVM: x86 emulator: expand decode flags to 64 bits Unifiying the operands means not taking advantage of the fact that some operand types can only go into certain operands (for example, DI can only be used by the destination), so we need more bits to hold the operand type. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:49 +03:00
Avi Kivity	a99455499a	KVM: x86 emulator: split dst decode to a generic decode_operand() Instead of decoding each operand using its own code, use a generic function. Start with the destination operand. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:48 +03:00
Avi Kivity	f09ed83e21	KVM: x86 emulator: move memop, memopp into emulation context Simplifies further generalization of decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:47 +03:00
Avi Kivity	3329ece161	KVM: x86 emulator: convert group 3 instructions to direct decode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:46 +03:00
Jan Kiszka	9bc5791d4a	KVM: x86: Add module parameter for lapic periodic timer limit Certain guests, specifically RTOSes, request faster periodic timers than what we allow by default. Add a module parameter to adjust the limit for non-standard setups. Also add a rate-limited warning in case the guest requested more. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:44 +03:00
Jan Kiszka	bd80158aff	KVM: Clean up and extend rate-limited output The use of printk_ratelimit is discouraged, replace it with pr*_ratelimited or __ratelimit. While at it, convert remaining guest-triggerable printks to rate-limited variants. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:43 +03:00
Jan Kiszka	7712de872c	KVM: x86: Avoid guest-triggerable printks in APIC model Convert remaining printks that the guest can trigger to apic_printk. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:42 +03:00
Jan Kiszka	1e2b1dd797	KVM: x86: Move kvm_trace_exit into atomic vmexit section This avoids that events causing the vmexit are recorded before the actual exit reason. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:41 +03:00
Avi Kivity	caa8a168e3	KVM: x86 emulator: disable writeback for TEST The TEST instruction doesn't write its destination operand. This could cause problems if an MMIO register was accessed using the TEST instruction. Recently Windows XP was observed to use TEST against the APIC ICR; this can cause spurious IPIs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:40 +03:00
Avi Kivity	e8f2b1d621	KVM: x86 emulator: simplify emulate_1op_rax_rdx() emulate_1op_rax_rdx() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:37 +03:00
Avi Kivity	9fef72ce10	KVM: x86 emulator: merge the two emulate_1op_rax_rdx implementations We have two emulate-with-extended-accumulator implementations: once which expect traps (_ex) and one which doesn't (plain). Drop the plain implementation and always use the one which expects traps; it will simply return 0 in the _ex argument and we can happily ignore it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:36 +03:00
Avi Kivity	d1eef45d59	KVM: x86 emulator: simplify emulate_1op() emulate_1op() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:35 +03:00
Avi Kivity	29053a60d7	KVM: x86 emulator: simplify emulate_2op_cl() emulate_2op_cl() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:34 +03:00
Avi Kivity	761441b9f4	KVM: x86 emulator: simplify emulate_2op_cl() emulate_2op_cl() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:33 +03:00
Avi Kivity	a31b9ceadb	KVM: x86 emulator: simplify emulate_2op_SrcV() emulate_2op_SrcV(), and its siblings, emulate_2op_SrcV_nobyte() and emulate_2op_SrcB(), all use the same calling conventions and all get passed exactly the same parameters. Simplify them by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:52:32 +03:00
Kevin Tian	58fbbf26eb	KVM: APIC: avoid instruction emulation for EOI writes Instruction emulation for EOI writes can be skipped, since sane guest simply uses MOV instead of string operations. This is a nice improvement when guest doesn't support x2apic or hyper-V EOI support. a single VM bandwidth is observed with ~8% bandwidth improvement (7.4Gbps->8Gbps), by saving ~5% cycles from EOI emulation. Signed-off-by: Kevin Tian <kevin.tian@intel.com> <Based on earlier work from>: Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:52:17 +03:00
Nadav Har'El	45133ecaae	KVM: SVM: Fix TSC MSR read in nested SVM When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be read without exit), we need to return L2's notion of the TSC, not L1's. The current code incorrectly returned L1 TSC, because svm_get_msr() was also used in x86.c where this was assumed, but now that these places call the new svm_read_l1_tsc(), the MSR read can be fixed. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:03 +03:00
Nadav Har'El	27fc51b21c	KVM: nVMX: Fix nested VMX TSC emulation This patch fixes two corner cases in nested (L2) handling of TSC-related issues: 1. Somewhat suprisingly, according to the Intel spec, if L1 allows WRMSR to the TSC MSR without an exit, then this should set L1's TSC value itself - not offset by vmcs12.TSC_OFFSET (like was wrongly done in the previous code). 2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore the vmcs12.TSC_OFFSET. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:02 +03:00
Nadav Har'El	d5c1785d2f	KVM: L1 TSC handling KVM assumed in several places that reading the TSC MSR returns the value for L1. This is incorrect, because when L2 is running, the correct TSC read exit emulation is to return L2's value. We therefore add a new x86_ops function, read_l1_tsc, to use in places that specifically need to read the L1 TSC, NOT the TSC of the current level of guest. Note that one change, of one line in kvm_arch_vcpu_load, is made redundant by a different patch sent by Zachary Amsden (and not yet applied): kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of course we didn't have to change the call of kvm_get_msr() to read_l1_tsc(). [avi: moved callback to kvm_x86_ops tsc block] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Acked-by: Zachary Amsdem <zamsden@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:02 +03:00
Yang, Wei Y	cd46868c7f	KVM: MMU: Fix SMEP failure during fetch This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK bit set in the case of SMEP fault. The code updated 'eperm' after the variable was checked. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:02 +03:00
Avi Kivity	e4e517b4be	KVM: MMU: Do not unconditionally read PDPTE from guest memory Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded. On SVM, it is not possible to implement this, but on VMX this is possible and was indeed implemented until nested SVM changed this to unconditionally read PDPTEs dynamically. This has noticable impact when running PAE guests. Fix by changing the MMU to read PDPTRs from the cache, falling back to reading from memory for the nested MMU. Signed-off-by: Avi Kivity <avi@redhat.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:18:01 +03:00
Julia Lawall	cf3ace79c0	KVM: VMX: trivial: use BUG_ON Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ -if (x) BUG(); +BUG_ON(x); @@ identifier x; @@ -if (!x) BUG(); +BUG_ON(!x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:18:01 +03:00
Marcelo Tosatti	742bc67042	KVM: x86: report valid microcode update ID Windows Server 2008 SP2 checked build with smp > 1 BSOD's during boot due to lack of microcode update: * Assertion failed: The system BIOS on this machine does not properly support the processor. The system BIOS did not load any microcode update. A BIOS containing the latest microcode update is needed for system reliability. (CurrentUpdateRevision != 0) * Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440 Report a non-zero microcode update signature to make it happy. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:01 +03:00
Takuya Yoshikawa	1d2887e2d8	KVM: x86 emulator: Make x86_decode_insn() return proper macros Return EMULATION_OK/FAILED consistently. Also treat instruction fetch errors, not restricted to X86EMUL_UNHANDLEABLE, as EMULATION_FAILED; although this cannot happen in practice, the current logic will continue the emulation even if the decoder fails to fetch the instruction. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:01 +03:00
Takuya Yoshikawa	7d88bb4803	KVM: x86 emulator: Let compiler know insn_fetch() rarely fails Fetching the instruction which was to be executed by the guest cannot fail normally. So compiler should always predict that it will succeed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:00 +03:00
Takuya Yoshikawa	e85a10852c	KVM: x86 emulator: Drop _size argument from insn_fetch() _type is enough to know the size. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:59 +03:00
Takuya Yoshikawa	807941b121	KVM: x86 emulator: Use ctxt->_eip directly in do_insn_fetch_byte() Instead of passing ctxt->_eip from insn_fetch() call sites, get it from ctxt in do_insn_fetch_byte(). This is done by replacing the argument _eip of insn_fetch() with _ctxt, which should be better than letting the macro use ctxt silently in its body. Though this changes the place where ctxt->_eip is incremented from insn_fetch() to do_insn_fetch_byte(), this does not have any real effect. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:59 +03:00
Sasha Levin	743eeb0b01	KVM: Intelligent device lookup on I/O bus Currently the method of dealing with an IO operation on a bus (PIO/MMIO) is to call the read or write callback for each device registered on the bus until we find a device which handles it. Since the number of devices on a bus can be significant due to ioeventfds and coalesced MMIO zones, this leads to a lot of overhead on each IO operation. Instead of registering devices, we now register ranges which points to a device. Lookup is done using an efficient bsearch instead of a linear search. Performance test was conducted by comparing exit count per second with 200 ioeventfds created on one byte and the guest is trying to access a different byte continuously (triggering usermode exits). Before the patch the guest has achieved 259k exits per second, after the patch the guest does 274k exits per second. Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:59 +03:00
Stefan Hajnoczi	0d460ffc09	KVM: Use __print_symbolic() for vmexit tracepoints The vmexit tracepoints format the exit_reason to make it human-readable. Since the exit_reason depends on the instruction set (vmx or svm), formatting is handled with ftrace_print_symbols_seq() by referring to the appropriate exit reason table. However, the ftrace_print_symbols_seq() function is not meant to be used directly in tracepoints since it does not export the formatting table which userspace tools like trace-cmd and perf use to format traces. In practice perf dies when formatting vmexit-related events and trace-cmd falls back to printing the numeric value (with extra formatting code in the kvm plugin to paper over this limitation). Other userspace consumers of vmexit-related tracepoints would be in similar trouble. To avoid significant changes to the kvm_exit tracepoint, this patch moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and selects the right table with __print_symbolic() depending on the instruction set. Note that __print_symbolic() is designed for exporting the formatting table to userspace and allows trace-cmd and perf to work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:59 +03:00
Stefan Hajnoczi	e097e5ffd6	KVM: Record instruction set in all vmexit tracepoints The kvm_exit tracepoint recently added the isa argument to aid decoding exit_reason. The semantics of exit_reason depend on the instruction set (vmx or svm) and the isa argument allows traces to be analyzed on other machines. Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject so these tracepoints can also be self-describing. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:58 +03:00
Mike Waychison	d1613ad5d0	KVM: Really fix HV_X64_MSR_APIC_ASSIST_PAGE Commit 0945d4b228 tried to fix the get_msr path for the HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested. We should be returning 0 if the read succeeded, and passing the value back to the caller via the pdata out argument, not returning the value directly. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:58 +03:00
Mike Waychison	14fa67ee95	KVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGE "get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even though it is explicitly enumerated as something the vmm should save in msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST ioctl. Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE. We simply return the guest visible value of this register, which seems to be correct as a set on the register is validated for us already. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:17:58 +03:00
Sasha Levin	8c3ba334f8	KVM: x86: Raise the hard VCPU count limit The patch raises the hard limit of VCPU count to 254. This will allow developers to easily work on scalability and will allow users to test high VCPU setups easily without patching the kernel. To prevent possible issues with current setups, KVM_CAP_NR_VCPUS now returns the recommended VCPU limit (which is still 64) - this should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS returns the hard limit which is now 254. Cc: Avi Kivity <avi@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:17:57 +03:00
Xiao Guangrong	22388a3c8c	KVM: x86: cleanup the code of read/write emulation Using the read/write operation to remove the same code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:17 +03:00
Xiao Guangrong	77d197b2ca	KVM: x86: abstract the operation for read/write emulation The operations of read emulation and write emulation are very similar, so we can abstract the operation of them, in larter patch, it is used to cleanup the same code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:17 +03:00
Xiao Guangrong	ca7d58f375	KVM: x86: fix broken read emulation spans a page boundary If the range spans a page boundary, the mmio access can be broke, fix it as write emulation. And we already get the guest physical address, so use it to read guest data directly to avoid walking guest page table again Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:17 +03:00
Avi Kivity	9be3be1f15	KVM: x86 emulator: fix Src2CL decode Src2CL decode (used for double width shifts) erronously decodes only bit 3 of %rcx, instead of bits 7:0. Fix by decoding %cl in its entirety. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:14:58 +03:00
Zhao Jin	41bc3186b3	KVM: MMU: fix incorrect return of spte __update_clear_spte_slow should return original spte while the current code returns low half of original spte combined with high half of new spte. Signed-off-by: Zhao Jin <cronozhj@gmail.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:13:25 +03:00
Konrad Rzeszutek Wilk	0f4b49eaf2	xen/p2m: Use SetPagePrivate and its friends for M2P overrides. We use the page->private field and hence should use the proper macros and set proper bits. Also WARN_ON in case somebody tries to overwrite our data. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-23 22:22:33 -04:00
Konrad Rzeszutek Wilk	a867db10e8	xen/p2m: Make debug/xen/mmu/p2m visible again. We dropped a lot of the MMU debugfs in favour of using tracing API - but there is one which just provides mostly static information that was made invisible by this change. Bring it back. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-23 22:22:32 -04:00
Jan Beulich	55e901fc1f	xen/pci: support multi-segment systems Now that the hypercall interface changes are in -unstable, make the kernel side code not ignore the segment (aka domain) number anymore (which results in pretty odd behavior on such systems). Rather, if only the old interfaces are available, don't call them for devices on non-zero segments at all. Signed-off-by: Jan Beulich <jbeulich@suse.com> [v1: Edited git description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-22 16:23:46 -04:00
H Hartley Sweeten	4a4cc2b6bf	crypto: aes-x86 - quiet sparse noise about symbol not declared Include <asm/aes.h> to pick up the declarations for crypto_aes_encrypt_x86 and crypto_aes_decrypt_x86 to quiet the sparse noise: warning: symbol 'crypto_aes_encrypt_x86' was not declared. Should it be static? warning: symbol 'crypto_aes_decrypt_x86' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Acked-by: Mandeep Singh Baines <msb@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2011-09-22 21:33:01 +10:00
Jussi Kivilinna	64b94ceae8	crypto: blowfish - add x86_64 assembly implementation Patch adds x86_64 assembly implementation of blowfish. Two set of assembler functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'four-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 4-way functions should be equal to 1-way functions with in-order CPUs. Summary of the tcrypt benchmarks: Blowfish assembler vs blowfish C (256bit 8kb block ECB) encrypt: 2.2x speed decrypt: 2.3x speed Blowfish assembler vs blowfish C (256bit 8kb block CBC) encrypt: 1.12x speed decrypt: 2.5x speed Blowfish assembler vs blowfish C (256bit 8kb block CTR) encrypt: 2.5x speed Full output: http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt Tests were run on: vendor_id : AuthenticAMD cpu family : 16 model : 10 model name : AMD Phenom(tm) II X6 1055T Processor stepping : 0 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2011-09-22 21:25:26 +10:00
Matt Fleming	47997d756a	x86/rtc: Don't recursively acquire rtc_lock A deadlock was introduced on x86 in commit `ef68c8f87e` ("x86: Serialize EFI time accesses on rtc_lock") because efi_get_time() and friends can be called with rtc_lock already held by read_persistent_time(), e.g.: timekeeping_init() read_persistent_clock() <-- acquire rtc_lock efi_get_time() phys_efi_get_time() <-- acquire rtc_lock <DEADLOCK> To fix this let's push the locking down into the get_wallclock() and set_wallclock() implementations. Only the clock implementations that access the x86 RTC directly need to acquire rtc_lock, so it makes sense to push the locking down into the rtc, vrtc and efi code. The virtualization implementations don't require rtc_lock to be held because they provide their own serialization. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Acked-by: Jan Beulich <jbeulich@novell.com> Acked-by: Avi Kivity <avi@redhat.com> [for the virtualization aspect] Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 16:16:09 +02:00
Jack Steiner	6a469e4665	x86: uv2: Workaround for UV2 Hub bug (system global address format) This is a workaround for a UV2 hub bug that affects the format of system global addresses. The GRU API for UV2 was inadvertently broken by a hardware change. The format of the physical address used for TLB dropins and for addresses used with instructions running in unmapped mode has changed. This change was not documented and became apparent only when diags failed running on system simulators. For UV1, TLB and GRU instruction physical addresses are identical to socket physical addresses (although high NASID bits must be OR'ed into the address). For UV2, socket physical addresses need to be converted. The NODE portion of the physical address needs to be shifted so that the low bit is in bit 39 or bit 40, depending on an MMR value. It is not yet clear if this bug will be fixed in a silicon respin. If it is fixed, the hub revision will be incremented & the workaround disabled. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-09-21 11:23:15 +02:00
Ed Wildgoose	d4f3e35017	x86: geode: New PCEngines Alix system driver This new driver replaces the old PCEngines Alix 2/3 LED driver with a new driver that controls the LEDs through the leds-gpio driver. The old driver accessed GPIOs directly, which created a conflict and prevented also loading the cs5535-gpio driver to read other GPIOs on the Alix board. With this new driver, we hook into leds-gpio which in turn uses GPIO to control the LEDs and therefore it's possible to control both the LEDs and access onboard GPIOs Driver is moved to platform/geode as requested by Grant and any other geode initialisation modules should move here also This driver is inspired by leds-net5501.c by Alessandro Zummo. Ideally, leds-net5501.c should also be moved to platform/geode. Additionally the driver relies on parts of the patch: `7f131cf3ed` ("leds: leds-alix2c - take port address from MSR) by Daniel Mack to perform detection of the Alix board. [akpm@linux-foundation.org: include module.h] Signed-off-by: Ed Wildgoose <kernel@wildgooses.com> Cc: git@wildgooses.com Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <daniel@caiaq.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Richard Purdie <rpurdie@rpsys.net> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-09-21 11:19:42 +02:00
Suresh Siddha	c020570138	x86, ioapic: Consolidate the explicit EOI code Consolidate the io-apic EOI code in clear_IO_APIC_pin() and eoi_ioapic_irq(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Rafael Wysocki <rjw@novell.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: lchiquitto@novell.com Cc: jbeulich@novell.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110825190657.259696697@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:26:28 +02:00
Suresh Siddha	e57253a81d	x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq() For older IO-APIC's, we were clearing the remote-IRR by changing the RTE trigger mode to edge and then back to level. We wanted to mask the RTE during this process, so we were essentially doing mask+edge and then to unmask+level. As part of the commit `ca64c47cec`, we moved this EOI process earlier where the IO-APIC RTE is masked. So we were wrongly unmasking it in the eoi_ioapic_irq(). So change the remote-IRR clear sequence in eoi_ioapic_irq() to mask + edge and then restore the previous RTE entry which will restore the mask status as well as the level trigger. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Thomas Renninger <trenn@suse.de> Cc: Rafael Wysocki <rjw@novell.com> Cc: lchiquitto@novell.com Cc: jbeulich@novell.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110825190657.210286410@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:26:26 +02:00
Suresh Siddha	1e75b31d63	x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC In the kdump scenario mentioned below, we can have a case where the device using level triggered interrupt will not generate any interrupts in the kdump kernel. 1. IO-APIC sends a level triggered interrupt to the CPU's local APIC. 2. Kernel crashed before the CPU services this interrupt, leaving the remote-IRR in the IO-APIC set. 3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC initialization. But this fails to reset remote-IRR bit of the IO-APIC RTE as the remote-IRR bit is read-only. 4. Device using that level triggered entry can't generate any more interrupts because of the remote-IRR bit. In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if so do an explicit attempt to clear it (by doing EOI write on modern io-apic's and changing trigger mode to edge/level on older io-apic's). Also before doing the explicit EOI to the io-apic, ensure that the trigger mode is indeed set to level. This will enable the explicit EOI to the io-apic to reset the remote-IRR bit. Tested-by: Leonardo Chiquitto <lchiquitto@novell.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686 Cc: Rafael Wysocki <rjw@novell.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Thomas Renninger <trenn@suse.de> Cc: jbeulich@novell.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:26:25 +02:00
Suresh Siddha	d3f138106b	iommu: Rename the DMAR and INTR_REMAP config options Change the CONFIG_DMAR to CONFIG_INTEL_IOMMU to be consistent with the other IOMMU options. Rename the CONFIG_INTR_REMAP to CONFIG_IRQ_REMAP to match the irq subsystem name. And define the CONFIG_DMAR_TABLE for the common ACPI DMAR routines shared by both CONFIG_INTEL_IOMMU and CONFIG_IRQ_REMAP. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: youquan.song@intel.com Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.558630224@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:22:03 +02:00
Suresh Siddha	c39d77ffa2	x86, ioapic: Define irq_remap_modify_chip_defaults() Define irq_remap_modify_chip_defaults() and remove the duplicate code, cleanup the unnecessary ifdefs. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: youquan.song@intel.com Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.499225692@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:22:01 +02:00
Suresh Siddha	13ea20f7a2	x86, msi, intr-remap: Use the ioapic set affinity routine IRQ set affinity routine is same for the IO-APIC IRQ's aswell as the MSI IRQ's in the presence of interrupt-remapping. This is because we modify the interrupt-remapping table entry and doesn't touch the IO-APIC RTE or the MSI entry. So remove the ir_msi_set_affinity() and re-use the ir_ioapic_set_affinity() Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: youquan.song@intel.com Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.452760446@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:21:59 +02:00
Suresh Siddha	41750d31fc	x86, x2apic: Enable the bios request for x2apic optout On the platforms which are x2apic and interrupt-remapping capable, Linux kernel is enabling x2apic even if the BIOS doesn't. This is to take advantage of the features that x2apic brings in. Some of the OEM platforms are running into issues because of this, as their bios is not x2apic aware. For example, this was resulting in interrupt migration issues on one of the platforms. Also if the BIOS SMI handling uses APIC interface to send SMI's, then the BIOS need to be aware of x2apic mode that OS has enabled. On some of these platforms, BIOS doesn't have a HW mechanism to turnoff the x2apic feature to prevent OS from enabling it. To resolve this mess, recent changes to the VT-d2 specification: http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf includes a mechanism that provides BIOS a way to request system software to opt out of enabling x2apic mode. Look at the x2apic optout flag in the DMAR tables before enabling the x2apic mode in the platform. Also print a warning that we have disabled x2apic based on the BIOS request. Kernel boot parameter "intremap=no_x2apic_optout" can be used to override the BIOS x2apic optout request. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.171766616@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-21 10:21:50 +02:00
Linus Torvalds	abbe0d3c26	Merge branch 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen * 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen: xen/i386: follow-up to "replace order-based range checking of M2P table by linear one" xen/irq: Alter the locking to use a mutex instead of a spinlock. xen/e820: if there is no dom0_mem=, don't tweak extra_pages. xen: disable PV spinlocks on HVM	2011-09-16 11:28:11 -07:00
Linus Torvalds	a7f934d4f1	asm alternatives: remove incorrect alignment notes On x86-64, they were just wasteful: with the explicitly added (now unnecessary) padding, the size of the alternatives structure was 16 bytes, and an alignment of 8 bytes didn't hurt much. However, it was still silly, since the natural size and alignment for the structure is actually just 12 bytes, 4-byte aligned since commit `59e97e4d6f` ("x86: Make alternative instruction pointers relative"). So removing the padding, and removing the extra alignment is just a good idea. On x86-32, the alignment of 4 bytes was correct, but was incorrectly hardcoded as 8 bytes in <asm/alternative-asm.h>. That header file had used to be an x86-64 only header file, but various unification efforts have made it be used for x86-32 too (ie the unification of rwlock and rwsem). That in turn caused x86-32 boot failures, because the extra alignment would result in random zero-filled words in the altinstructions section, causing oopses early at boot when doing alternative instruction replacement. So just remove all the alignment noise entirely. It's wrong, and it's unnecessary. The section itself is already properly aligned by the linker scripts, and all additions to the section had better be of the proper 12-byte format, keeping it aligned. So if the align directive were to ever make a difference, that would be an indication of a serious bug to begin with. Reported-by: Werner Landgraf <w.landgraf@ru.r> Acked-by: Andrew Lutomirski <luto@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-15 13:28:33 -07:00
Jiri Kosina	e060c38434	Merge branch 'master' into for-next Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.	2011-09-15 15:08:18 +02:00
Jesper Juhl	469bfb4a50	Remove unneeded version.h include from arch/x86/ It was pointed out by 'make versioncheck' that the include of linux/version.h is not needed in arch/x86/mm/mmio-mod.c . This patch removes it. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:57:06 +02:00
Kamalesh Babulal	ea70ef3d9d	sched: x86_32 Fix typo in switch_to() description This patch fixes the typo in parameters passed to x86_32 switch_to() description. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:13:48 +02:00
Jan Beulich	61cca2fab7	xen/i386: follow-up to "replace order-based range checking of M2P table by linear one" The numbers obtained from the hypervisor really can't ever lead to an overflow here, only the original calculation going through the order of the range could have. This avoids the (as Jeremy points outs) somewhat ugly NULL-based calculation here. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-15 04:39:46 -04:00
Hidetoshi Seto	9aaef96f61	x86, mce: Do not call del_timer_sync() in IRQ context del_timer_sync() can cause a deadlock when called in interrupt context. It is used with on_each_cpu() in some parts for sysfs files like bank*, check_interval, cmci_disabled and ignore_ce. However, use of on_each_cpu() results in calling the function passed as the argument in interrupt context. This causes a flood of nested warnings from del_timer_sync() (it runs on each CPU) caused even by a simple file access like: $ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval Fortunately, these MCE-specific files are rarely used and AFAIK only few MCE geeks experience this warning. To remove the warning, move timer deletion outside of the interrupt context. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-09-14 15:50:15 +02:00
David Vrabel	e3b73c4a25	xen/e820: if there is no dom0_mem=, don't tweak extra_pages. The patch "xen: use maximum reservation to limit amount of usable RAM" (`d312ae878b`) breaks machines that do not use 'dom0_mem=' argument with: reserve RAM buffer: 000000133f2e2000 - 000000133fffffff (XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: ... The reason being that the last E820 entry is created using the 'extra_pages' (which is based on how many pages have been freed). The mentioned git commit sets the initial value of 'extra_pages' using a hypercall which returns the number of pages (if dom0_mem has been used) or -1 otherwise. If the later we return with MAX_DOMAIN_PAGES as basis for calculation: return min(max_pages, MAX_DOMAIN_PAGES); and use it: extra_limit = xen_get_max_pages(); if (extra_limit >= max_pfn) extra_pages = extra_limit - max_pfn; else extra_pages = 0; which means we end up with extra_pages = 128GB in PFNs (33554432) - 8GB in PFNs (2097152, on this specific box, can be larger or smaller), and then we add that value to the E820 making it: Xen: 00000000ff000000 - 0000000100000000 (reserved) Xen: 0000000100000000 - 000000133f2e2000 (usable) which is clearly wrong. It should look as so: Xen: 00000000ff000000 - 0000000100000000 (reserved) Xen: 0000000100000000 - 000000027fbda000 (usable) Naturally this problem does not present itself if dom0_mem=max:X is used. CC: stable@kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-13 10:17:32 -04:00
Thomas Gleixner	59d958d2c7	locking, x86: mce: Annotate cmci_discover_lock as raw The cmci_discover_lock can be taken in atomic context (cpu bring up sequence) and therefore cannot be preempted on -rt. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-13 11:12:09 +02:00
Thomas Gleixner	2d21a29fb6	locking, oprofile: Annotate oprofilefs lock as raw The oprofilefs_lock can be taken in atomic context (in profiling interrupts) and therefore cannot cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-13 11:12:05 +02:00
Linus Torvalds	d9543314ee	Merge branch 'upstream/bugfix' of git://github.com/jsgf/linux-xen * 'upstream/bugfix' of git://github.com/jsgf/linux-xen: xen: use non-tracing preempt in xen_clocksource_read()	2011-09-12 17:22:31 -07:00
Frank Arnold	77e75fc764	x86: cache_info: Update calculation of AMD L3 cache indices L3 subcaches 0 and 1 of AMD Family 15h CPUs can have a size of 2MB. Update the calculation routine for the number of L3 indices to reflect that. Signed-off-by: Frank Arnold <frank.arnold@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Rosenfeld Hans <Hans.Rosenfeld@amd.com> Cc: Herrmann3 Andreas <Andreas.Herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Cc: Frank Arnold <Frank.Arnold@amd.com> Link: http://lkml.kernel.org/r/20110726170449.GB32536@aftab Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-12 19:29:03 +02:00
Thomas Gleixner	d2946041ff	x86: cache_info: Kill the atomic allocation in amd_init_l3_cache() It's not a good reason to allocate memory in the smp function call just because someone thought it's the most conveniant place. The AMD L3 data is coupled to the northbridge info by a pointer to the corresponding north bridge data. So allocating it with the northbridge data and referencing the northbridge in the cache_info code instead uses less memory and gets rid of that atomic allocation hack in the smp function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20110723212626.688229918@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-12 19:28:37 +02:00
Thomas Gleixner	b7d11a768b	x86: cache_info: Kill the moronic shadow struct Commit `f9b90566c` ("x86: reduce stack usage in init_intel_cacheinfo") introduced a shadow structure to reduce the stack usage on large machines instead of making the smaller structure embedded into the large one. That's definitely a candidate for the bad taste award. Move the small struct into the large one and get rid of the ugly type casts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20110723212626.625651773@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-12 19:28:33 +02:00
Thomas Gleixner	05b217b021	x86: cache_info: Remove bogus free of amd_l3_cache data free_cache_attributes() kfree's: per_cpu(ici_cpuid4_info, cpu)->l3 which is a pointer to memory which was allocated as a block in amd_init_l3_cache(). l3 of a particular cpu points to a part of this memory blob. The part and the rest of the blob are still referenced by other cpus. As far as I can tell from the git history this is a leftover from the conversion from per cpu to node data with commit ba06edb63(x86, cacheinfo: Make L3 cache info per node) and the following commit f658bcfb2(x86, cacheinfo: Cleanup L3 cache index disable support) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20110723212626.550539989@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-12 19:28:29 +02:00
Shyam Iyer	5307f6d5fb	Fix pointer dereference before call to pcie_bus_configure_settings Commit `b03e7495a8` ("PCI: Set PCI-E Max Payload Size on fabric") introduced a potential NULL pointer dereference in calls to pcie_bus_configure_settings due to attempts to access pci_bus self variables when the self pointer is NULL. To correct this, verify that the self pointer in pci_bus is non-NULL before dereferencing it. Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: Jon Mason <mason@myri.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-09 19:49:58 -07:00
Stefano Stabellini	f10cd522c5	xen: disable PV spinlocks on HVM PV spinlocks cannot possibly work with the current code because they are enabled after pvops patching has already been done, and because PV spinlocks use a different data structure than native spinlocks so we cannot switch between them dynamically. A spinlock that has been taken once by the native code (__ticket_spin_lock) cannot be taken by __xen_spin_lock even after it has been released. Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-08 13:59:06 -04:00
Martin Schwidefsky	d1748302f7	clockevents: Make minimum delay adjustments configurable The automatic increase of the min_delta_ns of a clockevents device should be done in the clockevents code as the minimum delay is an attribute of the clockevents device. In addition not all architectures want the automatic adjustment, on a massively virtualized system it can happen that the programming of a clock event fails several times in a row because the virtual cpu has been rescheduled quickly enough. In that case the minimum delay will erroneously be increased with no way back. The new config symbol GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic adjustment. The config option is selected only for x86. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-09-08 11:10:56 +02:00
K. Y. Srinivasan	6f4151c89b	x86: Hyper-V: Integrate the clocksource with Hyper-V detection code The Hyper-V clocksource driver is best integrated with Hyper-V detection code since: (a) Linux guests running on Hyper-V require it (b) Integration into that code significanly reduces code size Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: gregkh@suse.de Cc: devel@linuxdriverproject.org Cc: virtualization@lists.osdl.org Link: http://lkml.kernel.org/r/1315434310-4827-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-09-08 10:33:59 +02:00
Linus Torvalds	b0fb422281	Merge branch 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86, perf: Check that current->mm is alive before getting user callchain perf_event: Fix broken calc_timer_values() perf events: Fix slow and broken cgroup context switch code	2011-09-07 13:00:11 -07:00
Linus Torvalds	1154526753	Merge branch 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen * 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen: xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead. xen: x86_32: do not enable iterrupts when returning from exception in interrupt context xen: use maximum reservation to limit amount of usable RAM	2011-09-07 07:46:48 -07:00
Linus Torvalds	768b56f598	Merge branch 'kvm-updates/3.1' of git://github.com/avikivity/kvm * 'kvm-updates/3.1' of git://github.com/avikivity/kvm: KVM: Fix instruction size issue in pvclock scaling	2011-09-07 07:45:43 -07:00
Konrad Rzeszutek Wilk	ed467e69f1	xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead. We have hit a couple of customer bugs where they would like to use those parameters to run an UP kernel - but both of those options turn of important sources of interrupt information so we end up not being able to boot. The correct way is to pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and the kernel will patch itself to be a UP kernel. Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308 CC: stable@kernel.org Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-01 12:54:49 -04:00
Igor Mammedov	d198d49914	xen: x86_32: do not enable iterrupts when returning from exception in interrupt context If vmalloc page_fault happens inside of interrupt handler with interrupts disabled then on exit path from exception handler when there is no pending interrupts, the following code (arch/x86/xen/xen-asm_32.S:112): cmpw $0x0001, XEN_vcpu_info_pending(%eax) sete XEN_vcpu_info_mask(%eax) will enable interrupts even if they has been previously disabled according to eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99) testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp) setz XEN_vcpu_info_mask(%eax) Solution is in setting XEN_vcpu_info_mask only when it should be set according to cmpw $0x0001, XEN_vcpu_info_pending(%eax) but not clearing it if there isn't any pending events. Reproducer for bug is attached to RHBZ 707552 CC: stable@kernel.org Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-01 12:54:42 -04:00
David Vrabel	d312ae878b	xen: use maximum reservation to limit amount of usable RAM Use the domain's maximum reservation to limit the amount of extra RAM for the memory balloon. This reduces the size of the pages tables and the amount of reserved low memory (which defaults to about 1/32 of the total RAM). On a system with 8 GiB of RAM with the domain limited to 1 GiB the kernel reports: Before: Memory: 627792k/4472000k available After: Memory: 549740k/11132224k available A increase of about 76 MiB (~1.5% of the unused 7 GiB). The reserved low memory is also reduced from 253 MiB to 32 MiB. The total additional usable RAM is 329 MiB. For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit the number of pages for dom0') (c/s 23790) CC: stable@kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-01 09:41:40 -04:00
Andrey Vagin	20afc60f89	x86, perf: Check that current->mm is alive before getting user callchain An event may occur when an mm is already released. I added an event in dequeue_entity() and caught a panic with the following backtrace: [ 434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120 ... [ 434.421258] Call Trace: [ 434.421258] [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0 [ 434.421258] [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90 [ 434.421258] [<ffffffff8101b048>] perf_callchain_user+0x128/0x170 [ 434.421258] [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100 [ 434.421258] [<ffffffff81116690>] perf_prepare_sample+0x200/0x280 [ 434.421258] [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290 [ 434.421258] [<ffffffff81065240>] ? tg_shares_up+0x0/0x670 [ 434.421258] [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0 [ 434.421258] [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0 [ 434.421258] [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250 [ 434.421258] [<ffffffff81119204>] perf_tp_event+0x44/0x70 [ 434.421258] [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110 [ 434.421258] [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0 [ 434.421258] [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60 [ 434.421258] [<ffffffff8105818a>] dequeue_task+0x9a/0xb0 [ 434.421258] [<ffffffff810581e2>] deactivate_task+0x42/0xe0 [ 434.421258] [<ffffffff814bc019>] thread_return+0x191/0x808 [ 434.421258] [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60 [ 434.421258] [<ffffffff8106f4c4>] do_exit+0x464/0x910 [ 434.421258] [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0 [ 434.421258] [<ffffffff8106fa57>] sys_exit_group+0x17/0x20 [ 434.421258] [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-08-31 15:56:31 +02:00
Duncan Sands	3b217116ed	KVM: Fix instruction size issue in pvclock scaling Commit `de2d1a524e` ("KVM: Fix register corruption in pvclock_scale_delta") introduced a mul instruction that may have only a memory operand; the assembler therefore cannot select the correct size: pvclock.s:229: Error: no instruction mnemonic suffix given and no register operands; can't size instruction In this example the assembler is: #APP mul -48(%rbp) ; shrd $32, %rdx, %rax #NO_APP A simple solution is to use mulq. Signed-off-by: Duncan Sands <baldrick@free.fr> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-08-30 14:42:30 +03:00
Jeremy Fitzhardinge	61e2cd0acc	x86, cmpxchg: Use __compiletime_error() to make usage messages a bit nicer Use __compiletime_error() to produce a compile-time error rather than link-time, where available. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 17:20:40 -07:00
Jeremy Fitzhardinge	229855d6f3	x86, ticketlock: Make __ticket_spin_trylock common Make trylock code common regardless of ticket size. (Also, rename arch_spinlock.slock to head_tail.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:46:34 -07:00
Jeremy Fitzhardinge	2994488fe5	x86, ticketlock: Convert __ticket_spin_lock to use xadd() Convert the two variants of __ticket_spin_lock() to use xadd(), which has the effect of making them identical, so remove the duplicate function. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:46:07 -07:00
Jeremy Fitzhardinge	c576a3ea90	x86, ticketlock: Convert spin loop to C The inner loop of __ticket_spin_lock isn't doing anything very special, so reimplement it in C. For the 8 bit ticket lock variant, we use a register union to get direct access to the lower and upper bytes in the tickets, but unfortunately gcc won't generate a direct comparison between the two halves of the register, so the generated asm isn't quite as pretty as the hand-coded version. However benchmarking shows that this is actually a small improvement in runtime performance on some benchmarks, and never a slowdown. We also need to make sure there's a barrier at the end of the lock loop to make sure that the compiler doesn't move any instructions from within the locked region into the region where we don't yet own the lock. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:45:43 -07:00
Jeremy Fitzhardinge	84eb950db1	x86, ticketlock: Clean up types and accessors A few cleanups to the way spinlocks are defined and accessed: - define __ticket_t which is the size of a spinlock ticket (ie, enough bits to hold all the cpus) - Define struct arch_spinlock as a union containing plain slock and the head and tail tickets - Use head and tail to implement some of the spinlock predicates. - Make all ticket variables unsigned. - Use TICKET_SHIFT to form constants Most of this will be used in later patches. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:45:19 -07:00
Jeremy Fitzhardinge	8b8bc2f731	x86: Use xadd helper more widely This covers the trivial cases from open-coded xadd to the xadd macros. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:44:12 -07:00
Jeremy Fitzhardinge	433b352061	x86: Add xadd helper macro Add a common xadd implementation. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:42:20 -07:00
Jeremy Fitzhardinge	e9826380d8	x86, cmpxchg: Unify cmpxchg into cmpxchg.h Everything that's actually common between 32 and 64-bit is moved into cmpxchg.h. xchg/cmpxchg will fail with a link error if they're passed an unsupported size (which includes 64-bit args on 32-bit systems). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-29 13:42:10 -07:00

1 2 3 4 5 ...

14096 Commits