Commit Graph

14303 Commits

Author SHA1 Message Date
Linus Torvalds 8e8da023f5 x86: Fix boot failures on older AMD CPU's
People with old AMD chips are getting hung boots, because commit
bcb80e5387 ("x86, microcode, AMD: Add microcode revision to
/proc/cpuinfo") moved the microcode detection too early into
"early_init_amd()".

At that point we are *so* early in the booth that the exception tables
haven't even been set up yet, so the whole

	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);

doesn't actually work: if the rdmsr does a GP fault (due to non-existant
MSR register on older CPU's), we can't fix it up yet, and the boot fails.

Fix it by simply moving the code to a slightly later point in the boot
(init_amd() instead of early_init_amd()), since the kernel itself
doesn't even really care about the microcode patchlevel at this point
(or really ever: it's made available to user space in /proc/cpuinfo, and
updated if you do a microcode load).

Reported-tested-and-bisected-by:  Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Bob Tracy <rct@gherkin.frus.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-04 11:57:09 -08:00
Konrad Rzeszutek Wilk e5fd47bfab xen/pm_idle: Make pm_idle be default_idle under Xen.
The idea behind commit d91ee5863b ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call - disable_cpuidle()
which would make pm_idle not be molested by other code.  It disallows
cpuidle_idle_call to be set to pm_idle (which is excellent).

But in the select_idle_routine() and idle_setup(), the pm_idle can still
be set to either: amd_e400_idle, mwait_idle or default_idle.  This
depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.

In case of mwait_idle we can hit some instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT and we get:

  Brought up 2 CPUs
  invalid opcode: 0000 [#1] SMP

  Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
  RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
  ...
  Call Trace:
   [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
   [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
  RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
   RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get so spectacular crashes, but we
do end up making an MSR which is trapped in the hypervisor, and then
follow it up with a yield hypercall.  Meaning we end up going to
hypervisor twice instead of just once.

The previous behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen.  This patch
does that.

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-03 10:49:58 -08:00
Tejun Heo d4bbf7e775 Merge branch 'master' into x86/memblock
Conflicts & resolutions:

* arch/x86/xen/setup.c

	dc91c728fd "xen: allow extra memory to be in multiple regions"
	24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

	conflicted on xen_add_extra_mem() updates.  The resolution is
	trivial as the latter just want to replace
	memblock_x86_reserve_range() with memblock_reserve().

* drivers/pci/intel-iommu.c

	166e9278a3 "x86/ia64: intel-iommu: move to drivers/iommu/"
	5dfe8660a3 "bootmem: Replace work_with_active_regions() with..."

	conflicted as the former moved the file under drivers/iommu/.
	Resolved by applying the chnages from the latter on the moved
	file.

* mm/Kconfig

	6661672053 "memblock: add NO_BOOTMEM config symbol"
	c378ddd53f "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

	conflicted trivially.  Both added config options.  Just
	letting both add their own options resolves the conflict.

* mm/memblock.c

	d1f0ece6cd "mm/memblock.c: small function definition fixes"
	ed7b56a799 "memblock: Remove memblock_memory_can_coalesce()"

	confliected.  The former updates function removed by the
	latter.  Resolution is trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-28 09:46:22 -08:00
Greg Kroah-Hartman dd7c7c3f69 Merge 3.2-rc3 into tty-next to handle merge conflict in tty_ldisc.c
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 20:07:25 -08:00
Jeremy Fitzhardinge 31a8394e06 x86: consolidate xchg and xadd macros
They both have a basic "put new value in location, return old value"
pattern, so they can use the same macro easily.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
2011-11-25 10:43:12 -08:00
Jeremy Fitzhardinge 3d94ae0c70 x86/cmpxchg: add a locked add() helper
Mostly to remove some conditional code in spinlock.h.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
2011-11-25 10:42:59 -08:00
Rafael J. Wysocki 986b11c3ee Merge branch 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into pm-freezer
* 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
  freezer: fix wait_event_freezable/__thaw_task races
  freezer: kill unused set_freezable_with_signal()
  dmatest: don't use set_freezable_with_signal()
  usb_storage: don't use set_freezable_with_signal()
  freezer: remove unused @sig_only from freeze_task()
  freezer: use lock_task_sighand() in fake_signal_wake_up()
  freezer: restructure __refrigerator()
  freezer: fix set_freezable[_with_signal]() race
  freezer: remove should_send_signal() and update frozen()
  freezer: remove now unused TIF_FREEZE
  freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
  cgroup_freezer: prepare for removal of TIF_FREEZE
  freezer: clean up freeze_processes() failure path
  freezer: kill PF_FREEZING
  freezer: test freezable conditions while holding freezer_lock
  freezer: make freezing indicate freeze condition in effect
  freezer: use dedicated lock instead of task_lock() + memory barrier
  freezer: don't distinguish nosig tasks on thaw
  freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
  freezer: rename thaw_process() to __thaw_task() and simplify the implementation
  ...
2011-11-23 21:09:02 +01:00
Paul Bolle 282e5aaba2 x86: Kconfig: drop unknown symbol 'APM_MODULE'
There's no Kconfig symbol APM_MODULE, so the check for it will always
fail. There's no need to append _MODULE to tristate symbols anyhow,
because the config tools will do the right thing automagically.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-22 22:57:20 +01:00
Deepak Saxena b0145bf366 time: x86: Remove CLOCK_TICK_RATE from mach_timer.h
CLOCK_TICK_RATE is defined as PIT_TICK_RATE on x86 so we
update mach_timers.h to just use the later as we want
to depecrate CLOCK_TICK_RATE.

Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-21 19:00:57 -08:00
Deepak Saxena b7743970b0 time: x86: Remove CLOCK_TICK_RATE from tsc code
The tsc code uses CLOCK_TICK_RATE which on x86
is defined to just be the same as PIT_TICK_RATE.
This patch updates the code use the later
as we want to depecrate and remove the global
CLOCK_TICK_RATE symbol.

Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-21 19:00:56 -08:00
Tejun Heo d88e4cb671 freezer: remove now unused TIF_FREEZE
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
2011-11-21 12:32:25 -08:00
Al Viro cc11f9edd9 fix braino in um patchset (mea culpa)
wrong register returned...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-21 12:10:21 -08:00
Linus Torvalds a4cc3889f7 Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: prevent tracing recursion with kvmclock
  Revert "KVM: PPC: Add support for explicit HIOR setting"
  KVM: VMX: Check for automatic switch msr table overflow
  KVM: VMX: Add support for guest/host-only profiling
  KVM: VMX: add support for switching of PERF_GLOBAL_CTRL
  KVM: s390: announce SYNC_MMU
  KVM: s390: Fix tprot locking
  KVM: s390: handle SIGP sense running intercepts
  KVM: s390: Fix RUNNING flag misinterpretation
2011-11-20 14:57:43 -08:00
Avi Kivity 95ef1e5292 KVM guest: prevent tracing recursion with kvmclock
Prevent tracing of preempt_disable() in get_cpu_var() in
kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled,
preempt_disable/enable() are traced and this causes the function_graph
tracer to go into an infinite recursion. By open coding the
preempt_disable() around the get_cpu_var(), we can use the notrace
version which prevents preempt_disable/enable() from being traced and
prevents the recursion.

Based on a similar patch for Xen from Jeremy Fitzhardinge.

Tested-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-20 10:53:48 +02:00
Linus Torvalds 5c6b4e84cb Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  random: Fix handing of arch_get_random_long in get_random_bytes()
  x86: Call stop_machine_text_poke() on all CPUs
  x86, ioapic: Only print ioapic debug information for IRQs belonging to an ioapic chip
  x86/mrst: Avoid reporting wrong nmi status
  x86/mrst: Add support for Penwell clock calibration
  x86/apic: Allow use of lapic timer early calibration result
  x86/apic: Do not clear nr_irqs_gsi if no legacy irqs
  x86/platform: Add a wallclock_init func to x86_platforms ops
  x86/mce: Make mce_chrdev_ops 'static const'
2011-11-18 22:16:18 -02:00
Linus Torvalds b684452383 Merge branch 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen-gntalloc: signedness bug in add_grefs()
  xen-gntalloc: integer overflow in gntalloc_ioctl_alloc()
  xen-gntdev: integer overflow in gntdev_alloc_map()
  xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
  xen/balloon: Avoid OOM when requesting highmem
  xen: Remove hanging references to CONFIG_XEN_PLATFORM_PCI
  xen: map foreign pages for shared rings by updating the PTEs directly
2011-11-18 13:18:07 -02:00
Gleb Natapov e7fc6f93b4 KVM: VMX: Check for automatic switch msr table overflow
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:28:09 +02:00
Gleb Natapov d7cd97964b KVM: VMX: Add support for guest/host-only profiling
Support guest/host-only profiling by switch perf msrs on
a guest entry if needed.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:28:00 +02:00
Gleb Natapov 8bf00a5299 KVM: VMX: add support for switching of PERF_GLOBAL_CTRL
Some cpus have special support for switching PERF_GLOBAL_CTRL msr.
Add logic to detect if such support exists and works properly and extend
msr switching code to use it if available. Also extend number of generic
msr switching entries to 8.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:27:54 +02:00
Salman Qazi 4cecf6d401 sched, x86: Avoid unnecessary overflow in sched_clock
(Added the missing signed-off-by line)

In hundreds of days, the __cycles_2_ns calculation in sched_clock
has an overflow.  cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero.  We can solve this without losing
any precision.

We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.

Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 19:51:25 +01:00
Zhenzhong Duan 90d4f5534d xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
PVHVM running with more than 32 vcpus and pv_irq/pv_time enabled
need VCPU placement to work, or else it will softlockup.

CC: stable@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16 12:13:44 -05:00
David Vrabel cd12909cb5 xen: map foreign pages for shared rings by updating the PTEs directly
When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
2011-11-16 12:13:08 -05:00
Mika Westerberg b82e324b3c serial, mfd: don't hardcode the console
Add support to specify which HSU port to use as an early console. This can
be selected by passing "earlyprintk=hsu<n>" on the kernel command line. By
default port 0 is still used.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-15 15:50:30 -08:00
Ingo Molnar c23205c848 Merge branch 'core' of git://amd64.org/linux/rric into perf/core 2011-11-15 11:05:18 +01:00
Ingo Molnar 4a1dba7238 Merge branch 'urgent' of git://amd64.org/linux/rric into perf/urgent 2011-11-15 11:03:30 +01:00
Rabin Vincent 78345d2edc x86: Call stop_machine_text_poke() on all CPUs
It appears that stop_machine_text_poke() wants to be called on all CPUs,
like it's done from text_poke_smp().  Fix text_poke_smp_batch() to do
this.

Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jason Baron <jbaron@redhat.com>
Link: http://lkml.kernel.org/r/1319702072-32676-1-git-send-email-rabin@rab.in
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 13:05:15 +01:00
Peter Zijlstra ed13ec58bf perf/x86: Enable raw event access to Intel offcore events
Now that the core offcore support is fixed up (thanks Stephane) and we
have sane generic events utilizing them, re-enable the raw access to
the feature as well.

Note that it doesn't matter if you use event 0x1b7 or 0x1bb to specify
an offcore event, either one works and neither guarantees you'll end
up on a particular offcore MSR.

Based on original patch from: Vince Weaver <vweaver1@eecs.utk.edu>.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>.
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1108031200390.703@cl320.eecs.utk.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 13:03:44 +01:00
Peter Zijlstra aa2bc1ade5 perf: Don't use -ENOSPC for out of PMU resources
People (Linus) objected to using -ENOSPC to signal not having enough
resources on the PMU to satisfy the request. Use -EINVAL.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-xv8geaz2zpbjhlx0svmpp28n@git.kernel.org
[ merged to newer kernel, fixed up MIPS impact ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 13:01:24 +01:00
Peter Zijlstra 57d1c0c03c perf/x86: Fix PEBS instruction unwind
Masami spotted that we always try to decode the instruction stream as
64bit instructions when running a 64bit kernel, this doesn't work for
ia32-compat proglets.

Use TIF_IA32 to detect if we need to use the 32bit instruction
decoder.

Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 13:01:20 +01:00
William Douglas 9f80d8b68f bma023: Add SFI translation for this device
This needed the sfi IRQ 0xFF fix to go in first. It simply plumbs in the
bma023 driver with the firmware naming of it.

Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-11 23:58:58 -02:00
Feng Tang 57e6319dd6 vrtc: change its year offset from 1960 to 1972
Real world year equals the value in vrtc YEAR register plus an offset.
We used 1960 as the offset to make leap year consistent, but for a
device's first use, its YEAR register is 0 and the system year will
be parsed as 1960 which is not a valid UNIX time and will cause many
applications to fail mysteriously. So we use 1972 instead to fix this
issue.

Updated patch which adds a sanity check suggested by Mathias

This isn't a change in behaviour for systems, because 1972 is the one we
actually use. It's the old version in upstream which is out of sync with
all devices.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-11 23:58:58 -02:00
Zhang Rui f2ee442115 ce4100: fix a build error
Fix a build error. CE4100 with no serial errors because the alternate
function is only a prototype not a null function as intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-11 23:58:58 -02:00
Mathias Nyman 6fd36ba021 x86, ioapic: Only print ioapic debug information for IRQs belonging to an ioapic chip
with "apic=verbose" the print_IO_APIC() function tries to print
IRQ to pin mappings for every active irq. It assumes chip_data
is of type irq_cfg and may cause an oops if not.

As the print_IO_APIC() is called from a late_initcall other
chained irq chips may already be registered with custom
chip_data information, causing an oops. This is the case with
intel MID SoC devices with gpio demuxers registered as irq_chips.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
[ -v2: fixed build failure ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10 18:31:23 +01:00
Jacob Pan 064a59b6dd x86/mrst: Avoid reporting wrong nmi status
Moorestown/Medfield platform does not have port 0x61 to report
NMI status, nor does it have external NMI sources. The only NMI
sources are from lapic, as results of perf counter overflow or
IPI, e.g. NMI watchdog or spin lock debug.

Reading port 0x61 on Moorestown will return 0xff which misled
NMI handlers to false critical errors such memory parity error.
The subsequent ioport access for NMI handling can also cause
undefined behavior on Moorestown.

This patch allows kernel process NMI due to watchdog or backrace
dump without unnecessary hangs.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[hand applied]
Signed-off-by: Alan Cox <alan@linux.intel.com>
2011-11-10 16:21:01 +01:00
Dirk Brandewie 0a9153261d x86/mrst: Add support for Penwell clock calibration
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10 16:20:59 +01:00
Jacob Pan 1ade93efd0 x86/apic: Allow use of lapic timer early calibration result
lapic timer calibration can be combined with tsc in platform
specific calibration functions. if such calibration result is
obtained early, we can skip the redundant calibration loops.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10 16:20:57 +01:00
Jacob Pan bb84ac2d3a x86/apic: Do not clear nr_irqs_gsi if no legacy irqs
nr_legacy_irqs is set in probe_nr_irqs_gsi, we should not clear
it after that. Otherwise, the result is that MSI irqs will be
allocated from the wrong range for the systems without legacy
PIC.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10 16:20:55 +01:00
Feng Tang cf8ff6b6ab x86/platform: Add a wallclock_init func to x86_platforms ops
Some wall clock devices use MMIO based HW register, this new
function will give them a chance to do some initialization work
before their get/set_time service get called.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10 16:20:53 +01:00
Masami Hiramatsu 1ec454baf1 x86, perf: Add a build-time sanity test to the x86 decoder
Add a sanity test of x86 insn decoder against a stream
of randomly generated input, at build time.

This test is also able to reproduce any bug that might
trigger by allowing the passing of random-seed and
iteration-number to the test, or by passing input
which has invalid byte code.

Changes in V2:
 - Code cleanup.
 - Show how to reproduce the error by insn_sanity test.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: acme@redhat.com
Cc: ming.m.lin@intel.com
Cc: robert.richter@amd.com
Cc: ravitillo@lbl.gov
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20111020140109.20938.92572.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10 12:38:51 +01:00
Luck, Tony 66f5ddf30a x86/mce: Make mce_chrdev_ops 'static const'
Arjan would like to make struct file_operations const, but
mce-inject directly writes to the mce_chrdev_ops to install its
write handler. In an ideal world mce-inject would have its own
character device, but we have a sizable legacy of test scripts
that hardwire "/dev/mcelog", so it would be painful to switch to
a separate device now. Instead, this patch switches to a stub
function in the mce code, with a registration helper that
mce-inject can call when it is loaded.

Note that this would also allow for a sane process to allow
mce-inject to be unloaded again (with an unregister function,
and appropriate module_{get,put}() calls), but that is left for
potential future patches.

Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-08 16:17:11 +01:00
Robert Richter de346b6949 Merge branch 'perf/core' into oprofile/master
Merge reason: Resolve conflicts with Don's NMI rework:

    commit 9c48f1c629
    Author: Don Zickus <dzickus@redhat.com>
    Date:   Fri Sep 30 15:06:21 2011 -0400
    x86, nmi: Wire up NMI handlers to new routines

Conflicts:
	arch/x86/oprofile/nmi_timer_int.c

Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-08 15:52:15 +01:00
Linus Torvalds 3c00303206 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  cpuidle: Single/Global registration of idle states
  cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
  cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
  cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
  ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
  ACPI: Export FADT pm_profile integer value to userspace
  thermal: Prevent polling from happening during system suspend
  ACPI: Drop ACPI_NO_HARDWARE_INIT
  ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
  PNPACPI: Simplify disabled resource registration
  ACPI: Fix possible recursive locking in hwregs.c
  ACPI: use kstrdup()
  mrst pmu: update comment
  tools/power turbostat: less verbose debugging
2011-11-07 10:13:52 -08:00
Linus Torvalds b32fc0a062 Merge branch 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  jump-label: initialize jump-label subsystem much earlier
  x86/jump_label: add arch_jump_label_transform_static()
  s390/jump-label: add arch_jump_label_transform_static()
  jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
  sparc/jump_label: drop arch_jump_label_text_poke_early()
  x86/jump_label: drop arch_jump_label_text_poke_early()
  jump_label: if a key has already been initialized, don't nop it out
  stop_machine: make stop_machine safe and efficient to call early
  jump_label: use proper atomic_t initializer

Conflicts:
 - arch/x86/kernel/jump_label.c
	Added __init_or_module to arch_jump_label_text_poke_early vs
	removal of that function entirely
 - kernel/stop_machine.c
	same patch ("stop_machine: make stop_machine safe and efficient
	to call early") merged twice, with whitespace fix in one version
2011-11-06 20:20:46 -08:00
Linus Torvalds 403299a851 Merge branch 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen/dom0: set wallclock time in Xen
  xen: add dom0_op hypercall
  xen/acpi: Domain0 acpi parser related platform hypercall
2011-11-06 20:15:05 -08:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds 02ebbbd481 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  scsi: drop unused Kconfig symbol
  pci: drop unused Kconfig symbol
  stmmac: drop unused Kconfig symbol
  x86: drop unused Kconfig symbol
  powerpc: drop unused Kconfig symbols
  powerpc: 40x: drop unused Kconfig symbol
  mips: drop unused Kconfig symbols
  openrisc: drop unused Kconfig symbols
  arm: at91: drop unused Kconfig symbol
  samples: drop unused Kconfig symbol
  m32r: drop unused Kconfig symbol
  score: drop unused Kconfig symbols
  sh: drop unused Kconfig symbol
  um: drop unused Kconfig symbol
  sparc: drop unused Kconfig symbol
  alpha: drop unused Kconfig symbol

Fix up trivial conflict in drivers/net/ethernet/stmicro/stmmac/Kconfig
as per Michal: the STMMAC_DUAL_MAC config variable is still unused and
should be deleted.
2011-11-06 18:54:53 -08:00
Linus Torvalds 06d381484f Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  net: xen-netback: use API provided by xenbus module to map rings
  block: xen-blkback: use API provided by xenbus module to map rings
  xen: use generic functions instead of xen_{alloc, free}_vm_area()
2011-11-06 18:31:36 -08:00
Len Brown 22f4521d66 mrst pmu: update comment
referenced MeeGo, in particular, but really means Linux, in general.

Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 18:32:45 -05:00
Robert Richter dcfce4a095 oprofile, x86: Reimplement nmi timer mode using perf event
The legacy x86 nmi watchdog code was removed with the implementation
of the perf based nmi watchdog. This broke Oprofile's nmi timer
mode. To run nmi timer mode we relied on a continuous ticking nmi
source which the nmi watchdog provided. The nmi tick was no longer
available and current watchdog can not be used anymore since it runs
with very long periods in the range of seconds. This patch
reimplements the nmi timer mode using a perf counter nmi source.

V2:
* removing pr_info()
* fix undefined reference to `__udivdi3' for 32 bit build
* fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb
* removed nmi timer setup in arch/x86
* implemented function stubs for op_nmi_init/exit()
* made code more readable in oprofile_init()

V3:
* fix architectural initialization in oprofile_init()
* fix CONFIG_OPROFILE_NMI_TIMER dependencies

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-04 16:27:18 +01:00
Robert Richter 159a80b214 oprofile, x86: Add kernel parameter oprofile.cpu_type=timer
We need this to better test x86 NMI timer mode. Otherwise it is very
hard to setup systems with NMI timer enabled, but we have this e.g. in
virtual machine environments.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-04 15:04:34 +01:00
Robert Richter 97f7f8189f oprofile, x86: Fix crash when unloading module (nmi timer mode)
If oprofile uses the nmi timer interrupt there is a crash while
unloading the module. The bug can be triggered with oprofile build as
module and kernel parameter nolapic set. This patch fixes this.

oprofile: using NMI timer interrupt.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
PGD 42dbca067 PUD 41da6a067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
CPU 5
Modules linked in: oprofile(-) [last unloaded: oprofile]

Pid: 2518, comm: modprobe Not tainted 3.1.0-rc7-00019-gb2fb49d #19 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<ffffffff8123c226>]  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
RSP: 0018:ffff88041ef71e98  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffffffa0017100 RCX: dead000000200200
RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
RBP: ffff88041ef71ea8 R08: 0000000000000001 R09: 0000000000000082
R10: 0000000000000000 R11: ffff88041ef71de8 R12: 0000000000000080
R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
FS:  00007fc902f20700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000041cdb6000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 2518, threadinfo ffff88041ef70000, task ffff88041d348040)
Stack:
 ffff88041ef71eb8 ffffffffa0017790 ffff88041ef71eb8 ffffffffa0013532
 ffff88041ef71ec8 ffffffffa00132d6 ffff88041ef71ed8 ffffffffa00159b2
 ffff88041ef71f78 ffffffff81073115 656c69666f72706f 0000000000610200
Call Trace:
 [<ffffffffa0013532>] op_nmi_exit+0x15/0x17 [oprofile]
 [<ffffffffa00132d6>] oprofile_arch_exit+0xe/0x10 [oprofile]
 [<ffffffffa00159b2>] oprofile_exit+0x1e/0x20 [oprofile]
 [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f
 [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b
Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
RIP  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
 RSP <ffff88041ef71e98>
CR2: 0000000000000008
---[ end trace 43a541a52956b7b0 ]---

CC: stable@kernel.org # 2.6.37+
Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-04 15:04:33 +01:00
Linus Torvalds a0a4194c94 Merge branch 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6: (80 commits)
  mfd: Fix missing abx500 header file updates
  mfd: Add missing <linux/io.h> include to intel_msic
  x86, mrst: add platform support for MSIC MFD driver
  mfd: Expose TurnOnStatus in ab8500 sysfs
  mfd: Remove support for early drop ab8500 chip
  mfd: Add support for ab8500 v3.3
  mfd: Add ab8500 interrupt disable hook
  mfd: Convert db8500-prcmu panic() into pr_crit()
  mfd: Refactor db8500-prcmu request_clock() function
  mfd: Rename db8500-prcmu init function
  mfd: Fix db5500-prcmu defines
  mfd: db8500-prcmu voltage domain consumers additions
  mfd: db8500-prcmu reset code retrieval
  mfd: db8500-prcmu tweak for modem wakeup
  mfd: Add db8500-pcmu watchdog accessor functions for watchdog
  mfd: hwacc power state db8500-prcmu accessor
  mfd: Add db8500-prcmu accessors for PLL and SGA clock
  mfd: Move to the new db500 PRCMU API
  mfd: Create a common interface for dbx500 PRCMU drivers
  mfd: Initialize DB8500 PRCMU regs
  ...

Fix up trivial conflicts in
	arch/arm/mach-imx/mach-mx31moboard.c
	arch/arm/mach-omap2/board-omap3beagle.c
	arch/arm/mach-u300/include/mach/irqs.h
	drivers/mfd/wm831x-spi.c
2011-11-03 09:40:51 -07:00
Linus Torvalds 6681ba7ec4 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
  MAINTAINERS: add an entry for Edac Sandy Bridge driver
  edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
  EDAC: Fix incorrect edac mode reporting in sb_edac
  edac: sb_edac: Add it to the building system
  edac: Add an experimental new driver to support Sandy Bridge CPU's
  i7300_edac: Fix error cleanup logic
  i7core_edac: Initialize memory name with cpu, channel, bank
  i7core_edac: Fix compilation on 32 bits arch
  i7core_edac: scrubbing fixups
  EDAC: Correct Kconfig dependencies
  i7core_edac: return -ENODEV if no MC is found
  i7core_edac: use edac's own way to print errors
  MAINTAINERS: remove dropped edac_mce.* from the file
  i7core_edac: Drop the edac_mce facility
  x86, MCE: Use notifier chain only for MCE decoding
  EDAC i7core: Use mce socketid for better compatibility
  i7core_edac: Don't enable memory scrubbing for Xeon 35xx
  i7core_edac: Add scrubbing support
  edac: Move edac main structs to include/linux/edac.h
  i7core_edac: Fix oops when trying to inject errors
  ...
2011-11-02 16:55:15 -07:00
Linus Torvalds 092f4c56c1 Merge branch 'akpm' (Andrew's incoming - part two)
Says Andrew:

 "60 patches.  That's good enough for -rc1 I guess.  I have quite a lot
  of detritus to be rechecked, work through maintainers, etc.

 - most of the remains of MM
 - rtc
 - various misc
 - cgroups
 - memcg
 - cpusets
 - procfs
 - ipc
 - rapidio
 - sysctl
 - pps
 - w1
 - drivers/misc
 - aio"

* akpm: (60 commits)
  memcg: replace ss->id_lock with a rwlock
  aio: allocate kiocbs in batches
  drivers/misc/vmw_balloon.c: fix typo in code comment
  drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
  w1: disable irqs in critical section
  drivers/w1/w1_int.c: multiple masters used same init_name
  drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
  drivers/power/ds2780_battery.c: add a nolock function to w1 interface
  drivers/power/ds2780_battery.c: create central point for calling w1 interface
  w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
  pps gpio client: add missing dependency
  pps: new client driver using GPIO
  pps: default echo function
  include/linux/dma-mapping.h: add dma_zalloc_coherent()
  sysctl: make CONFIG_SYSCTL_SYSCALL default to n
  sysctl: add support for poll()
  RapidIO: documentation update
  drivers/net/rionet.c: fix ethernet address macros for LE platforms
  RapidIO: fix potential null deref in rio_setup_device()
  RapidIO: add mport driver for Tsi721 bridge
  ...
2011-11-02 16:07:27 -07:00
Andrea Arcangeli b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli 70b50f94f1 mm: thp: tail page refcounting fix
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Linus Torvalds de0a5345a5 Merge branch 'for-linus' of git://github.com/richardweinberger/linux
* 'for-linus' of git://github.com/richardweinberger/linux: (90 commits)
  um: fix ubd cow size
  um: Fix kmalloc argument order in um/vdso/vma.c
  um: switch to use of drivers/Kconfig
  UserModeLinux-HOWTO.txt: fix a typo
  UserModeLinux-HOWTO.txt: remove ^H characters
  um: we need sys/user.h only on i386
  um: merge delay_{32,64}.c
  um: distribute exports to where exported stuff is defined
  um: kill system-um.h
  um: generic ftrace.h will do...
  um: segment.h is x86-only and needed only there
  um: asm/pda.h is not needed anymore
  um: hw_irq.h can go generic as well
  um: switch to generic-y
  um: clean Kconfig up a bit
  um: a couple of missing dependencies...
  um: kill useless argument of free_chan() and free_one_chan()
  um: unify ptrace_user.h
  um: unify KSTK_...
  um: fix gcov build breakage
  ...
2011-11-02 09:45:39 -07:00
Dave Jones 0d65ede0a6 um: Fix kmalloc argument order in um/vdso/vma.c
kmalloc size is 1st arg, not second.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

Cc: <stable@kernel.org> # 3.0.x
[richard@nod.at: on 3.0 the to be patched file is
arch/um/sys-x86_64/vdso/vma.c]
2011-11-02 14:15:42 +01:00
Richard Weinberger 38b64aed78 um: we need sys/user.h only on i386
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:38 +01:00
Richard Weinberger d0af6cbfa2 um: merge delay_{32,64}.c
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:37 +01:00
Al Viro a34978cbd9 um: kill system-um.h
most of it belonged in irqflags.h, actually

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:34 +01:00
Al Viro 46ecca8ae1 um: segment.h is x86-only and needed only there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:33 +01:00
Al Viro 966e803ab1 um: unify ptrace_user.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:27 +01:00
Al Viro a10c95d84c um: unify KSTK_...
... and switch get_thread_register() to HOST_... for register numbers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:26 +01:00
Al Viro 4d211093e8 um: fix gcov build breakage
a) exports in gmon_syms.c duplicate kernel/gcov/* ones
b) excluding -pg in vdso compile is not enough - -fprofile-arcs
and -ftest-coverage also needs to be excluded

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:26 +01:00
Al Viro 3fb77d7256 um: irq_vectors.h just shadows x86 one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:24 +01:00
Al Viro ff9586e98f um: required-features.h is there only to shadow x86 one...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:23 +01:00
Al Viro 8807c1d561 um: asm/apic.h is there only to shadow the x86 one...
... so take it to arch/um/x86/asm.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:22 +01:00
Al Viro b3ee571e58 um: take ldt.h to arch/x86/um/asm/mm_context.h
it's x86-only and we have no business playing with it in asm/mmu.h; make
the latter have
	struct uml_arch_mm_context arch;
instead of
	struct uml_ldt ldt;
and let arch/<subarch>/um/asm/mm_context.h decide what'll be in there.
While we are at it, kill host_ldt.h - it's not needed in part of places
that include it (we want asm/ldt.h in those) and it can be trivially
expanded into the single remaining one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:21 +01:00
Al Viro f67aa2ffb7 um: merge signal_{32,64}.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:20 +01:00
Al Viro fbe9868693 um: no need to play with save_sp in signal frame setup anymore
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:19 +01:00
Al Viro c7ea591c91 um: increase stack growth cushion in pagefault
analog of [PATCH] i386: let usermode execute the "enter" instruction from
circa 2006.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:19 +01:00
Al Viro 3579a38973 um: merge HOST_... of registers common on i386 and amd64
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:18 +01:00
Al Viro 8edc4147be um: sanitize paths in sys_call_table* includes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:18 +01:00
Al Viro 1bbd5f21f4 um: merge os-Linux/tls.c into arch/x86/um/os-Linux/tls.c
it's i386-specific; moreover, analogs on other targets have
incompatible interface - PTRACE_GET_THREAD_AREA does exist
elsewhere, but struct user_desc does *not*

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:17 +01:00
Al Viro c5cc32fe14 um: move asm/desc.h into arch/x86/um/asm
its only purpose is to shadow the x86 asm/desc.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:16 +01:00
Al Viro 2014d01878 um: merge host_ldt_{32,64}.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:15 +01:00
Al Viro 09e129a603 um: merge tls_{32,64}.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:15 +01:00
Al Viro 5ade8878e0 um: kill shared/task.h and HOST_TASK_REGS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:08 +01:00
Al Viro 0acdbbeb6d um: bury unused macros around ptrace.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:05 +01:00
Al Viro 5c48b108ec um: take arch/um/sys-x86 to arch/x86/um
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:05 +01:00
Linus Torvalds dc47d3810c Merge git://github.com/herbertx/crypto
* git://github.com/herbertx/crypto: (48 commits)
  crypto: user - Depend on NET instead of selecting it
  crypto: user - Add dependency on NET
  crypto: talitos - handle descriptor not found in error path
  crypto: user - Initialise match in crypto_alg_match
  crypto: testmgr - add twofish tests
  crypto: testmgr - add blowfish test-vectors
  crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
  crypto: twofish-x86_64-3way - fix ctr blocksize to 1
  crypto: blowfish-x86_64 - fix ctr blocksize to 1
  crypto: whirlpool - count rounds from 0
  crypto: Add userspace report for compress type algorithms
  crypto: Add userspace report for cipher type algorithms
  crypto: Add userspace report for rng type algorithms
  crypto: Add userspace report for pcompress type algorithms
  crypto: Add userspace report for nivaead type algorithms
  crypto: Add userspace report for aead type algorithms
  crypto: Add userspace report for givcipher type algorithms
  crypto: Add userspace report for ablkcipher type algorithms
  crypto: Add userspace report for blkcipher type algorithms
  crypto: Add userspace report for ahash type algorithms
  ...
2011-11-01 09:24:41 -07:00
Borislav Petkov 4140c54266 i7core_edac: Drop the edac_mce facility
Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:24 -02:00
Christopher Yeoh fcf634098c Cross Memory Attach
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Paul Gortmaker 39a0e33da0 lguest: add export.h to lguest files for THIS_MODULE/EXPORT_SYMBOL
We need this in advance of the module.h cleanup, or we'll
get compile errors like this:

  CC      drivers/lguest/lguest_device.o
drivers/lguest/lguest_device.c: In function ‘lguest_devices_init’:
drivers/lguest/lguest_device.c:490: error: ‘THIS_MODULE’ undeclared (first use in this function)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:13 -04:00
Paul Gortmaker 783ac47c2a x86: efi_32.c is implicitly getting asm/desc.h via module.h
We want to clean up the chain of includes stumbling through
module.h, and when we do that, we'll see:

  CC      arch/x86/platform/efi/efi_32.o
  efi/efi_32.c: In function ‘efi_call_phys_prelog’:
  efi/efi_32.c:80: error: implicit declaration of function ‘get_cpu_gdt_table’
  efi/efi_32.c:82: error: implicit declaration of function ‘load_gdt’
  make[4]: *** [arch/x86/platform/efi/efi_32.o] Error 1

Include asm/desc.h so that there are no implicit include assumptions.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:36 -04:00
Paul Gortmaker 7c52d55170 x86: fix up files really needing to include module.h
These files aren't just exporting symbols -- they are also defining
a MODULE_LICENSE etc. so give them the full module.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:36 -04:00
Paul Gortmaker 69c60c88ee x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE
These files were implicitly getting EXPORT_SYMBOL via device.h
which was including module.h, but that will be fixed up shortly.

By fixing these now, we can avoid seeing things like:

arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’

[ with input from Randy Dunlap <rdunlap@xenotime.net> and also
  from Stephen Rothwell <sfr@canb.auug.org.au> ]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:35 -04:00
Paul Gortmaker 2957402297 x86: fix implicit include of <linux/topology.h> in vsyscall_64
In removing the presence of <linux/module.h> from some of the
more common <linux/something.h> files, this implict include
of <linux/topology.h> was uncovered.

  CC      arch/x86/kernel/vsyscall_64.o
  arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’:
  arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’

Explicitly call it out so the cleanup can take place.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:34 -04:00
Paul Bolle 4f4d7a9ba0 x86: drop unused Kconfig symbol
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-10-31 23:39:52 +01:00
Borislav Petkov f0cb545243 x86, MCE: Use notifier chain only for MCE decoding
Drop the edac_mce custom hook in favor of the generic notifier
mechanism. Also, do not log the error to mcelog if the notified agent
was able to decode it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:05 -02:00
Linus Torvalds 0cfdc72439 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
  iommu/core: Remove global iommu_ops and register_iommu
  iommu/msm: Use bus_set_iommu instead of register_iommu
  iommu/omap: Use bus_set_iommu instead of register_iommu
  iommu/vt-d: Use bus_set_iommu instead of register_iommu
  iommu/amd: Use bus_set_iommu instead of register_iommu
  iommu/core: Use bus->iommu_ops in the iommu-api
  iommu/core: Convert iommu_found to iommu_present
  iommu/core: Add bus_type parameter to iommu_domain_alloc
  Driver core: Add iommu_ops to bus_type
  iommu/core: Define iommu_ops and register_iommu only with CONFIG_IOMMU_API
  iommu/amd: Fix wrong shift direction
  iommu/omap: always provide iommu debug code
  iommu/core: let drivers know if an iommu fault handler isn't installed
  iommu/core: export iommu_set_fault_handler()
  iommu/omap: Fix build error with !IOMMU_SUPPORT
  iommu/omap: Migrate to the generic fault report mechanism
  iommu/core: Add fault reporting mechanism
  iommu/core: Use PAGE_SIZE instead of hard-coded value
  iommu/core: use the existing IS_ALIGNED macro
  iommu/msm: ->unmap() should return order of unmapped page
  ...

Fixup trivial conflicts in drivers/iommu/Makefile: "move omap iommu to
dedicated iommu folder" vs "Rename the DMAR and INTR_REMAP config
options" just happened to touch lines next to each other.
2011-10-30 15:46:19 -07:00
Linus Torvalds 1bc87b0055 Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (75 commits)
  KVM: SVM: Keep intercepting task switching with NPT enabled
  KVM: s390: implement sigp external call
  KVM: s390: fix register setting
  KVM: s390: fix return value of kvm_arch_init_vm
  KVM: s390: check cpu_id prior to using it
  KVM: emulate lapic tsc deadline timer for guest
  x86: TSC deadline definitions
  KVM: Fix simultaneous NMIs
  KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
  KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
  KVM: x86 emulator: streamline decode of segment registers
  KVM: x86 emulator: simplify OpMem64 decode
  KVM: x86 emulator: switch src decode to decode_operand()
  KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack
  KVM: x86 emulator: switch OpImmUByte decode to decode_imm()
  KVM: x86 emulator: free up some flag bits near src, dst
  KVM: x86 emulator: switch src2 to generic decode_operand()
  KVM: x86 emulator: expand decode flags to 64 bits
  KVM: x86 emulator: split dst decode to a generic decode_operand()
  KVM: x86 emulator: move memop, memopp into emulation context
  ...
2011-10-30 15:36:45 -07:00
Jan Kiszka f1c1da2bde KVM: SVM: Keep intercepting task switching with NPT enabled
AMD processors apparently have a bug in the hardware task switching
support when NPT is enabled. If the task switch triggers a NPF, we can
get wrong EXITINTINFO along with that fault. On resume, spurious
exceptions may then be injected into the guest.

We were able to reproduce this bug when our guest triggered #SS and the
handler were supposed to run over a separate task with not yet touched
stack pages.

Work around the issue by continuing to emulate task switches even in
NPT mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-10-30 12:24:10 +02:00
Linus Torvalds ce949717b5 Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
  lguest: move process freezing before pending signals check
  lguest: don't allow KVM-detection cpuid.
  lguest: Allow running under paravirt-enabled KVM.
2011-10-29 07:52:16 -07:00
Linus Torvalds 0e59e7e7fe Merge branch 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci
* 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
  PCI: Clean-up MPS debug output
  pci: Clamp pcie_set_readrq() when using "performance" settings
  PCI: enable MPS "performance" setting to properly handle bridge MPS
  PCI: Workaround for Intel MPS errata
  PCI: Add support for PASID capability
  PCI: Add implementation for PRI capability
  PCI: Export ATS functions to modules
  PCI: Move ATS implementation into own file
  PCI / PM: Remove unnecessary error variable from acpi_dev_run_wake()
  PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove
  PCI / PM: Extend PME polling to all PCI devices
  PCI quirk: mmc: Always check for lower base frequency quirk for Ricoh 1180:e823
  PCI: Make pci_setup_bridge() non-static for use by arch code
  x86: constify PCI raw ops structures
  PCI: Add quirk for known incorrect MPSS
  PCI: Add Solarflare vendor ID and SFC4000 device IDs
2011-10-28 14:20:44 -07:00
Linus Torvalds f362f98e7c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
  leases: fix write-open/read-lease race
  nfs: drop unnecessary locking in llseek
  ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
  vfs: add generic_file_llseek_size
  vfs: do (nearly) lockless generic_file_llseek
  direct-io: merge direct_io_walker into __blockdev_direct_IO
  direct-io: inline the complete submission path
  direct-io: separate map_bh from dio
  direct-io: use a slab cache for struct dio
  direct-io: rearrange fields in dio/dio_submit to avoid holes
  direct-io: fix a wrong comment
  direct-io: separate fields only used in the submission path from struct dio
  vfs: fix spinning prevention in prune_icache_sb
  vfs: add a comment to inode_permission()
  vfs: pass all mask flags check_acl and posix_acl_permission
  vfs: add hex format for MAY_* flag values
  vfs: indicate that the permission functions take all the MAY_* flags
  compat: sync compat_stats with statfs.
  vfs: add "device" tag to /proc/self/mountstats
  cleanup: vfs: small comment fix for block_invalidatepage
  ...

Fix up trivial conflict in fs/gfs2/file.c (llseek changes)
2011-10-28 10:49:34 -07:00
Eric W. Biederman 1448c721e4 compat: sync compat_stats with statfs.
This was found by inspection while tracking a similar
bug in compat_statfs64, that has been fixed in mainline
since decemeber.

- This fixes a bug where not all of the f_spare fields
  were cleared on mips and s390.
- Add the f_flags field to struct compat_statfs
- Copy f_flags to userspace in case someone cares.
- Use __clear_user to copy the f_spare field to userspace
  to ensure that all of the elements of f_spare are cleared.
  On some architectures f_spare is has 5 ints and on some
  architectures f_spare only has 4 ints.  Which makes
  the previous technique of clearing each int individually
  broken.

I don't expect anyone actually uses the old statfs system
call anymore but if they do let them benefit from having
the compat and the native version working the same.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:53 +02:00
Linus Torvalds ca836a2543 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64, doc: Remove int 0xcc from entry_64.S documentation
  x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.c

Fix up trivial conflicts in arch/x86/mm/fault.c (asm/fixmap.h vs
asm/vsyscall.h: both work, which to use? Whatever..)
2011-10-28 05:46:02 -07:00
Linus Torvalds d630ba565f Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: uv2: Workaround for UV2 Hub bug (system global address format)
2011-10-28 05:43:56 -07:00
Linus Torvalds 5fd862f1db Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, ticketlock: remove obsolete comment
  x86, cmpxchg: Use __compiletime_error() to make usage messages a bit nicer
  x86, ticketlock: Make __ticket_spin_trylock common
  x86, ticketlock: Convert __ticket_spin_lock to use xadd()
  x86, ticketlock: Convert spin loop to C
  x86, ticketlock: Clean up types and accessors
  x86: Use xadd helper more widely
  x86: Add xadd helper macro
  x86, cmpxchg: Unify cmpxchg into cmpxchg.h
  x86, cmpxchg: Move 64-bit set64_bit() to match 32-bit
  x86, cmpxchg: Move 32-bit __cmpxchg_wrong_size to match 64 bit.
  x86, cmpxchg: <linux/alternative.h> has LOCK_PREFIX
2011-10-28 05:43:29 -07:00
Linus Torvalds 8e6d539e0f Merge branch 'x86-rdrand-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-rdrand-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, random: Verify RDRAND functionality and allow it to be disabled
  x86, random: Architectural inlines to get random integers with RDRAND
  random: Add support for architectural random hooks

Fix up trivial conflicts in drivers/char/random.c: the architectural
random hooks touched "get_random_int()" that was simplified to use MD5
and not do the keyptr thing any more (see commit 6e5714eaf77d: "net:
Compute protocol sequence numbers and fragment IDs using MD5").
2011-10-28 05:29:07 -07:00
Linus Torvalds 8237eb946a Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Add microcode revision to /proc/cpuinfo
  x86, microcode: Correct microcode revision format
  coretemp: Get microcode revision from cpu_data
  x86, intel: Use c->microcode for Atom errata check
  x86, intel: Output microcode revision in /proc/cpuinfo
  x86, microcode: Don't request microcode from userspace unnecessarily

Fix up trivial conflicts in arch/x86/kernel/cpu/amd.c (conflict between
moving AMD BSP code to cpu_dev helper function and adding AMD microcode
revision to /proc/cpuinfo code)
2011-10-28 05:14:48 -07:00
Linus Torvalds cc21fe518a Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Hyper-V: Integrate the clocksource with Hyper-V detection code

Fix up conflicts in drivers/staging/hv/Makefile manually (some of the hv
code has moved out of staging to drivers/hv/)
2011-10-28 05:08:40 -07:00
Linus Torvalds a93f3e9f42 Merge branch 'x86-geode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-geode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: geode: New PCEngines Alix system driver
2011-10-28 05:04:26 -07:00
Linus Torvalds 107095a946 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Add cpufeature flag for PCIDs
2011-10-28 05:04:04 -07:00
Linus Torvalds e34eb39c1c Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, amd: Include linux/elf.h since we use stuff from asm/elf.h
  x86: cache_info: Update calculation of AMD L3 cache indices
  x86: cache_info: Kill the atomic allocation in amd_init_l3_cache()
  x86: cache_info: Kill the moronic shadow struct
  x86: cache_info: Remove bogus free of amd_l3_cache data
  x86, amd: Include elf.h explicitly, prepare the code for the module.h split
  x86-32, amd: Move va_align definition to unbreak 32-bit build
  x86, amd: Move BSP code to cpu_dev helper
  x86: Add a BSP cpu_dev helper
  x86, amd: Avoid cache aliasing penalties on AMD family 15h
2011-10-28 05:03:12 -07:00
Rusty Russell 89cfc99177 lguest: don't allow KVM-detection cpuid.
Host might be running under KVM, but we shouldn't allow Guest to think it
can use KVM hypercalls (it can't, and it will embarrass itself if it tries).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-10-27 10:56:17 +10:30
Linus Torvalds dfa4a423cf Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64, unistd: Remove bogus __IGNORE_getcpu
  x86, mm, trivial: Remove unnecessary get_order() in free_thread_info()
  x86, cleanup: Remove unneeded version.h include from arch/x86/
2011-10-26 17:43:08 +02:00
Linus Torvalds 7b86572a7a Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64: Fix CFI data for interrupt frames
  x86-64: Don't apply destructive erratum workaround on unaffected CPUs
2011-10-26 17:42:03 +02:00
Linus Torvalds 0791e98dd1 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Standardize on CONFIG_SPARSE_IRQ=y
  x86, ioapic: Clean up ioapic/apic_id usage
  x86, ioapic: Factor out print_IO_APIC() to only print one io apic
  x86, ioapic: Print out irte with right ioapic index
  x86, ioapic: Split up setup_ioapic_entry()
  x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq()
  apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches
2011-10-26 17:30:33 +02:00
Linus Torvalds 39adff5f69 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  time, s390: Get rid of compile warning
  dw_apb_timer: constify clocksource name
  time: Cleanup old CONFIG_GENERIC_TIME references that snuck in
  time: Change jiffies_to_clock_t() argument type to unsigned long
  alarmtimers: Fix error handling
  clocksource: Make watchdog reset lockless
  posix-cpu-timers: Cure SMP accounting oddities
  s390: Use direct ktime path for s390 clockevent device
  clockevents: Add direct ktime programming function
  clockevents: Make minimum delay adjustments configurable
  nohz: Remove "Switched to NOHz mode" debugging messages
  proc: Consider NO_HZ when printing idle and iowait times
  nohz: Make idle/iowait counter update conditional
  nohz: Fix update_ts_time_stat idle accounting
  cputime: Clean up cputime_to_usecs and usecs_to_cputime macros
  alarmtimers: Rework RTC device selection using class interface
  alarmtimers: Add try_to_cancel functionality
  alarmtimers: Add more refined alarm state tracking
  alarmtimers: Remove period from alarm structure
  alarmtimers: Remove interval cap limit hack
  ...
2011-10-26 17:15:03 +02:00
Linus Torvalds 8686a0e200 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix insn decoder for longer instruction
2011-10-26 17:07:07 +02:00
Linus Torvalds 7115e3fcf4 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
  perf symbols: Increase symbol KSYM_NAME_LEN size
  perf hists browser: Refuse 'a' hotkey on non symbolic views
  perf ui browser: Use libslang to read keys
  perf tools: Fix tracing info recording
  perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
  perf hists: Don't consider filtered entries when calculating column widths
  perf hists: Don't decay total_period for filtered entries
  perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
  perf hists browser: Do not exit on tab key with single event
  perf annotate browser: Don't change selection line when returning from callq
  perf tools: handle endianness of feature bitmap
  perf tools: Add prelink suggestion to dso update message
  perf script: Fix unknown feature comment
  perf hists browser: Apply the dso and thread filters when merging new batches
  perf hists: Move the dso and thread filters from hist_browser
  perf ui browser: Honour the xterm colors
  perf top tui: Give color hints just on the percentage, like on --stdio
  perf ui browser: Make the colors configurable and change the defaults
  perf tui: Remove unneeded call to newtCls on startup
  perf hists: Don't format the percentage on hist_entry__snprintf
  ...

Fix up conflicts in arch/x86/kernel/kprobes.c manually.

Ingo's tree did the insane "add volatile to const array", which just
doesn't make sense ("volatile const"?).  But we could remove the const
*and* make the array volatile to make doubly sure that gcc doesn't
optimize it away..

Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
reader_lock has been turned into a raw lock by the core locking merge,
and there was a new user of it introduced in this perf core merge.  Make
sure that new use also uses the raw accessor functions.
2011-10-26 17:03:38 +02:00
Linus Torvalds 3cfef95246 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock()
  lockdep: Comment all warnings
  lib: atomic64: Change the type of local lock to raw_spinlock_t
  locking, lib/atomic64: Annotate atomic64_lock::lock as raw
  locking, x86, iommu: Annotate qi->q_lock as raw
  locking, x86, iommu: Annotate irq_2_ir_lock as raw
  locking, x86, iommu: Annotate iommu->register_lock as raw
  locking, dma, ipu: Annotate bank_lock as raw
  locking, ARM: Annotate low level hw locks as raw
  locking, drivers/dca: Annotate dca_lock as raw
  locking, powerpc: Annotate uic->lock as raw
  locking, x86: mce: Annotate cmci_discover_lock as raw
  locking, ACPI: Annotate c3_lock as raw
  locking, oprofile: Annotate oprofilefs lock as raw
  locking, video: Annotate vga console lock as raw
  locking, latencytop: Annotate latency_lock as raw
  locking, timer_stats: Annotate table_lock as raw
  locking, rwsem: Annotate inner lock as raw
  locking, semaphores: Annotate inner lock as raw
  locking, sched: Annotate thread_group_cputimer as raw
  ...

Fix up conflicts in kernel/posix-cpu-timers.c manually: making
cputimer->cputime a raw lock conflicted with the ABBA fix in commit
bcd5cff721 ("cputimer: Cure lock inversion").
2011-10-26 16:17:32 +02:00
Linus Torvalds 982653009b Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, ioapic: Consolidate the explicit EOI code
  x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq()
  x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
  iommu: Rename the DMAR and INTR_REMAP config options
  x86, ioapic: Define irq_remap_modify_chip_defaults()
  x86, msi, intr-remap: Use the ioapic set affinity routine
  iommu: Cleanup ifdefs in detect_intel_iommu()
  iommu: No need to set dmar_disabled in check_zero_address()
  iommu: Move IOMMU specific code to intel-iommu.c
  intr_remap: Call dmar_dev_scope_init() explicitly
  x86, x2apic: Enable the bios request for x2apic optout
2011-10-26 16:11:53 +02:00
Linus Torvalds aa77677e0a Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
* 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1519 commits)
  staging: et131x: Remove redundant check and return statement
  staging: et131x: Mainly whitespace changes to appease checkpatch
  staging: et131x: Remove last of the forward declarations
  staging: et131x: Remove even more forward declarations
  staging: et131x: Remove yet more forward declarations
  staging: et131x: Remove more forward declarations
  staging: et131x: Remove forward declaration of et131x_adapter_setup
  staging: et131x: Remove some forward declarations
  staging: et131x: Remove unused rx_ring.recv_packet_pool
  staging: et131x: Remove call to find pci pm capability
  staging: et131x: Remove redundant et131x_reset_recv() call
  staging: et131x: Remove unused rx_ring.recv_buffer_pool
  Staging: bcm: Fix three initialization errors in InterfaceDld.c
  Staging: bcm: Fix coding style issues in InterfaceDld.c
  staging:iio:dac: Add AD5360 driver
  staging:iio:trigger:bfin-timer: Fix compile error
  Staging: vt6655: add some range checks before memcpy()
  Staging: vt6655: whitespace fixes to iotcl.c
  Staging: vt6656: add some range checks before memcpy()
  Staging: vt6656: whitespace cleanups in ioctl.c
  ...

Fix up conflicts in:
 - drivers/{Kconfig,Makefile}, drivers/staging/{Kconfig,Makefile}:
	vg driver movement
 - drivers/staging/brcm80211/brcmfmac/{dhd_linux.c,mac80211_if.c}:
	driver removal vs now stale changes
 - drivers/staging/rtl8192e/r8192E_core.c:
	driver removal vs now stale changes
 - drivers/staging/et131x/et131*:
	driver consolidation into one file, tried to do fixups
2011-10-26 15:39:02 +02:00
Linus Torvalds efb8d21b2c Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  TTY: serial_core: Fix crash if DCD drop during suspend
  tty/serial: atmel_serial: bootconsole removed from auto-enumerates
  Revert "TTY: call tty_driver_lookup_tty unconditionally"
  tty/serial: atmel_serial: add device tree support
  tty/serial: atmel_serial: auto-enumerate ports
  tty/serial: atmel_serial: whitespace and braces modifications
  tty/serial: atmel_serial: change platform_data variable name
  tty/serial: RS485 bindings for device tree
  TTY: call tty_driver_lookup_tty unconditionally
  TTY: pty, release tty in all ptmx_open fail paths
  TTY: make tty_add_file non-failing
  TTY: drop driver reference in tty_open fail path
  8250_pci: Fix kernel panic when pch_uart is disabled
  h8300: drivers/serial/Kconfig was moved
  parport_pc: release IO region properly if unsupported ITE887x card is found
  tty: Support compat_ioctl get/set termios_locked
  hvc_console: display printk messages on console.
  TTY: snyclinkmp: forever loop in tx_load_dma_buffer()
  tty/n_gsm: avoid fifo overflow in gsm_dlci_data_output
  tty/n_gsm: fix a bug in gsm_dlci_data_output (adaption = 2 case)
  ...

Fix up Conflicts in:
 - drivers/tty/serial/8250_pci.c
	Trivial conflict with removed duplicate device ID
 - drivers/tty/serial/atmel_serial.c
	Annoying silly conflict between "specify the port num via
	platform_data" and other changes to atmel_console_init
2011-10-26 15:11:09 +02:00
Jeremy Fitzhardinge e71a5be15e x86/jump_label: add arch_jump_label_transform_static()
This allows jump-label entries to be cheaply updated on code which is
not yet live.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2011-10-25 11:54:42 -07:00
Jeremy Fitzhardinge b7e3155818 x86/jump_label: drop arch_jump_label_text_poke_early()
It is no longer used.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2011-10-25 11:54:21 -07:00
Linus Torvalds 59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
Linus Torvalds 04a8752485 Merge branches 'stable/drivers-3.2', 'stable/drivers.bugfixes-3.2' and 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/drivers-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xenbus: don't rely on xen_initial_domain to detect local xenstore
  xenbus: Fix loopback event channel assuming domain 0
  xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
  xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
  xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable
  xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive

* 'stable/drivers.bugfixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pciback: Check if the device is found instead of blindly assuming so.
  xen/pciback: Do not dereference psdev during printk when it is NULL.
  xen: remove XEN_PLATFORM_PCI config option
  xen: XEN_PVHVM depends on PCI
  xen/pciback: double lock typo
  xen/pciback: use mutex rather than spinlock in vpci backend
  xen/pciback: Use mutexes when working with Xenbus state transitions.
  xen/pciback: miscellaneous adjustments
  xen/pciback: use mutex rather than spinlock in passthrough backend
  xen/pciback: use resource_size()

* 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pci: support multi-segment systems
  xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.
  xen/pci: make bus notifier handler return sane values
  xen-swiotlb: fix printk and panic args
  xen-swiotlb: Fix wrong panic.
  xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB
  xen-pcifront: Update warning comment to use 'e820_host' option.
2011-10-25 09:19:36 +02:00
Greg Kroah-Hartman 43a3beb6da Merge branch 'staging-next' into Linux 3.1
This was done to resolve a conflict in the
drivers/staging/comedi/drivers/ni_labpc.c file that resolved a build
bugfix in Linus's tree with a "better" bugfix that was in the
staging-next tree that resolved the issue in a more complete manner.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-25 09:18:11 +02:00
Linus Torvalds 31018acd4c Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/p2m/debugfs: Make type_name more obvious.
  xen/p2m/debugfs: Fix potential pointer exception.
  xen/enlighten: Fix compile warnings and set cx to known value.
  xen/xenbus: Remove the unnecessary check.
  xen/irq: If we fail during msi_capability_init return proper error code.
  xen/events: Don't check the info for NULL as it is already done.
  xen/events: BUG() when we can't allocate our event->irq array.

* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix selfballooning and ensure it doesn't go too far
  xen/gntdev: Fix sleep-inside-spinlock
  xen: modify kernel mappings corresponding to granted pages
  xen: add an "highmem" parameter to alloc_xenballooned_pages
  xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
  xen/p2m: Make debug/xen/mmu/p2m visible again.
  Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
2011-10-25 09:17:47 +02:00
Linus Torvalds 5eef150c1d Merge branch 'stable/e820-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/e820-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: release all pages within 1-1 p2m mappings
  xen: allow extra memory to be in multiple regions
  xen: allow balloon driver to use more than one memory region
  xen/balloon: simplify test for the end of usable RAM
  xen/balloon: account for pages released during memory setup
2011-10-25 09:17:07 +02:00
Josh Stone 315eb8a2a1 x86: Fix compilation bug in kprobes' twobyte_is_boostable
When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I
noticed a warning about the asm operand for test_bit in kprobes'
can_boost.  I discovered that this caused only the first long of
twobyte_is_boostable[] to be output.

Jakub filed and fixed gcc PR50571 to correct the warning and this output
issue.  But to solve it for less current gcc, we can make kprobes'
twobyte_is_boostable[] non-const, and it won't be optimized out.

Before:

    CC      arch/x86/kernel/kprobes.o
  In file included from include/linux/bitops.h:22:0,
                   from include/linux/kernel.h:17,
                   from [...]/arch/x86/include/asm/percpu.h:44,
                   from [...]/arch/x86/include/asm/current.h:5,
                   from [...]/arch/x86/include/asm/processor.h:15,
                   from [...]/arch/x86/include/asm/atomic.h:6,
                   from include/linux/atomic.h:4,
                   from include/linux/mutex.h:18,
                   from include/linux/notifier.h:13,
                   from include/linux/kprobes.h:34,
                   from arch/x86/kernel/kprobes.c:43:
  [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
  [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input
        without lvalue in asm operand 1 is deprecated [enabled by default]

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                          554: R_386_32	.rodata.cst4

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
  Contents of section .rodata.cst4:
   0000 4c030000                             L...

Only a single long of twobyte_is_boostable[] is in the object file.

After, without the const on twobyte_is_boostable:

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                          554: R_386_32	.data

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
   0010 00000000 00000000 00000000 00000000  ................
   0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
   0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w

Now all 32 bytes are output into .data instead.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-25 09:00:14 +02:00
Mika Westerberg 360545c1fc x86, mrst: add platform support for MSIC MFD driver
The MSIC MFD driver creates platform devices for MSIC device drivers so we
don't need to create them in platform code anymore.

This patch adds a new runtime check which determines whether we are running
on a Medfield platform and enables the MSIC MFD driver accordingly.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-10-24 14:09:20 +02:00
Alan Cox 42c2544b2d x86, mrst: Some drivers need to known when an SCU is available
Add a notifier so that drivers can hook into SCU availability in order to
take actions post initialisation when/if the SCU becomes available.

In the ideal world we wouldn't need this and we could avoid any init
dependancies of this form, but in practice we can't do it for some cases.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-10-24 14:09:15 +02:00
Takashi Iwai 8548c84da2 x86: Fix S4 regression
Commit 4b239f458 ("x86-64, mm: Put early page table high") causes a S4
regression since 2.6.39, namely the machine reboots occasionally at S4
resume.  It doesn't happen always, overall rate is about 1/20.  But,
like other bugs, once when this happens, it continues to happen.

This patch fixes the problem by essentially reverting the memory
assignment in the older way.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai.lu@oracle.com>
[ We'll hopefully find the real fix, but that's too late for 3.1 now ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-24 06:55:20 +02:00
Joerg Roedel 1abb4ba596 Merge branches 'amd/fixes', 'debug/dma-api', 'arm/omap', 'arm/msm', 'core', 'iommu/fault-reporting' and 'api/iommu-ops-per-bus' into next
Conflicts:
	drivers/iommu/amd_iommu.c
	drivers/iommu/iommu.c
2011-10-21 14:38:55 +02:00
Joerg Roedel a1b60c1cd9 iommu/core: Convert iommu_found to iommu_present
With per-bus iommu_ops the iommu_found function needs to
work on a bus_type too. This patch adds a bus_type parameter
to that function and converts all call-places.
The function is also renamed to iommu_present because the
function now checks if an iommu is present for a given bus
and does not check for a global iommu anymore.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-10-21 14:37:20 +02:00
Jussi Kivilinna 906b2c9f2d crypto: twofish-x86_64-3way - fix ctr blocksize to 1
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-10-21 14:28:57 +02:00
Jussi Kivilinna a516ebafdf crypto: blowfish-x86_64 - fix ctr blocksize to 1
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-10-21 14:28:57 +02:00
Jussi Kivilinna 8280daad43 crypto: twofish - add 3-way parallel x86_64 assembler implemention
Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
module. New assembler functions crypt data in three blocks chunks, improving
cipher performance on out-of-order CPUs.

Patch has been tested with tcrypt and automated filesystem tests.

Summary of the tcrypt benchmarks:

Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
 encrypt: 1.3x speed
 decrypt: 1.3x speed

Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
 encrypt: 1.07x speed
 decrypt: 1.4x speed

Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
 encrypt: 1.4x speed

Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
 encrypt: 1.0x speed
 decrypt: 1.0x speed

Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
 encrypt: 0.84x speed
 decrypt: 1.09x speed

Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
 encrypt: 1.15x speed

Full output:
 http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt
 http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt
 http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt

Tests were run on:
 vendor_id  : AuthenticAMD
 cpu family : 16
 model      : 10
 model name : AMD Phenom(tm) II X6 1055T Processor

Also userspace test were run on:
 vendor_id  : GenuineIntel
 cpu family : 6
 model      : 15
 model name : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
 stepping   : 11

Userspace test results:

Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
 encrypt: 1.27x
 decrypt: 1.25x

Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
 encrypt: 1.36x
 decrypt: 1.36x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-10-21 14:23:08 +02:00
Jussi Kivilinna 91d41f159d crypto: twofish-x86-asm - make assembler functions use twofish_ctx instead of crypto_tfm
This needed by 3-way twofish patch to be able to easily use one block
assembler functions. As glue code is shared between i586/x86_64 apply
change to i586 assembler too. Also export assembler functions for
3-way parallel twofish module.

CC: Joachim Fritschi <jfritschi@freenet.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-10-21 14:23:08 +02:00
Jussi Kivilinna a071d06e34 crypto: blowfish-x86_64 - add credits
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-10-21 14:23:07 +02:00
Jussi Kivilinna e827bb09c8 crypto: blowfish-x86_64 - improve x86_64 blowfish 4-way performance
This patch adds improved F-macro for 4-way parallel functions. With new
F-macro for 4-way parallel functions, blowfish sees ~15% improvement in
speed tests on AMD Phenom II (~5% on Intel Xeon E7330).

However when used in 1-way blowfish function new macro would be ~10%
slower than original, so old F-macro is kept for 1-way functions.
Patch cleans up old F-macro as it is no longer needed in 4-way part.

Patch also does register macro renaming to reduce stack usage.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-10-21 14:23:07 +02:00
Konrad Rzeszutek Wilk a491dbef56 xen/p2m/debugfs: Make type_name more obvious.
Per Ian Campbell suggestion to defend against future breakage
in case we expand the P2M values, incorporate the defines
in the string array.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:34 -04:00
Konrad Rzeszutek Wilk 8404877ee1 xen/p2m/debugfs: Fix potential pointer exception.
We could be referencing the last + 1 element of level_name[]
array which would cause a pointer exception, because of the
initial setup of lvl=4.

[v1: No need to do this for type_name, pointed out by Ian Campbell]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:32 -04:00
Konrad Rzeszutek Wilk 5e28783013 xen/enlighten: Fix compile warnings and set cx to known value.
We get:
linux/arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’:
linux/arch/x86/xen/enlighten.c:226: warning: ‘cx’ may be used uninitialized in this function
linux/arch/x86/xen/enlighten.c:240: note: ‘cx’ was declared here

and the cx is really not set but passed in the xen_cpuid instruction
which masks the value with returned masked_ecx from cpuid. This
can potentially lead to invalid data being stored in cx.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:31 -04:00
Konrad Rzeszutek Wilk e6599225db xen/irq: If we fail during msi_capability_init return proper error code.
There are three different modes: PV, HVM, and initial domain 0. In all
the cases we would return -1 for failure instead of a proper error code.
Fix this by propagating the error code from the generic IRQ code.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:28 -04:00
Borislav Petkov bcb80e5387 x86, microcode, AMD: Add microcode revision to /proc/cpuinfo
Enable microcode revision output for AMD after 506ed6b53e ("x86,
intel: Output microcode revision in /proc/cpuinfo") did it for Intel.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-19 16:07:30 +02:00
Borislav Petkov 881e23e567 x86, microcode: Correct microcode revision format
506ed6b53e ("x86, intel: Output microcode revision in /proc/cpuinfo")
added microcode revision format to /proc/cpuinfo and the MCE handler in
decimal format but both AMD and Intel patch levels are handled as hex
numbers. Fix it.

Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-19 15:47:48 +02:00
Josh Stone db45bd90be x86, perf, kprobes: Make kprobes's twobyte_is_boostable volatile
When compiling an i386_defconfig kernel with
gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand
for test_bit in kprobes' can_boost. I discovered that this
caused only the first long of twobyte_is_boostable[] to be
output.

Jakub filed and fixed gcc PR50571 to correct the warning and
this output issue.  But to solve it for less current gcc, we can
make kprobes' twobyte_is_boostable[] volatile, and it won't be
optimized out.

Before:

    CC      arch/x86/kernel/kprobes.o
  In file included from include/linux/bitops.h:22:0,
                   from include/linux/kernel.h:17,
                   from [...]/arch/x86/include/asm/percpu.h:44,
                   from [...]/arch/x86/include/asm/current.h:5,
                   from [...]/arch/x86/include/asm/processor.h:15,
                   from [...]/arch/x86/include/asm/atomic.h:6,
                   from include/linux/atomic.h:4,
                   from include/linux/mutex.h:18,
                   from include/linux/notifier.h:13,
                   from include/linux/kprobes.h:34,
                   from arch/x86/kernel/kprobes.c:43:
  [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
  [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default]

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                          554: R_386_32	.rodata.cst4

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
  Contents of section .rodata.cst4:
   0000 4c030000                             L...

Only a single long of twobyte_is_boostable[] is in the object
file.

After, with volatile:

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                          554: R_386_32	.data

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
   0010 00000000 00000000 00000000 00000000  ................
   0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
   0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w

Now all 32 bytes are output into .data instead.

Signed-off-by: Josh Stone <jistone@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Link: http://lkml.kernel.org/r/1318899645-4068-1-git-send-email-jistone@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-18 08:43:08 +02:00
Jan Beulich 72da0b07b1 x86: constify PCI raw ops structures
As with any other such change, the goal is to prevent inadvertent
writes to these structures (assuming DEBUG_RODATA is enabled), and to
separate data (possibly frequently) written to from such never getting
modified.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-10-14 09:05:28 -07:00
Andi Kleen 30963c0ac7 x86, intel: Use c->microcode for Atom errata check
Now that the cpu update level is available the Atom PSE errata
check can use it directly without reading the MSR again.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1318466795-7393-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-14 13:16:38 +02:00
Andi Kleen 506ed6b53e x86, intel: Output microcode revision in /proc/cpuinfo
I got a request to make it easier to determine the microcode
update level on Intel CPUs. This patch adds a new "microcode"
field to /proc/cpuinfo.

The microcode level is also outputed on fatal machine checks
together with the other CPUID model information.

I removed the respective code from the microcode update driver,
it just reads the field from cpu_data. Also when the microcode
is updated it fills in the new values too.

I had to add a memory barrier to native_cpuid to prevent it
being optimized away when the result is not used.

This turns out to clean up further code which already got this
information manually. This is done in followon patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1318466795-7393-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-14 13:16:35 +02:00
Linus Torvalds 2ad53110d6 Merge branch 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86: Default to vsyscall=native for now
2011-10-14 16:54:56 +12:00
Mika Westerberg 153b19a3b9 x86, mrst: use a temporary variable for SFI irq
SFI tables reside in RAM and should not be modified once they are
written.  Current code went to set pentry->irq to zero which causes
subsequent reads to fail with invalid SFI table checksum.  This will
break kexec as the second kernel fails to validate SFI tables.

To fix this we use temporary variable for irq number.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-14 16:53:27 +12:00
Srivatsa S. Bhat 70989449da x86, microcode: Don't request microcode from userspace unnecessarily
Requesting the microcode from userspace *every time* when onlining CPUs
(during a CPU hotplug operation) is unnecessary. Thus, ensure that
once the kernel gets the microcode after booting, it is not freed nor
invalidated when a CPU goes offline, so that it can be reused when that
CPU comes back online, without requesting userspace for it again. As a
result, the CPU hotplug operations become faster as well.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/4E91F908.5010006@linux.vnet.ibm.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-13 16:20:35 +02:00
Yinghai Lu 141d55e6cc x86/irq: Standardize on CONFIG_SPARSE_IRQ=y
Sparseirq got introduced in v2.6.28 and Thomas did a huge cleanup
around v2.6.38 that eliminated basically all disadvantages
of it.

So we can remove non-sparseirq support now and simplify
our IRQ degrees of freedom a bit.

Suggested-and-acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4E95E21D.6090200@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-13 12:12:12 +02:00
Ingo Molnar 910e94dd0c Merge branch 'tip/perf/core' of git://github.com/rostedt/linux into perf/core 2011-10-12 17:14:47 +02:00
Yinghai Lu 6f50d45fae x86, ioapic: Clean up ioapic/apic_id usage
While looking at the code, apic_id sometime is referred to index
of ioapic, but sometime is used for phys apic id. and some even
use apic for real apic id. It is very confusing.

So try to limit apic_id or ioapic_id to be real apic id for
ioapic, and use ioapic_idx for ioapic index in the array.

-v2: Suggested by Ingo, use ioapic_idx consistently, instead of ioapic

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542DC.3090509@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-12 09:55:28 +02:00
Yinghai Lu cda417dd87 x86, ioapic: Factor out print_IO_APIC() to only print one io apic
It is getting too big after the interrupt remaping entries debug
print out was added.

Original print_IO_APIC() becomes print_IO_APICs().
New print_IO_APIC() will only print one ioapic's registers

As a side-effect this clean-up also made checkpatch.pl happier.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542D3.5000008@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-12 09:55:25 +02:00
Yinghai Lu 3a61d7feca x86, ioapic: Print out irte with right ioapic index
While checking irte dump in dmesg, the print out is confusing
ioapic index with real io apic id:

IOAPIC[0]: Set routing entry (1-1 -> 0x31 -> IRQ 1 Mode:0
Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1
Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:31
Dest:00000001 SID:00FF SQ:0 SVT:1) IOAPIC[0]: Set routing entry
(1-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:30 Dest:00000001 SID:00FF SQ:0 SVT:1)

The system's first ioapic id is 1.

This commit:

| commit 3040db92ee
| Author: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
| Date:   Tue Jul 12 21:17:41 2011 +0000
|
|    x86, ioapic: Print IRTE when IR is enabled

Confused apic_id with the ioapic ID - fix it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542C8.8040209@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-12 09:55:21 +02:00
Yinghai Lu c5b4712c3f x86, ioapic: Split up setup_ioapic_entry()
Ingo pointed out that setup_ioapic_entry() is way too big now.

Split the intr-remap code out into setup_ir_ioapic_entry().

Also pass struct io_apic_irq_attr * instead of 5 parameters
in those two functions.

At last in setup_ir_ioapic_entry() we don't need to panic.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542BB.4070807@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-12 09:55:18 +02:00
Yinghai Lu e4aff81182 x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq()
Do not expand that struct, and just pass pointer to reduce the
number of parameters in related functions.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542B1.7050800@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-12 09:55:15 +02:00
Adrian Bunk 2b666859ec x86: Default to vsyscall=native for now
This UML breakage:

  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790

Is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add
vsyscall= parameter") - the vsyscall emulation code is not fully cooked
yet as UML relies on some rather fragile SIGSEGV semantics.

Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
to vsyscall=native for now, this patch implements that.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Andrew Lutomirski <luto@mit.edu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-11 08:23:34 +02:00
Masami Hiramatsu 53a019a951 x86: Fix insn decoder for longer instruction
Fix x86 insn decoder for hardening against invalid length
instructions. This adds length checkings for each byte-read
site and if it exceeds MAX_INSN_SIZE, returns immediately.
This can happen when decoding user-space binary.

Caller can check whether it happened by checking insn.*.got
member is set or not.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: acme@redhat.com
Cc: ming.m.lin@intel.com
Cc: robert.richter@amd.com
Cc: ravitillo@lbl.gov
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20111007133155.10933.58577.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 09:05:51 +02:00
Ingo Molnar d48b0e1737 x86, nmi, drivers: Fix nmi splitup build bug
nmi.c needs an #include <linux/mca.h>:

 arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’:
 arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function)
 arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in

Another one is the hpwdt driver:

 drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:57:21 +02:00
Robert Richter b716916679 perf, x86: Implement IBS initialization
This patch implements IBS feature detection and initialzation. The
code is shared between perf and oprofile. If IBS is available on the
system for perf, a pmu is setup.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316597423-25723-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:57:16 +02:00
Robert Richter ee5789dbcc perf, x86: Share IBS macros between perf and oprofile
Moving IBS macros from oprofile to <asm/perf_event.h> to make it
available to perf. No additional changes.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:57:11 +02:00
Don Zickus efc3aac5f3 x86, nmi: Track NMI usage stats
Now that the NMI handler are broken into lists, increment the appropriate
stats for each list.  This allows us to see what is going on when they
get printed out in the next patch.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-6-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:57:06 +02:00
Don Zickus b227e23399 x86, nmi: Add in logic to handle multiple events and unknown NMIs
Previous patches allow the NMI subsystem to process multipe NMI events
in one NMI.  As previously discussed this can cause issues when an event
triggered another NMI but is processed in the current NMI.  This causes the
next NMI to go unprocessed and become an 'unknown' NMI.

To handle this, we first have to flag whether or not the NMI handler handled
more than one event or not.  If it did, then there exists a chance that
the next NMI might be already processed.  Once the NMI is flagged as a
candidate to be swallowed, we next look for a back-to-back NMI condition.

This is determined by looking at the %rip from pt_regs.  If it is the same
as the previous NMI, it is assumed the cpu did not have a chance to jump
back into a non-NMI context and execute code and instead handled another NMI.

If both of those conditions are true then we will swallow any unknown NMI.

There still exists a chance that we accidentally swallow a real unknown NMI,
but for now things seem better.

An optimization has also been added to the nmi notifier rountine.  Because x86
can latch up to one NMI while currently processing an NMI, we don't have to
worry about executing _all_ the handlers in a standalone NMI.  The idea is
if multiple NMIs come in, the second NMI will represent them.  For those
back-to-back NMI cases, we have the potentail to drop NMIs.  Therefore only
execute all the handlers in the second half of a detected back-to-back NMI.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:57:01 +02:00
Don Zickus 9c48f1c629 x86, nmi: Wire up NMI handlers to new routines
Just convert all the files that have an nmi handler to the new routines.
Most of it is straight forward conversion.  A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

And Boris responding that he would like to remove that call because of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registeration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:56:57 +02:00
Don Zickus c9126b2ee8 x86, nmi: Create new NMI handler routines
The NMI handlers used to rely on the notifier infrastructure.  This worked
great until we wanted to support handling multiple events better.

One of the key ideas to the nmi handling is to process _all_ the handlers for
each NMI.  The reason behind this switch is because NMIs are edge triggered.
If enough NMIs are triggered, then they could be lost because the cpu can
only latch at most one NMI (besides the one currently being processed).

In order to deal with this we have decided to process all the NMI handlers
for each NMI.  This allows the handlers to determine if they recieved an
event or not (the ones that can not determine this will be left to fend
for themselves on the unknown NMI list).

As a result of this change it is now possible to have an extra NMI that
was destined to be received for an already processed event.  Because the
event was processed in the previous NMI, this NMI gets dropped and becomes
an 'unknown' NMI.  This of course will cause printks that scare people.

However, we prefer to have extra NMIs as opposed to losing NMIs and as such
are have developed a basic mechanism to catch most of them.  That will be
a later patch.

To accomplish this idea, I unhooked the nmi handlers from the notifier
routines and created a new mechanism loosely based on doIRQ.  The reason
for this is the notifier routines have a couple of shortcomings.  One we
could't guarantee all future NMI handlers used NOTIFY_OK instead of
NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
handled in each routine (most only handle one, perf can handle more than one).
Third, I wanted to eventually display which nmi handlers are registered in
the system in /proc/interrupts to help see who is generating NMIs.

The patch below just implements the new infrastructure but doesn't wire it up
yet (that is the next patch).  Its design is based on doIRQ structs and the
atomic notifier routines.  So the rcu stuff in the patch isn't entirely untested
(as the notifier routines have soaked it) but it should be double checked in
case I copied the code wrong.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:56:52 +02:00
Don Zickus 1d48922c14 x86, nmi: Split out nmi from traps.c
The nmi stuff is changing a lot and adding more functionality.  Split it
out from the traps.c file so it doesn't continue to pollute that file.

This makes it easier to find and expand all the future nmi related work.

No real functional changes here.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:56:47 +02:00
Gleb Natapov 144d31e6f1 perf, intel: Use GO/HO bits in perf-ctr
Intel does not have guest/host-only bit in perf counters like AMD
does.  To support GO/HO bits KVM needs to switch EVENTSELn values
(or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is
configured to count only in a guest mode it stays disabled in a host,
but VMX is configured to switch it to enabled value during guest entry.

This patch adds GO/HO tracking to Intel perf code and provides interface
for KVM to get a list of MSRs that need to be switched on a guest entry.

Only cpus with architectural PMU (v1 or later) are supported with this
patch.  To my knowledge there is not p6 models with VMX but without
architectural PMU and p4 with VMX are rare and the interface is general
enough to support them if need arise.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:56:42 +02:00
Paul Menzel 29cf7a30f8 x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
In summary, this DMI quirk uses the _CRS info by default for the ASUS
M2V-MX SE by turning on `pci=use_crs` and is similar to the quirk
added by commit 2491762cfb ("x86/PCI: use host bridge _CRS info on
ASRock ALiveSATA2-GLAN") whose commit message should be read for further
information.

Since commit 3e3da00c01 ("x86/pci: AMD one chain system to use pci
read out res") Linux gives the following oops:

    parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
    HDA Intel 0000:20:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
    HDA Intel 0000:20:01.0: setting latency timer to 64
    BUG: unable to handle kernel paging request at ffffc90011c08000
    IP: [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
    PGD 13781a067 PUD 13781b067 PMD 1300ba067 PTE 800000fd00000173
    Oops: 0009 [#1] SMP
    last sysfs file: /sys/module/snd_pcm/initstate
    CPU 0
    Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event tpm_tis tpm snd_seq tpm_bios psmouse parport_pc snd_timer snd_seq_device parport processor evdev snd i2c_viapro thermal_sys amd64_edac_mod k8temp i2c_core soundcore shpchp pcspkr serio_raw asus_atk0110 pci_hotplug edac_core button snd_page_alloc edac_mce_amd ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod raid1 md_mod usbhid hid sg sd_mod crc_t10dif sr_mod cdrom ata_generic uhci_hcd sata_via pata_via libata ehci_hcd usbcore scsi_mod via_rhine mii nls_base [last unloaded: scsi_wait_scan]
    Pid: 1153, comm: work_for_cpu Not tainted 2.6.37-1-amd64 #1 M2V-MX SE/System Product Name
    RIP: 0010:[<ffffffffa0578402>]  [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
    RSP: 0018:ffff88013153fe50  EFLAGS: 00010286
    RAX: ffffc90011c08000 RBX: ffff88013029ec00 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffff88013341d000 R08: 0000000000000000 R09: 0000000000000040
    R10: 0000000000000286 R11: 0000000000003731 R12: ffff88013029c400
    R13: 0000000000000000 R14: 0000000000000000 R15: ffff88013341d090
    FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:00000000f7610ab0
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffc90011c08000 CR3: 0000000132f57000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process work_for_cpu (pid: 1153, threadinfo ffff88013153e000, task ffff8801303c86c0)
    Stack:
     0000000000000005 ffffffff8123ad65 00000000000136c0 ffff88013029c400
     ffff8801303c8998 ffff88013341d000 ffff88013341d090 ffff8801322d9dc8
     ffff88013341d208 0000000000000000 0000000000000000 ffffffff811ad232
    Call Trace:
     [<ffffffff8123ad65>] ? __pm_runtime_set_status+0x162/0x186
     [<ffffffff811ad232>] ? local_pci_probe+0x49/0x92
     [<ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b
     [<ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b
     [<ffffffff8105afd0>] ? do_work_for_cpu+0xb/0x1b
     [<ffffffff8105fd3f>] ? kthread+0x7a/0x82
     [<ffffffff8100a824>] ? kernel_thread_helper+0x4/0x10
     [<ffffffff8105fcc5>] ? kthread+0x0/0x82
     [<ffffffff8100a820>] ? kernel_thread_helper+0x0/0x10
    Code: f4 01 00 00 ef 31 f6 48 89 df e8 29 dd ff ff 85 c0 0f 88 2b 03 00 00 48 89 ef e8 b4 39 c3 e0 8b 7b 40 e8 fc 9d b1 e0 48 8b 43 38 <66> 8b 10 66 89 14 24 8b 43 14 83 e8 03 83 f8 01 77 32 31 d2 be
    RIP  [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
     RSP <ffff88013153fe50>
    CR2: ffffc90011c08000
    ---[ end trace 8d1f3ebc136437fd ]---

Trusting the ACPI _CRS information (`pci=use_crs`) fixes this problem.

    $ dmesg | grep -i crs # with the quirk
    PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug

The match has to be against the DMI board entries though since the vendor entries are not populated.

    DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304    10/30/2007

This quirk should be removed when `pci=use_crs` is enabled for machines
from 2006 or earlier or some other solution is implemented.

Using coreboot [1] with this board the problem does not exist but this
quirk also does not affect it either. To be safe though the check is
tightened to only take effect when the BIOS from American Megatrends is
used.

        15:13 < ruik> but coreboot does not need that
        15:13 < ruik> because i have there only one root bus
        15:13 < ruik> the audio is behind a bridge

        $ sudo dmidecode
        BIOS Information
                Vendor: American Megatrends Inc.
                Version: 0304
                Release Date: 10/30/2007

[1] http://www.coreboot.org/

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=30552

Cc: stable@kernel.org (2.6.34)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-06 16:10:37 -07:00
Joerg Roedel 011af85784 perf, amd: Use GO/HO bits in perf-ctr
The AMD perf-counters support counting in guest or host-mode
only. Make use of that feature when user-space specified
guest/host-mode only counting.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-3-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06 13:00:31 +02:00
Ingo Molnar 7b4f86ac05 Merge branch 'ras' of git://amd64.org/linux/bp into perf/core 2011-10-06 12:54:36 +02:00
Ingo Molnar 9d01402023 Merge commit 'v3.1-rc9' into perf/core
Merge reason: pick up latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06 12:49:21 +02:00
Liu, Jinsong a3e06bbe84 KVM: emulate lapic tsc deadline timer for guest
This patch emulate lapic tsc deadline timer for guest:
Enumerate tsc deadline timer capability by CPUID;
Enable tsc deadline timer mode by lapic MMIO;
Start tsc deadline timer by WRMSR;

[jan: use do_div()]
[avi: fix for !irqchip_in_kernel()]
[marcelo: another fix for !irqchip_in_kernel()]

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-10-05 15:34:56 +02:00
Linus Torvalds f72a209a3e Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  irq: Fix check for already initialized irq_domain in irq_domain_add
  irq: Add declaration of irq_domain_simple_ops to irqdomain.h

* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86/rtc: Don't recursively acquire rtc_lock

* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  posix-cpu-timers: Cure SMP wobbles
  sched: Fix up wchan borkage
  sched/rt: Migrate equal priority tasks to available CPUs
2011-10-01 08:37:25 -07:00
David Vrabel 4dcaebbf65 xen: use generic functions instead of xen_{alloc, free}_vm_area()
Replace calls to the Xen-specific xen_alloc_vm_area() and
xen_free_vm_area() functions with the generic equivalent
(alloc_vm_area() and free_vm_area()).

On x86, these were identical already.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 15:02:18 -04:00
Ingo Molnar 4167ab90ee Merge branch 'core' of git://amd64.org/linux/rric into perf/core 2011-09-29 17:35:29 +02:00
David Vrabel f3f436e33b xen: release all pages within 1-1 p2m mappings
In xen_memory_setup() all reserved regions and gaps are set to an
identity (1-1) p2m mapping.  If an available page has a PFN within one
of these 1-1 mappings it will become inaccessible (as it MFN is lost)
so release them before setting up the mapping.

This can make an additional 256 MiB or more of RAM available
(depending on the size of the reserved regions in the memory map) if
the initial pages overlap with reserved regions.

The 1:1 p2m mappings are also extended to cover partial pages.  This
fixes an issue with (for example) systems with a BIOS that puts the
DMI tables in a reserved region that begins on a non-page boundary.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:15 -04:00
David Vrabel dc91c728fd xen: allow extra memory to be in multiple regions
Allow the extra memory (used by the balloon driver) to be in multiple
regions (typically two regions, one for low memory and one for high
memory).  This allows the balloon driver to increase the number of
available low pages (if the initial number if pages is small).

As a side effect, the algorithm for building the e820 memory map is
simpler and more obviously correct as the map supplied by the
hypervisor is (almost) used as is (in particular, all reserved regions
and gaps are preserved).  Only RAM regions are altered and RAM regions
above max_pfn + extra_pages are marked as unused (the region is split
in two if necessary).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:10 -04:00
David Vrabel 8b5d44a5ac xen: allow balloon driver to use more than one memory region
Allow the xen balloon driver to populate its list of extra pages from
more than one region of memory.  This will allow platforms to provide
(for example) a region of low memory and a region of high memory.

The maximum possible number of extra regions is 128 (== E820MAX) which
is quite large so xen_extra_mem is placed in __initdata.  This is safe
as both xen_memory_setup() and balloon_init() are in __init.

The balloon regions themselves are not altered (i.e., there is still
only the one region).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:10 -04:00
David Vrabel aa24411b67 xen/balloon: account for pages released during memory setup
In xen_memory_setup() pages that occur in gaps in the memory map are
released back to Xen.  This reduces the domain's current page count in
the hypervisor.  The Xen balloon driver does not correctly decrease
its initial current_pages count to reflect this.  If 'delta' pages are
released and the target is adjusted the resulting reservation is
always 'delta' less than the requested target.

This affects dom0 if the initial allocation of pages overlaps the PCI
memory region but won't affect most domU guests that have been setup
with pseudo-physical memory maps that don't have gaps.

Fix this by accouting for the released pages when starting the balloon
driver.

If the domain's targets are managed by xapi, the domain may eventually
run out of memory and die because xapi currently gets its target
calculations wrong and whenever it is restarted it always reduces the
target by 'delta'.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:09 -04:00
Stefano Stabellini b17d0b5c08 xen: XEN_PVHVM depends on PCI
Xen PV on HVM guests require PCI support because they need the
xen-platform-pci driver in order to initialize xenbus.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 10:52:16 -04:00
Stefano Stabellini 0930bba674 xen: modify kernel mappings corresponding to granted pages
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.

In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.

Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 10:32:58 -04:00
Jan Beulich eab9e6137f x86-64: Fix CFI data for interrupt frames
The patch titled "x86: Don't use frame pointer to save old stack
on irq entry" did not properly adjust CFI directives, so this
patch is a follow-up to that one.

With the old stack pointer no longer stored in a callee-saved
register (plus some offset), we now have to use a CFA expression
to describe the memory location where it is being found. This
requires the use of .cfi_escape (allowing arbitrary byte streams
to be emitted into .eh_frame), as there is no
.cfi_def_cfa_expression (which also cannot reasonably be
expected, as it would require a full expression parser).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/4E8360200200007800058467@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28 19:04:52 +02:00
Jan Beulich e05139f256 x86-64: Don't apply destructive erratum workaround on unaffected CPUs
Erratum 93 applies to AMD K8 CPUs only, and its workaround
(forcing the upper 32 bits of %rip to all get set under certain
conditions) is actually getting in the way of analyzing page
faults occurring during EFI physical mode runtime calls (in
particular the page table walk shown is completely unrelated to
the actual fault). This is because typically EFI runtime code
lives in the space between 2G and 4G, which - modulo the above
manipulation - is likely to overlap with the kernel or modules
area.

While even for the other errata workarounds their taking effect
could be limited to just the affected CPUs, none of them appears
to be destructive, and they're generally getting called only
outside of performance critical paths, so they're being left
untouched.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/4E835FE30200007800058464@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28 19:04:48 +02:00
Jan Beulich 838312be46 apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches
These warnings (generally one per CPU) are a result of
initializing x86_cpu_to_logical_apicid while apic_default is
still in use, but the check in setup_local_APIC() being done
when apic_bigsmp was already used as an override in
default_setup_apic_routing():

 Overriding APIC driver with bigsmp
 Enabling APIC mode:  Physflat.  Using 5 I/O APICs
 ------------[ cut here ]------------
 WARNING: at .../arch/x86/kernel/apic/apic.c:1239
 ...
 CPU 1 irqstacks, hard=f1c9a000 soft=f1c9c000
 Booting Node   0, Processors  #1
 smpboot cpu 1: start_ip = 9e000
 Initializing CPU#1
 ------------[ cut here ]------------
 WARNING: at .../arch/x86/kernel/apic/apic.c:1239
 setup_local_APIC+0x137/0x46b() Hardware name: ...
 CPU1 logical APIC ID: 2 != 8
 ...

Fix this (for the time being, i.e. until
x86_32_early_logical_apicid() will get removed again, as Tejun
says ought to be possible) by overriding the previously stored
values at the point where the APIC driver gets overridden.

v2: Move this and the pre-existing override logic into
    arch/x86/kernel/apic/bigsmp_32.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org> (2.6.39 and onwards)
Link: http://lkml.kernel.org/r/4E835D16020000780005844C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28 19:01:53 +02:00
Stephen Rothwell 910b2c5122 x86, amd: Include linux/elf.h since we use stuff from asm/elf.h
After merging the moduleh tree, today's linux-next build (x86_64
allmodconfig) failed like this:

  arch/x86/kernel/sys_x86_64.c:28:10: warning: 'enum align_flags' declared inside parameter list
  arch/x86/kernel/sys_x86_64.c:28:10: warning: its scope is only this definition or declaration, which is probably not what you
  want arch/x86/kernel/sys_x86_64.c:28:22: error: parameter 3 ('flags') has incomplete type
  [...]

Presumably caused by the module.h split interacting with a
new commit dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties
on AMD family 15h") from the x8 tree.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/20110928174214.17a58be15d84d67c185930e1@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28 10:34:31 +02:00
Ingo Molnar 695d16f787 Merge branch 'upstream/ticketlock-cleanup' of git://github.com/jsgf/linux-xen into x86/spinlocks 2011-09-28 08:57:10 +02:00
Jeremy Fitzhardinge 4a7f340c6a x86, ticketlock: remove obsolete comment
The note about partial registers is not really relevent now that we
rely on gcc to generate all the assembler.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-09-27 23:37:20 -07:00
Randy Dunlap d6eed550a9 x86: Perf_event_amd.c needs <asm/apicdef.h>
Fix (rare) build error by adding <asm/apicdef.h> header file:

  arch/x86/kernel/cpu/perf_event_amd.c:350:2: error: 'BAD_APICID' undeclared (first use in this function)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Andre Przywara <andre.przywara@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/4E820138.90301@xenotime.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-27 19:55:09 +02:00
Paul Bolle 395cf9691d doc: fix broken references
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27 18:08:04 +02:00
Jeremy Fitzhardinge fdb9eb9f15 xen/dom0: set wallclock time in Xen
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-09-26 11:04:39 -07:00
Jeremy Fitzhardinge eec07a9ecf xen: add dom0_op hypercall
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-09-26 11:04:39 -07:00
Yu Ke 3e0996798a xen/acpi: Domain0 acpi parser related platform hypercall
This patches implements the xen_platform_op hypercall, to pass the parsed
ACPI info to hypervisor.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Added DEFINE_GUEST.. in appropiate headers]
[v2: Ripped out typedefs]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-26 11:04:39 -07:00
Kevin Winchester de0428a7ad x86, perf: Clean up perf_event cpu code
The CPU support for perf events on x86 was implemented via included C files
with #ifdefs.  Clean this up by creating a new header file and compiling
the vendor-specific files as needed.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26 12:58:00 +02:00
Ingo Molnar ed3982cf37 Merge commit 'v3.1-rc7' into perf/core
Merge reason: Pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26 12:54:28 +02:00
Liu, Jinsong b90dfb0419 x86: TSC deadline definitions
This pre-defination is preparing for KVM tsc deadline timer emulation, but
theirself are not kvm specific.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25 19:53:00 +03:00
Avi Kivity 7460fb4a34 KVM: Fix simultaneous NMIs
If simultaneous NMIs happen, we're supposed to queue the second
and next (collapsing them), but currently we sometimes collapse
the second into the first.

Fix by using a counter for pending NMIs instead of a bool; since
the counter limit depends on whether the processor is currently
in an NMI handler, which can only be checked in vcpu context
(via the NMI mask), we add a new KVM_REQ_NMI to request recalculation
of the counter.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25 19:52:59 +03:00
Avi Kivity 1cd196ea42 KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25 19:52:58 +03:00
Avi Kivity d4b4325fdb KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25 19:52:57 +03:00
Avi Kivity c191a7a0f4 KVM: x86 emulator: streamline decode of segment registers
The opcodes

  push %seg
  pop %seg
  l%seg, %mem, %reg  (e.g. lds/les/lss/lfs/lgs)

all have an segment register encoded in the instruction.  To allow reuse,
decode the segment number into src2 during the decode stage instead of the
execution stage.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25 19:52:56 +03:00