Commit Graph

20060 Commits

Author SHA1 Message Date
Andi Kleen 2dee5c43da x86: Fix section conflict for numachip
A variable cannot be both __read_mostly and const. This
is a meaningless combination.

Just make it only const.

This fixes the LTO build with numachip enabled.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1411533139-25708-1-git-send-email-andi@firstfloor.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 11:18:49 +02:00
Ben Hutchings 0e6d3112a4 x86: Reject x32 executables if x32 ABI not supported
It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled.  However all its syscalls
will fail, even _exit().  This usually causes it to segfault.

Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 11:17:42 +02:00
Bryan O'Donoghue aece118e48 x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
Intel processors which don't report cache information via cpuid(2)
or cpuid(4) need quirk code in the legacy_cache_size callback to
report this data. For Intel that callback is is intel_size_cache().

This patch enables calling of cpu_detect_cache_sizes() inside of
init_intel() and hence the calling of the legacy_cache callback in
intel_size_cache(). Adding this call will ensure that PIII Tualatin
currently in intel_size_cache() and Quark SoC X1000 being added to
intel_size_cache() in this patch will report their respective cache
sizes.

This model of calling cpu_detect_cache_sizes() is consistent with
AMD/Via/Cirix/Transmeta and Centaur.

Also added is a string to idenitfy the Quark as Quark SoC X1000
giving better and more descriptive output via /proc/cpuinfo

Adding cpu_detect_cache_sizes to init_intel() will enable calling
of intel_size_cache() on Intel processors which currently no code
can reach. Therefore this patch will also re-enable reporting
of PIII Tualatin cache size information as well as add
Quark SoC X1000 support.

Comment text and cache flow logic suggested by Thomas Gleixner

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:07:46 +02:00
Bryan O'Donoghue 2075244f9b x86: Quark: Comment setup_arch() to document TLB/PGE bug
Quark SoC X1000 advertises Page Global Enable for it's
Translation Lookaside Buffer via cpuid. The silicon does not
in fact support PGE and hence will not flush the TLB when CR4.PGE
is rewritten. The Quark documentation makes clear the necessity to
instead rewrite CR3 in order to flush any TLB entries, irrespective
of the state of CR4.PGE or an individual PTE.PGE

See Intel Quark Core DevMan_001.pdf section 6.4.11

In setup.c setup_arch() the code will load_cr3() and then do a
__flush_tlb_all().

On Quark the entire TLB will be flushed at the load_cr3().
The __flush_tlb_all() have no effect and can be safely ignored.

Later on in the boot process we switch off the flag for cpu_has_pge()
which means that subsequent calls to __flush_tlb_all() will
call __flush_tlb() not __flush_tlb_global() flushing the TLB in the
correct way via load_cr3() not CR4.PGE rewrite

This patch documents the behaviour of flushing the TLB for Quark in
setup_arch()

Comment text suggested by Thomas Gleixner

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-2-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:07:46 +02:00
Jan Beulich 5f1d919a8c x86: Improve cmpxchg8b_emu.S
- don't include unneeded headers
- drop redundant entry point label
- complete unwind annotations
- use .L prefix on local labels to not clutter the symbol table

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5422917E0200007800038081@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:05:49 +02:00
Jan Beulich 3f63572187 x86: Improve cmpxchg16b_emu.S
- don't include unneeded headers
- don't open-code PER_CPU_VAR()
- drop redundant entry point label
- complete unwind annotations
- use .L prefix on local label to not clutter the symbol table

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/542290BC020000780003807D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:05:49 +02:00
Linus Torvalds 74da38631a Tinification for 3.18
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2
 7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS
 bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR
 dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP
 eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x
 JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk
 QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ
 kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa
 WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf
 AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt
 LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G
 GDTZo9iIRNSfm/M4tJ+n
 =2VyW
 -----END PGP SIGNATURE-----

Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux

Pull "tinification" patches from Josh Triplett.

Work on making smaller kernels.

* tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
  bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
  mm: Support compiling out madvise and fadvise
  x86: Support compiling out human-friendly processor feature names
  x86: Drop support for /proc files when !CONFIG_PROC_FS
  x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
  x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
  x86, boot: Use the usual -y -n mechanism for objects in vmlinux
  x86: Add "make tinyconfig" to configure the tiniest possible kernel
  x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-07 08:51:59 -04:00
Rafael J. Wysocki 88b42a4883 Merge branch 'pm-genirq'
* pm-genirq:
  PM / genirq: Document rules related to system suspend and interrupts
  PCI / PM: Make PCIe PME interrupts wake up from suspend-to-idle
  x86 / PM: Set IRQCHIP_SKIP_SET_WAKE for IOAPIC IRQ chip objects
  genirq: Simplify wakeup mechanism
  genirq: Mark wakeup sources as armed on suspend
  genirq: Create helper for flow handler entry check
  genirq: Distangle edge handler entry
  genirq: Avoid double loop on suspend
  genirq: Move MASK_ON_SUSPEND handling into suspend_device_irqs()
  genirq: Make use of pm misfeature accounting
  genirq: Add sanity checks for PM options on shared interrupt lines
  genirq: Move suspend/resume logic into irq/pm code
  PM / sleep: Mechanism for aborting system suspends unconditionally
2014-10-07 01:17:21 +02:00
Andy Lutomirski 8c7aa698ba x86_64, entry: Filter RFLAGS.NT on entry from userspace
The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Cc: <stable@vger.kernel.org>
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-10-06 10:53:26 -07:00
Mukesh Rathor a2ef5dc2c7 x86/xen: Set EFER.NX and EFER.SCE in PVH guests
This fixes two bugs in PVH guests:

  - Not setting EFER.NX means the NX bit in page table entries is
    ignored on Intel processors and causes reserved bit page faults on
    AMD processors.

  - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for
    pvh guest") PVH guests are required to set EFER.SCE to enable the
    SYSCALL instruction.

Secondary VCPUs are started with pagetables with the NX bit set so
EFER.NX must be set before using any stack or data segment.
xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
sets EFER before jumping to cpu_bringup_and_idle().

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-06 10:27:47 +01:00
Juergen Gross d1e9abd630 xen: eliminate scalability issues from initrd handling
Size restrictions native kernels wouldn't have resulted from the initrd
getting mapped into the initial mapping. The kernel doesn't really need
the initrd to be mapped, so use infrastructure available in Xen to avoid
the mapping and hence the restriction.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-03 12:34:52 +01:00
Pranith Kumar 2291059c85 locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
This is purely a stylistic change.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:06:23 +02:00
Wei Huang cc6cd47e73 perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
PMU checking can fail due to various reasons. On native machine, this
is mostly caused by faulty hardware and it is reasonable to use
KERN_ERR in reporting. However, when kernel is running on virtualized
environment, this checking can fail if virtual PMU is not supported
(e.g. KVM on AMD host). It is annoying to see an error message on
splash screen, even though we know such failure is benign on
virtualized environment.

This patch checks if the kernel is running in a virtualized environment.
If so, it will use KERN_INFO in reporting, which reduces the syslog
priority of them. This patch was tested successfully on KVM.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411617314-24659-1-git-send-email-wei@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:04:41 +02:00
Andi Kleen 4f971248bc perf/x86/intel/uncore: Fix minor race in box set up
I was looking for the trinity oops cause in the uncore driver.
(so far didn't found it)

However I found this tiny race: when a box is set up two threads on the
same CPU, they may be setting up the box in parallel (e.g. with kernel
preemption). This could lead to the reference count being increasing
too much. Always recheck there is no existing cpu reference inside the lock.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411424826-15629-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:02:49 +02:00
Dave Hansen 728e5653e6 sched/x86: Fix up typo in topology detection
Commit:

  cebf15eb09 ("x86, sched: Add new topology for multi-NUMA-node CPUs")

some code to try to detect the situation where we have a NUMA node
inside of the "DIE" sched domain.

It detected this by looking for cpus which match_die() but do not match
NUMA nodes via topology_same_node().

I wrote it up as:

	if (match_die(c, o) == !topology_same_node(c, o))

which actually seemed to work some of the time, albiet
accidentally.

It should have been doing an &&, not an ==.

This code essentially chopped off the "DIE" domain on one of
Andrew Morton's systems.  He reported that this patch fixed his
issue.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20140930214546.FD481CFF@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 05:46:52 +02:00
David S. Miller 739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
Paolo Bonzini f439ed27f8 kvm: do not handle APIC access page if in-kernel irqchip is not in use
This fixes the following OOPS:

   loaded kvm module (v3.17-rc1-168-gcec26bc)
   BUG: unable to handle kernel paging request at fffffffffffffffe
   IP: [<ffffffff81168449>] put_page+0x9/0x30
   PGD 1e15067 PUD 1e17067 PMD 0
   Oops: 0000 [#1] PREEMPT SMP
    [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm]
    [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel]
    [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm]
    [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm]
    [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
    [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm]
    [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0
    [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0
    [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0
    [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0
    [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b
   Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0
   RIP  [<ffffffff81193045>] put_page+0x5/0x50

when not using the in-kernel irqchip ("-machine kernel_irqchip=off"
with QEMU).  The fix is to make the same check in
kvm_vcpu_reload_apic_access_page that we already have
in vmx.c's vm_need_virtualize_apic_accesses().

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Fixes: 4256f43f9f
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-02 19:26:11 +02:00
Mathias Krause 5cfed7b335 Revert "crypto: aesni - disable "by8" AVX CTR optimization"
This reverts commit 7da4b29d49.

Now, that the issue is fixed, we can re-enable the code.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-10-02 14:40:28 +08:00
Herbert Xu 9561dccb45 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merging the crypto tree for 3.17 to pull in the "by8" AVX CTR revert.
2014-10-02 14:37:20 +08:00
Mathias Krause e3b3bb5ac1 crypto: aesni - remove unused defines in "by8" variant
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably left overs from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-10-02 14:35:03 +08:00
Mathias Krause 80dca4734b crypto: aesni - fix counter overflow handling in "by8" variant
The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d49
("crypto: aesni - disable "by8" AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit, and testing for overflows afterwards. We need
to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.

As this change adds branches to the hot path, minor performance
regressions  might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.

Tested-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-10-02 14:35:03 +08:00
Kees Cook 20cc28882b x86, boot, kaslr: Fix nuisance warning on 32-bit builds
Building 32-bit threw a warning on kASLR enabled builds:

arch/x86/boot/compressed/aslr.c: In function ‘mem_avoid_overlap’:
arch/x86/boot/compressed/aslr.c:198:17: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   avoid.start = (u64)ptr;
                 ^

This fixes the warning; unsigned long should have been used here.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20141001183632.GA11431@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-10-01 11:41:24 -07:00
Linus Torvalds cd40fab6db Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This has:

   - EFI revert to fix a boot regression
   - early_ioremap() fix for boot failure
   - KASLR fix for possible boot failures
   - EFI fix for corrupted string printing
   - remove a misleading EFI bootup 'failed!' error message

  Unfortunately it's all rather close to the merge window"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
  x86/efi: Delete misleading efi_printk() error message
  Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
  x86/kaslr: Avoid the setup_data area when picking location
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
2014-09-27 14:23:13 -07:00
Alexei Starovoitov 749730ce42 bpf: enable bpf syscall on x64 and i386
done as separate commit to ease conflict resolution

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:05:14 -04:00
Linus Torvalds d2865c7d4e Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A CONFIG_STACK_GROWSUP=y fix, and a hotplug llc CPU mask fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix unreleased llc_shared_mask bit during CPU hotplug
  sched: Fix end_of_stack() and location of stack canary for architectures using CONFIG_STACK_GROWSUP
2014-09-26 08:38:09 -07:00
Ingo Molnar 29282ac0bd * Revert the static library changes from the merge window since they're
causing issues for Macbooks and Fedora + Grub2 - Matt Fleming
 
  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI - Matt Fleming
 
  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware - Matt Fleming
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUIzOFAAoJEC84WcCNIz1VpSwQAJhp9Yu60dWXyqRpV+ER1yau
 MWHXunkeaccbBXNABkzuDUqb2a6DRIJow+/n+dYjIGY6Nf9zzLQFQ+s/EsS+IiyY
 s4rRvAqzfGYk1d6xzgvEccrdD4fRP32kFKnSlZpfTRuJZHZieD+f2y6TP7D33Ja3
 HV/ivPQHZNjxgsExpcE8Rz/QyOZpqRacTr9Gr7IusBRL6IMCyycfCTmt6d5pf1iD
 kQGSGwIBvgN4xMqPUdxTNo31bQZA5ZeywNOh9WhdSCL7FAIDfG9TmXt/J7ckq5ax
 0f6X92qgCs3peLY+/szgSZ2LsZI6I/FM1udc2SFiIOPvCwwytcJ94Wro5pLYzZ4i
 SnqB2xLLEmsR2J3MXIeY0aVy2VtHT4bYRnXYNd9G0eaVfrlJ+4lgwqJavAJmtDZx
 88ey1R8LKRQr+ueSv/BnOvE6T2+38HrjrMooFQsPvolRR0S6MITBr8I2hoRASkUt
 YsA+7s6+tO2QBmQYrKCYSAi9A7onMA9Fh93dmv7XLqFw/SsfVm3RnrNhOVsO9kPC
 zIsWZoS+PGwb4RRvM2i7JAEqUCbuLpIAHYEU6gqprWm1ERHsX9mfFYfsJQHzHuOY
 rg6+wtWQ9MGxek8POac4d2mC+PwC4DA0AkaTTZQBcdAJu+h/gZNt4w7mpk5v4Th1
 QYr/otShMvc84Zd+RMeV
 =5mVJ
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

  * Revert the static library changes from the merge window since they're
    causing issues for Macbooks and Fedora + Grub2 (Matt Fleming)

  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI (Matt Fleming)

  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-25 16:40:08 +02:00
Matt Fleming 115c6628a5 x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
If we're executing the 32-bit efi_char16_printk() code path (i.e.
running on top of 32-bit firmware) we know that efi_early->text_output
will be a 32-bit value, even though ->text_output has type u64.

Unfortunately, we currently pass ->text_output directly to
efi_early->call() so for CONFIG_X86_32 the compiler will push a 64-bit
value onto the stack, causing the other parameters to be misaligned.

The way we handle this in the rest of the EFI boot stub is to pass
pointers as arguments to efi_early->call(), which automatically do the
right thing (pointers are 32-bit on CONFIG_X86_32, and we simply ignore
the upper 32-bits of the argument register if running in 64-bit mode
with 32-bit firmware).

This fixes a corruption bug when printing strings from the 32-bit EFI
boot stub.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84241
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-24 21:56:46 +01:00
David S. Miller 4daaab4f0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-24 16:48:32 -04:00
Tejun Heo d06efebf0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18
This is to receive 0a30288da1 ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall.  The
commit reverted and patches to implement proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
2014-09-24 13:00:21 -04:00
Linus Torvalds 2368a9426f Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes three issues:

   - if ccp is loaded on a machine without ccp, it will incorrectly
     activate causing all requests to fail.  Fixed by preventing ccp
     from loading if hardware isn't available.

   - not all IRQs were enabled for the qat driver, leading to potential
     stalls when it is used

   - disabled buggy AVX CTR implementation in aesni"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni - disable "by8" AVX CTR optimization
  crypto: ccp - Check for CCP before registering crypto algs
  crypto: qat - Enable all 32 IRQs
2014-09-24 09:37:35 -07:00
Ben Hutchings eeeda4cd06 x86/relocs: Make per_cpu_load_addr static
per_cpu_load_addr is only used for 64-bit relocations, but is
declared in both configurations of relocs.c - with different
types.  This has undefined behaviour in general.  GNU ld is
documented to use the larger size in this case, but other tools
may differ and some warn about this.

References: https://bugs.debian.org/748577
Reported-by: Michael Tautschnig <mt@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: 748577@bugs.debian.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1411561812.3659.23.camel@decadent.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:17:47 +02:00
Oleg Nesterov 212be3b232 x86/lib/Makefile: Remove the unnecessary "+= thunk_64.o"
Trivial. We have "lib-y += thunk_$(BITS).o" at the start, no
need to add thunk_64.o if !CONFIG_X86_32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921184232.GB23727@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:15:39 +02:00
Oleg Nesterov 0ad6e3c519 x86: Speed up ___preempt_schedule*() by using THUNK helpers
___preempt_schedule() does SAVE_ALL/RESTORE_ALL but this is
suboptimal, we do not need to save/restore the callee-saved
register. And we already have arch/x86/lib/thunk_*.S which
implements the similar asm wrappers, so it makes sense to
redefine ___preempt_schedule() as "THUNK ..." and remove
preempt.S altogether.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921184153.GA23727@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:15:38 +02:00
Mathias Krause 7da4b29d49 crypto: aesni - disable "by8" AVX CTR optimization
The "by8" implementation introduced in commit 22cddcc7df ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only accounts the right
most 32 bit as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.

As we're quite late in the release cycle, just disable the "by8" variant
for now.

Reported-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-09-24 21:15:31 +08:00
Wanpeng Li 03bd4e1f72 sched: Fix unreleased llc_shared_mask bit during CPU hotplug
The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
	PGD 5a9d5067 PUD 13067 PMD 0
	Oops: 0000 [#3] SMP
	[...]
	Call Trace:
	load_balance
	? _raw_spin_unlock_irqrestore
	idle_balance
	__schedule
	schedule
	schedule_timeout
	? lock_timer_base
	schedule_timeout_uninterruptible
	msleep
	lock_device_hotplug_sysfs
	online_store
	dev_attr_store
	sysfs_write_file
	vfs_write
	SyS_write
	system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.

This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.

Yasuaki also reported that this can happen on real hardware:

  https://lkml.org/lkml/2014/7/22/1018

His case is here:

	==
	Here is an example on my system.
	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, each core of sockes is numbered as
	follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-44, 90-104
	Socket#3 | 45-59, 105-119

	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

	It means that last level cache of Socket#2 is shared with
	CPU#30-44 and 90-104.

	When hot-removing socket#2 and #3, each core of sockets is
	numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89

	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
	remains having 0x3fff80000001fffc0000000.

	After that, when hot-adding socket#2 and #3, each core of
	sockets is numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-59
	Socket#3 | 90-119

	Then llc_shared_mask of CPU#30 becomes
	0x3fff8000fffffffc0000000. It means that last level cache of
	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
	the wrong value.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Linn Crosetto <linn@hp.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:13:20 +02:00
Bryan O'Donoghue ee1b5b165c x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
Quark x1000 advertises PGE via the standard CPUID method
PGE bits exist in Quark X1000's PTEs. In order to flush
an individual PTE it is necessary to reload CR3 irrespective
of the PTE.PGE bit.

See Quark Core_DevMan_001.pdf section 6.4.11

This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to
crash and burn on this platform.

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:06:15 +02:00
Lan Tianyu 2ed53c0d6c x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3
With certain kernel configurations, CPU offline consumes more than
100ms during S3.

It's a timing related issue: native_cpu_die() would occasionally fall
into a 100ms sleep when the CPU idle loop thread marked the CPU state
to DEAD too slowly.

What native_cpu_die() does is that it polls the CPU state and waits
for 100ms if CPU state hasn't been marked to DEAD. The 100ms sleep
doesn't make sense and is purely historic.

To avoid such long sleeping, this patch adds a 'struct completion'
to each CPU, waits for the completion in native_cpu_die() and wakes
up the completion when the CPU state is marked to DEAD.

Tested on an Intel Xeon server with 48 cores, Ivybridge and on
Haswell laptops. The CPU offlining cost on these machines is
reduced from more than 100ms to less than 5ms. The system
suspend time is reduced by 2.3s on the servers.

Borislav and Prarit also helped to test the patch on an AMD
machine and a few systems of various sizes and configurations
(multi-socket, single-socket, no hyper threading, etc.). No
issues were seen.

Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: srostedt@redhat.com
Cc: toshi.kani@hp.com
Cc: imammedo@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com
[ Improved a few minor details in the code, cleaned up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:02:06 +02:00
Stephane Eranian 521e8bac67 perf/x86/intel/uncore: Update support for client uncore IMC PMU
This patch restructures the memory controller (IMC) uncore PMU support
for client SNB/IVB/HSW processors. The main change is that it can now
cope with more than one PCI device ID per processor model. There are
many flavors of memory controllers for each processor. They have
different PCI device ID, yet they behave the same w.r.t. the memory
controller PMU that we are interested in.

The patch now supports two distinct memory controllers for IVB
processors: one for mobile, one for desktop.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140917090616.GA11281@quad
Cc: ak@linux.intel.com
Cc: kan.liang@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:25 +02:00
Andi Kleen b10fc1c3e3 perf/x86/intel/uncore: Fix PCU filter setup for Sandy/Ivy/Haswell EP
The PCU frequency band filters use 8 bit each in a register.
When setting up the value the shift value was not correctly
scaled, which resulted in all filters except for band 0 to
be zero. Fix the scaling.

This allows to correctly monitor multiple uncore frequency bands.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:24 +02:00
Andi Kleen 7e96ae1a89 perf/x86/intel/uncore: Add missing cbox filter flags on IvyBridge-EP uncore driver
The IvyBridge-EP uncore driver was missing three filter flags:
NC, ISOC, C6 which are useful in some cases. Support them in the same way
as the Haswell EP driver, by allowing to set them and exposing
them in the sysfs formats.

Also fix a typo in a define.

Relies on the Haswell EP driver to be applied earlier.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1409872109-31645-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:23 +02:00
Yan, Zheng 513d793e5f perf/x86/intel/uncore: Register the PMU only if the uncore pci device exists
Current code registers PMUs for all possible uncore pci devices.
This is not good because, on some machines, one or more uncore pci
devices can be missing. The missing pci device make corresponding
PMU unusable. Register the PMU only if the uncore device exists.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:22 +02:00
Yan, Zheng e735b9db12 perf/x86/intel/uncore: Add Haswell-EP uncore support
The uncore subsystem in Haswell-EP is similar to Sandy/Ivy
Bridge-EP. There are some differences in config register
encoding and pci device IDs. The Haswell-EP uncore also
supports a few new events. Add the Haswell-EP driver to
the snbep split driver.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
[ Add missing break. Add imc events. Add cbox nc/isoc/c6. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:21 +02:00
Andi Kleen fdda3c4aac perf/x86/intel: Use Broadwell cache event list for Haswell
Use the newly added Broadwell cache event list for Haswell too.
All Haswell and Broadwell events and offcore masks used in these lists
are identical.

However Haswell is very different from the Sandy Bridge
list that was used previously. That fixes a wide range of mis-counting
cache events.

The node events are now only for retired memory events, so prefetching
and speculative memory accesses are not included. They are PEBS
capable now, which makes it much easier to sample for them, plus it's
possible to create address maps with -d.

The prefetch events are gone now. They way the hardware counts
them is very misleading (some prefetches included, others not), so
it seemed best to leave them out.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:20 +02:00
Andi Kleen c46e665f03 perf/x86: Add INST_RETIRED.ALL workarounds
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.

Add a new callback to enforce this, and set it for Broadwell.

This is erratum BDM57 and BDM11.

How does this handle the case when an app requests a specific
period with some of the bottom bits set

The apps thinks it is sampling at X occurences per sample, when it is
in fact at X - 63 (worst case).

Short answer:

Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.

Long answer:

<write up by Peter:>

IFF we guarantee perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.

So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:19 +02:00
Andi Kleen 86a349a28b perf/x86/intel: Add Broadwell core support
Add Broadwell support for Broadwell Client to perf.  This is very
similar to Haswell.  It uses a new cache event table, because there
were various changes there.

The constraint list has one new event that needs to be handled over
Haswell.

The PEBS event list is the same, so we reuse Haswell's.

[fengguang.wu: make intel_bdw_event_constraints[] static]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:18 +02:00
Andi Kleen d86c8eaf95 perf/x86/intel: Document all Haswell models
Add names for each Haswell model as requested by Peter.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:16 +02:00
Andi Kleen b76146851e perf/x86/intel: Remove incorrect model number from Haswell perf
71 is a Broadwell, not a Haswell. The model number was added
by mistake earlier.

Remove it for now, until it can be re-added later with
real Broadwell support.

In practice it does not cause a lot of issues because the Broadwell
PMU is very similar to Haswell, but some details were wrong,
and it's better to handle it correctly.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1409683455-29168-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:15 +02:00
Dave Hansen cebf15eb09 x86, sched: Add new topology for multi-NUMA-node CPUs
I'm getting the spew below when booting with Haswell (Xeon
E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
in the BIOS.  It seems similar to the issue that some folks from
AMD ran in to on their systems and addressed in this commit:

  161270fc1f ("x86/smp: Fix topology checks on AMD MCM CPUs")

Both these Intel and AMD systems break an assumption which is
being enforced by topology_sane(): a socket may not contain more
than one NUMA node.

AMD special-cased their system by looking for a cpuid flag.  The
Intel mode is dependent on BIOS options and I do not know of a
way which it is enumerated other than the tables being parsed
during the CPU bringup process.  In other words, we have to trust
the ACPI tables <shudder>.

This detects the situation where a NUMA node occurs at a place in
the middle of the "CPU" sched domains.  It replaces the default
topology with one that relies on the NUMA information from the
firmware (SRAT table) for all levels of sched domains above the
hyperthreads.

This also fixes a sysfs bug.  We used to freak out when we saw
the "mc" group cross a node boundary, so we stopped building the
MC group.  MC gets exported as the 'core_siblings_list' in
/sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with
the same 'physical_package_id' to not be listed together in
'core_siblings_list'.  This violates a statement from
Documentation/ABI/testing/sysfs-devices-system-cpu:

	core_siblings: internal kernel map of cpu#'s hardware threads
	within the same physical_package_id.

	core_siblings_list: human-readable list of the logical CPU
	numbers within the same physical_package_id as cpu#.

The sysfs effects here cause an issue with the hwloc tool where
it gets confused and thinks there are more sockets than are
physically present.

Before this patch, there are two packages:

# cd /sys/devices/system/cpu/
# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1

But 4 _sets_ of core siblings:

# cat cpu*/topology/core_siblings_list | sort | uniq -c
      9 0-8
      9 18-26
      9 27-35
      9 9-17

After this set, there are only 2 sets of core siblings, which
is what we expect for a 2-socket system.

# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1
# cat cpu*/topology/core_siblings_list | sort | uniq -c
     18 0-17
     18 18-35

Example spew:
...
	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
	 #2  #3  #4  #5  #6  #7  #8
	.... node  #1, CPUs:    #9
	------------[ cut here ]------------
	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
	Modules linked in:
	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
	0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
	ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
	ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
	Call Trace:
	[<ffffffff8172e485>] dump_stack+0x45/0x56
	[<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
	[<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
	[<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
	[<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
	[<ffffffff8107568d>] start_secondary+0x1ad/0x240
	---[ end trace 3fe5f587a9fcde61 ]---
	#10 #11 #12 #13 #14 #15 #16 #17
	.... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
	.... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
[ Added LLC domain and s/match_mc/match_die/ ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brice.goglin@gmail.com
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:47:14 +02:00
Mathias Krause 615f77511e x86/PCI: Mark PCI BIOS initialization code as such
The pci_find_bios() function is only ever called from initialization code,
therefore can be marked as such, too.  This, in turn, allows marking other
functions called only in this context as well.

The bios32_indirect variable can be marked as __initdata as it is only
referenced from __init functions now.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 06:46:27 -06:00
Mathias Krause 6af13bac77 x86/PCI: Constify pci_mmcfg_probes[] array
The pci_mmcfg_probes[] array is only ever read, therefore make it const.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 06:46:22 -06:00
Mathias Krause 776f7ad632 x86/PCI: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst
The constants in pci_mmcfg_nvidia_mcp55() need to be marked as __initconst
or they will remain in memory after init memory was released.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 06:46:17 -06:00
Mathias Krause 64474b5235 x86/PCI: Move __init annotation to the correct place
According to include/linux/init.h, the __init annotation should be added
immediately before the function name.  However, for quite a few functions
in mmconfig-shared.c this is not the case.  It's either before the return
type or even in the middle of it.  Beside gcc still getting it right, we
should change them to comply to the rules of include/linux/init.h.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 06:46:01 -06:00
Tang Chen c24ae0dcd3 kvm: x86: Unpin and remove kvm_arch->apic_access_page
In order to make the APIC access page migratable, stop pinning it in
memory.

And because the APIC access page is not pinned in memory, we can
remove kvm_arch->apic_access_page.  When we need to write its
physical address into vmcs, we use gfn_to_page() to get its page
struct, which is needed to call page_to_phys(); the page is then
immediately unpinned.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:08:01 +02:00
Tang Chen 38b9917350 kvm: vmx: Implement set_apic_access_page_addr
Currently, the APIC access page is pinned by KVM for the entire life
of the guest.  We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this for the case where there is no nested
virtualization, or where the nested guest does not have an APIC page of
its own.  All accesses to kvm->arch.apic_access_page are changed to go
through kvm_vcpu_reload_apic_access_page.

If the APIC access page is invalidated when the host is running, we update
the VMCS in the next guest entry.

If it is invalidated when the guest is running, the MMU notifier will force
an exit, after which we will handle everything as in the previous case.

If it is invalidated when a nested guest is running, the request will update
either the VMCS01 or the VMCS02.  Updating the VMCS01 is done at the
next L2->L1 exit, while updating the VMCS02 is done in prepare_vmcs02.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:08:01 +02:00
Tang Chen 4256f43f9f kvm: x86: Add request bit to reload APIC access page address
Currently, the APIC access page is pinned by KVM for the entire life
of the guest.  We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this in generic code, through a new
request bit (that will be set by the MMU notifier) and a new hook
that is called whenever the request bit is processed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:08:00 +02:00
Tang Chen fe71557afb kvm: Add arch specific mmu notifier for page invalidation
This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:59 +02:00
Andres Lagar-Cavilla 5712846808 kvm: Fix page ageing bugs
1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.

Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:58 +02:00
Andres Lagar-Cavilla 8a9522d2fe kvm/x86/mmu: Pass gfn and level to rmapp callback.
Callbacks don't have to do extra computation to learn what the caller
(lvm_handle_hva_range()) knows very well. Useful for
debugging/tracing/printk/future.

Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:57 +02:00
Paolo Bonzini c1118b3602 x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
In that case, KVM will fail to patch VMCALL instructions to VMMCALL
as required on AMD processors.

The failure mode is currently a divide-by-zero exception, which obviously
is a KVM bug that has to be fixed.  However, picking the right instruction
between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
the hypervisor.

Reported-by: Chris Webb <chris@arachsys.com>
Tested-by: Chris Webb <chris@arachsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:57 +02:00
Chen Yucong 81760dccf8 kvm: x86: use macros to compute bank MSRs
Avoid open coded calculations for bank MSRs by using well-defined
macros that hide the index of higher bank MSRs.

No semantic changes.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:56 +02:00
Nadav Amit d5262739cb KVM: x86: Remove debug assertion of non-PAE reserved bits
Commit 346874c950 ("KVM: x86: Fix CR3 reserved bits") removed non-PAE
reserved bits which were not according to Intel SDM.  However, residue was left
in a debug assertion (CR3_NONPAE_RESERVED_BITS).  Remove it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:55 +02:00
Tiejun Chen b461966063 kvm: x86: fix two typos in comment
s/drity/dirty and s/vmsc01/vmcs01

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:53 +02:00
Nadav Amit 4566654bb9 KVM: vmx: Inject #GP on invalid PAT CR
Guest which sets the PAT CR to invalid value should get a #GP.  Currently, if
vmx supports loading PAT CR during entry, then the value is not checked.  This
patch makes the required check in that case.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:52 +02:00
Nadav Amit 040c8dc8a5 KVM: x86: emulating descriptor load misses long-mode case
In 64-bit mode a #GP should be delivered to the guest "if the code segment
descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit
set and the D-bit clear." - Intel SDM "Interrupt 13—General Protection
Exception (#GP)".

This patch fixes the behavior of CS loading emulation code. Although the
comment says that segment loading is not supported in long mode, this function
is executed in long mode, so the fix is necassary.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:52 +02:00
Liang Chen 77c3913b74 KVM: x86: directly use kvm_make_request again
A one-line wrapper around kvm_make_request is not particularly
useful. Replace kvm_mmu_flush_tlb() with kvm_make_request().

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:51 +02:00
Radim Krčmář a70656b63a KVM: x86: count actual tlb flushes
- we count KVM_REQ_TLB_FLUSH requests, not actual flushes
  (KVM can have multiple requests for one flush)
- flushes from kvm_flush_remote_tlbs aren't counted
- it's easy to make a direct request by mistake

Solve these by postponing the counting to kvm_check_request().

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:50 +02:00
Marcelo Tosatti bc6134942d KVM: nested VMX: disable perf cpuid reporting
Initilization of L2 guest with -cpu host, on L1 guest with -cpu host
triggers:

(qemu) KVM: entry failed, hardware error 0x7
...
nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

Nested VMX MSR load/store support is not sufficient to
allow perf for L2 guest.

Until properly fixed, trap CPUID and disable function 0xA.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:50 +02:00
Nadav Amit a2b9e6c1a3 KVM: x86: Don't report guest userspace emulation error to userspace
Commit fc3a9157d3 ("KVM: X86: Don't report L2 emulation failures to
user-space") disabled the reporting of L2 (nested guest) emulation failures to
userspace due to race-condition between a vmexit and the instruction emulator.
The same rational applies also to userspace applications that are permitted by
the guest OS to access MMIO area or perform PIO.

This patch extends the current behavior - of injecting a #UD instead of
reporting it to userspace - also for guest userspace code.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:49 +02:00
Paolo Bonzini 1f755a8275 kvm: Make init_rmode_tss() return 0 on success.
In init_rmode_tss(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_set_tss_addr(), and ret is redundant.

This patch removes the redundant variable, by making init_rmode_tss()
return 0 on success, -errno on failure.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:48 +02:00
Nadav Amit dd598091de KVM: x86: Warn if guest virtual address space is not 48-bits
The KVM emulator code assumes that the guest virtual address space (in 64-bit)
is 48-bits wide.  Fail the KVM_SET_CPUID and KVM_SET_CPUID2 ioctl if
userspace tries to create a guest that does not obey this restriction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:48 +02:00
Matt Fleming 56394ab8c2 x86/efi: Delete misleading efi_printk() error message
A number of people are reporting seeing the "setup_efi_pci() failed!"
error message in what used to be a quiet boot,

  https://bugzilla.kernel.org/show_bug.cgi?id=81891

The message isn't all that helpful because setup_efi_pci() can return a
non-success error code for a variety of reasons, not all of them fatal.

Let's drop the return code from setup_efi_pci*() altogether, since
there's no way to process it in any meaningful way outside of the inner
__setup_efi_pci*() functions.

Reported-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Ulf Winkelvos <ulf@winkelvos.de>
Cc: Andre Müller <andre.muller@web.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-24 12:46:59 +01:00
Andy Lutomirski f12c1f9002 x86/vdso: Fix vdso2c's special_pages[] error checking
Stephen Rothwell's compiler did something amazing: it unrolled a
loop, discovered that one iteration of that loop contained an
always-true test, and emitted a warning that will IMO only serve
to convince people to disable the warning.

That bogus warning caused me to wonder what prompted such an
absurdity from his compiler, and I discovered that the code in
question was, in fact, completely wrong -- I was looking things
up in the wrong array.

This affects 3.16 as well, but the only effect is to screw up
the error checking a bit.  vdso2c's output is unaffected.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/53d96ad5.80ywqrbs33ZBCQej%25akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 09:55:38 +02:00
Matt Fleming 84be880560 Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
This reverts commit f23cf8bd5c ("efi/x86: efistub: Move shared
dependencies to <asm/efi.h>") as well as the x86 parts of commit
f4f75ad574 ("efi: efistub: Convert into static library").

The road leading to these two reverts is long and winding.

The above two commits were merged during the v3.17 merge window and
turned the common EFI boot stub code into a static library. This
necessitated making some symbols global in the x86 boot stub which
introduced new entries into the early boot GOT.

The problem was that we weren't fixing up the newly created GOT entries
before invoking the EFI boot stub, which sometimes resulted in hangs or
resets. This failure was reported by Maarten on his Macbook pro.

The proposed fix was commit 9cb0e39423 ("x86/efi: Fixup GOT in all
boot code paths"). However, that caused issues for Linus when booting
his Sony Vaio Pro 11. It was subsequently reverted in commit
f3670394c2.

So that leaves us back with Maarten's Macbook pro not booting.

At this stage in the release cycle the least risky option is to revert
the x86 EFI boot stub to the pre-merge window code structure where we
explicitly #include efi-stub-helper.c instead of linking with the static
library. The arm64 code remains unaffected.

We can take another swing at the x86 parts for v3.18.

Conflicts:
	arch/x86/include/asm/efi.h

Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64]
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-23 22:01:55 +01:00
Richard Guy Briggs b4f0d3755c audit: x86: drop arch from __audit_syscall_entry() interface
Since the arch is found locally in __audit_syscall_entry(), there is no need to
pass it in as a parameter.  Delete it from the parameter list.

x86* was the only arch to call __audit_syscall_entry() directly and did so from
assembly code.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-audit@redhat.com
Signed-off-by: Eric Paris <eparis@redhat.com>

---

As this patch relies on changes in the audit tree, I think it
appropriate to send it through my tree rather than the x86 tree.
2014-09-23 16:21:28 -04:00
Eric Paris 91397401bb ARCH: AUDIT: audit_syscall_entry() should not require the arch
We have a function where the arch can be queried, syscall_get_arch().
So rather than have every single piece of arch specific code use and/or
duplicate syscall_get_arch(), just have the audit code use the
syscall_get_arch() code.

Based-on-patch-by: Richard Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: linux-mips@linux-mips.org
Cc: linux@lists.openrisc.net
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: linux-xtensa@linux-xtensa.org
Cc: x86@kernel.org
2014-09-23 16:21:26 -04:00
Eric Paris 4b4665e13c UM: implement syscall_get_arch()
This patch defines syscall_get_arch() for the um platform.  It adds a
new syscall.h header file to define this.  It copies the HOST_AUDIT_ARCH
definition from ptrace.h.  (that definition will be removed when we
switch audit to use this new syscall_get_arch() function)

Based-on-patch-by: Richard Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
2014-09-23 16:20:02 -04:00
David S. Miller 1f6d80358d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:09:27 -04:00
David Vrabel f955371ca9 x86: remove the Xen-specific _PAGE_IOMAP PTE flag
The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs
that were used to map I/O regions that are 1:1 in the p2m.  This
allowed Xen to obtain the correct PFN when converting the MFNs read
from a PTE back to their PFN.

Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
returns the correct PFN by using a combination of the m2p and p2m to
determine if an MFN corresponds to a 1:1 mapping in the the p2m.

Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
future uses of the PTE flag.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
2014-09-23 13:36:20 +00:00
David Vrabel 7f2f882245 x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings
Since mfn_to_pfn() returns the correct PFN for identity mappings (as
used for MMIO regions), the use of _PAGE_IOMAP is not required in
pte_mfn_to_pfn().

Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it
in pte_mfn_to_pfn().

This will allow _PAGE_IOMAP to be removed, making it available for
future use.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-09-23 13:36:20 +00:00
David Vrabel 3166851142 x86: skip check for spurious faults for non-present faults
If a fault on a kernel address is due to a non-present page, then it
cannot be the result of stale TLB entry from a protection change (RO
to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
skipped.

See the initial if in spurious_fault() and the tests in
spurious_fault_check()) for the set of possible error codes checked
for spurious faults.  These are:

         IRUWP
Before   x00xx && ( 1xxxx || xxx1x )
After  ( 10001 || 00011 ) && ( 1xxxx || xxx1x )

Thus the new condition is a subset of the previous one, excluding only
non-present faults (I == 1 and W == 1 are mutually exclusive).

This avoids spurious_fault() oopsing in some cases if the pagetables
it attempts to walk are not accessible.  This obscures the location of
the original fault.

This also fixes a crash with Xen PV guests when they access entries in
the M2P corresponding to device MMIO regions.  The M2P is mapped
(read-only) by Xen into the kernel address space of the guest and this
mapping may contains holes for non-RAM regions.  Read faults will
result in calls to spurious_fault(), but because the page tables for
the M2P mappings are not accessible by the guest the pagetable walk
would fault.

This was not normally a problem as MMIO mappings would not normally
result in a M2P lookup because of the use of the _PAGE_IOMAP bit the
PTE.  However, removing the _PAGE_IOMAP bit requires M2P lookups for
MMIO mappings as well.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
2014-09-23 13:36:20 +00:00
Daniel Kiper 342cd340f6 xen/efi: Directly include needed headers
I discovered that some needed stuff is defined/declared in headers
which are not included directly. Currently it works but if somebody
remove required headers from currently included headers then build
will break. So, just in case directly include all needed headers.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-09-23 13:36:20 +00:00
Matt Rushton 4fbb67e3c8 xen/setup: Remap Xen Identity Mapped RAM
Instead of ballooning up and down dom0 memory this remaps the existing mfns
that were replaced by the identity map. The reason for this is that the
existing implementation ballooned memory up and and down which caused dom0
to have discontiguous pages. In some cases this resulted in the use of bounce
buffers which reduced network I/O performance significantly. This change will
honor the existing order of the pages with the exception of some boundary
conditions.

To do this we need to update both the Linux p2m table and the Xen m2p table.
Particular care must be taken when updating the p2m table since it's important
to limit table memory consumption and reuse the existing leaf pages which get
freed when an entire leaf page is set to the identity map. To implement this,
mapping updates are grouped into blocks with table entries getting cached
temporarily and then released.

On my test system before:
Total pages: 2105014
Total contiguous: 1640635

After:
Total pages: 2105014
Total contiguous: 2098904

Signed-off-by: Matthew Rushton <mrushton@amazon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-09-23 13:36:18 +00:00
Josh Triplett 3cf6b0151b Merge branches 'tiny/bloat-o-meter-no-SyS', 'tiny/more-procless', 'tiny/no-advice', 'tiny/tinyconfig' and 'tiny/x86-boot-compressed-use-yn' into tiny/next 2014-09-22 23:14:40 -07:00
Linus Torvalds f3670394c2 Revert "x86/efi: Fixup GOT in all boot code paths"
This reverts commit 9cb0e39423.

It causes my Sony Vaio Pro 11 to immediately reboot at startup.

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-22 23:05:49 -07:00
Mathias Krause 4ac9cbfa35 x86/PCI: Mark DMI tables as initialization data
The DMI tables are only used in __init code, thereby can be marked as
initialization data, too.  The same is true for the callback functions
referenced from the DMI tables.

This moves ~9.6 kB of code and r/o data to the init sections, marking the
memory for release after initialization.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2014-09-22 14:24:41 -06:00
Zubair Lutfullah Kakakhel 8057b30814 x86: use generic dma-contiguous.h
dma-contiguous.h is now in asm-generic. Use that to avoid code
repetition in x86.

Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: catalin.marinas@arm.com
Cc: will.deacon@arm.com
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: arnd@arndb.de
Cc: gregkh@linuxfoundation.org
Cc: m.szyprowski@samsung.com
Cc: x86@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/7359/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-09-22 13:35:52 +02:00
Linus Torvalds b29f83aa8b PCI updates for v3.17:
Enumeration
     - Don't default exclusively to first video device (Bruno Prémont)
 
   PCI device hotplug
     - Remove "no hotplug settings from platform" warning (Bjorn Helgaas)
     - Add pci_ignore_hotplug() for VGA switcheroo (Bjorn Helgaas)
 
   Freescale i.MX6
     - Put LTSSM in "Detect" state before disabling (Lucas Stach)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUG7o9AAoJEFmIoMA60/r8hbYP/3gR3xHd2QKpkmBcM1lf1yiR
 osQQnAfRqEO4fzrpmOYrYbLIAOPwanK6Y36rmIYB+wHU2SUaffV7ZI9uW32shTud
 09+1N+OrSS6fwzVUWOuKsf1kv/jxpS+ic2fb+Qe1OXwJh5G+z1D9Kvd2EPLJdlgK
 ySyX4zSTrLni8CoclzREO7u82VVO5rTdvbujBxuvpOQTOdD5TFqV/uhb/y3gQz+u
 sG6IxUbdXsy4r24C6OnPrmmZ1Rk/lgCMyA+QSozc5Eu5PdGzcY9a6gcKlTnsbwBs
 qYLAb+/KCa3KgQh07NYmFfYdpoMZUXgSsEtD8gyvfJQHwUYwW8rsEMKxlSCQrzYr
 0OrpBSVTO6ta1r8SKOWtSYETQgPE3GUiJR1DuCyV+55RLZYp6Q8zH6dbgfWQbA/g
 R/kWHihR/tcD9YIlT99QrBppZtvG5nZ3y7aLSqdYYxEJqHE0tlbuxAu8hgwDf3Qp
 lKZJMyadLB1MS9lnrMj8DYqIOKbe62LOwcEYzhMJzaq8vCy+JWtjxOOgwBkT7P5v
 bhhYh3eqi5/MBONtw52V6RDUQId9vOLGHoiM5/akG4FFmWdhO9S0SbMBhAJyazKT
 n3IP5yj657XAi/fK939PUCQ3YuT5GqlCNcXDCqUzBZhGnt/Ln7LPmQI323519Lp4
 3vI3irFT0Fu2IFEBhc6l
 =xIYe
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "These fix:

   - Boot video device detection on dual-GPU Apple systems
   - Hotplug fiascos on VGA switcheroo with radeon & nouveau drivers
   - Boot hang on Freescale i.MX6 systems
   - Excessive "no hotplug settings from platform" warnings

  In particular:

  Enumeration
    - Don't default exclusively to first video device (Bruno Prémont)

  PCI device hotplug
    - Remove "no hotplug settings from platform" warning (Bjorn Helgaas)
    - Add pci_ignore_hotplug() for VGA switcheroo (Bjorn Helgaas)

  Freescale i.MX6
    - Put LTSSM in "Detect" state before disabling (Lucas Stach)"

* tag 'pci-v3.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  vgaarb: Drop obsolete #ifndef
  vgaarb: Don't default exclusively to first video device with mem+io
  ACPIPHP / radeon / nouveau: Remove acpi_bus_no_hotplug()
  PCI: Remove "no hotplug settings from platform" warning
  PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device
  PCI: imx6: Put LTSSM in "Detect" state before disabling it
  MAINTAINERS: Add Lucas Stach as co-maintainer for i.MX6 PCI driver
2014-09-19 10:50:30 -07:00
Linus Torvalds 598a0c7d09 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kernel side fixes: a kprobes fix and a perf_remove_from_context()
  fix (which does not yet fix the migration bug which is WIP)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix a race condition in perf_remove_from_context()
  kprobes/x86: Free 'optinsn' cache when range check fails
2014-09-19 10:31:36 -07:00
Linus Torvalds 7a5e87867e Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

  EFI fixes, a build fix, a page table dumping (debug) fix and a clang
  build fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/arm64: Fix fdt-related memory reservation
  x86/mm: Apply the section attribute to the variable, not its type
  x86/efi: Fixup GOT in all boot code paths
  x86/efi: Only load initrd above 4g on second try
  x86-64, ptdump: Mark espfix area only if existent
  x86, irq: Fix build error caused by 9eabc99a63
2014-09-19 09:07:47 -07:00
David E. Box ed2226bd4d x86/platform/intel/iosf: Add debugfs config option for IOSF
Makes the IOSF sideband available through debugfs. Allows
developers to experiment with using the sideband to provide
debug and analytical tools for units on the SoC.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1411017231-20807-4-git-send-email-david.e.box@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 13:08:43 +02:00
David E. Box ced3ce760b x86/platform/intel/iosf: Add better description of IOSF driver in config
Adds better description of IOSF driver to determine when it
should be enabled. Also moves the Kconfig option to "Processor
type and features" menu from main configuration menu.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1411017231-20807-3-git-send-email-david.e.box@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 13:08:42 +02:00
David E. Box 849f5d8943 x86/platform/intel/iosf: Add Braswell PCI ID
Add Braswell PCI ID to list of supported ID's for the IOSF driver.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1411017231-20807-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 13:08:42 +02:00
Kees Cook 0cacbfbeb5 x86/kaslr: Avoid the setup_data area when picking location
The KASLR location-choosing logic needs to avoid the setup_data
list memory areas as well. Without this, it would be possible to
have the ASLR position stomp on the memory, ultimately causing
the boot to fail.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Baoquan He <bhe@redhat.com>
Cc: stable@vger.kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140911161931.GA12001@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 13:04:29 +02:00
Martin Kelly 9575a6a23a x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n
When compiling with CONFIG_DEBUG_FS=n, GCC emits an unused
variable warning for pmc_atom.c because "ret" is used only
within the CONFIG_DEBUG_FS block.

This patch adds a dummy #ifdef for pmc_dbgfs_register() when
CONFIG_DEBUG_FS=n to simplify the code and remove the warning.

Signed-off-by: Martin Kelly <martkell@amazon.com>
Acked-by: "Li, Aubrey" <aubrey.li@linux.intel.com>
Cc: vishwesh.m.rudramuni@intel.com
Link: http://lkml.kernel.org/r/1410963476-8360-1-git-send-email-martin@martingkelly.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 13:02:21 +02:00
Rakib Mullick d286c3af48 x86/mce: Avoid showing repetitive message from intel_init_thermal()
intel_init_thermal() is called from a) at the time of system initializing
and b) at the time of system resume to initialize thermal
monitoring.

In case when thermal monitoring is handled by SMI, we get to know it via
printk(). Currently it gives the message at both cases, but its okay if
we get it only once and no need to get the same message at every time
system resumes.

So, limit showing this message only at system boot time by avoid showing
at system resume and reduce abusing kernel log buffer.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1411068135.5121.10.camel@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 12:56:05 +02:00
Ingo Molnar 43657ffb79 * Increase the number of early_ioremap slots to fix a regression with
earlyprintk=efi after recent changes to the ACPI code - Dave Young
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUG/pjAAoJEC84WcCNIz1VQToP+wfz21mZLPyurzCXycbrXi3I
 h2Jg99HhCe2TL1+9slSEJZvCFGsnspbkZtfA8OB0Ms7eomt3mY9yTwsfMVfyUZRM
 2DAWjK0eg9ZEAXcmU0tGbI4Iy9Kf/U0OPEsZ8ZKJkG1QGA+pHrKqhfdrKzP/pVV6
 iQ94OgBa1BZqljYureERvIourghN6hcwlJHqTLF66kzbFtL//2NQJIidHBcJRO0k
 CA/2U65WNEFXAtUenHHUnSM8hhN5H2NiSxPKg6p4iK6nsNcBL80rhEmMHYW80VHp
 PP7lXoCAbGARIfXbjPPNvn7aHL0qzpsF3fvNrtQRSkEwGjEod8gxHAqqzguLrK58
 G0HovhfUvEADXzjDiWCWUAIT/ryHutJLxXo6Fn2UijtAso+W/wUorqJlKNDG1dHX
 PRJKO0kk7TCKPBYxNRkds9Gt8g8uEoInr+V13qvURjwb6H9hFd8fyuL25bRUftdV
 TE4/pzm5F9YjLU2548RfXKcz5snSuOT9EYf3s4C8wTtm4tJv7GxQ+/vz000Hres3
 YlWaY2CywsrZfdBZaaF9gEtsjsdBVH+PspOGtodY0jjnz/I71H7oZNdgb8vslRLS
 6iH9+swCEm6cdytzqFXO1qp7DKLpxZJmwTHKhAvdSYzXZBvf1JMCQ6zdoXNfdNRR
 R8vYh1pvBd1tRJQ9tpJe
 =t4a9
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fix from Matt Fleming:

  * Increase the number of early_ioremap() slots to fix a regression with
    earlyprintk=efi after recent changes to the ACPI code (Dave Young)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 12:49:20 +02:00
Aaron Tomlin a70857e46d sched: Add helper for task stack page overrun checking
This facility is used in a few places so let's introduce
a helper function to improve code readability.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dzickus@redhat.com
Cc: bmr@redhat.com
Cc: jcastillo@redhat.com
Cc: oleg@redhat.com
Cc: riel@redhat.com
Cc: prarit@redhat.com
Cc: jgh@redhat.com
Cc: minchan@kernel.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Cc: hannes@cmpxchg.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 12:35:23 +02:00
Aaron Tomlin d4311ff1a8 init/main.c: Give init_task a canary
Tasks get their end of stack set to STACK_END_MAGIC with the
aim to catch stack overruns. Currently this feature does not
apply to init_task. This patch removes this restriction.

Note that a similar patch was posted by Prarit Bhargava
some time ago but was never merged:

  http://marc.info/?l=linux-kernel&m=127144305403241&w=2

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dzickus@redhat.com
Cc: bmr@redhat.com
Cc: jcastillo@redhat.com
Cc: jgh@redhat.com
Cc: minchan@kernel.org
Cc: tglx@linutronix.de
Cc: hannes@cmpxchg.org
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Daeseok Youn <daeseok.youn@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 12:35:22 +02:00
Tang Chen f51770ed46 kvm: Make init_rmode_identity_map() return 0 on success.
In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and ret is redundant.

This patch removes the redundant variable, and makes init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:12 +02:00
Tang Chen a255d4795f kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
      As a result, it cannot be migrated/hot-removed. After this patch, since
      kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
      is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:11 +02:00