Commit Graph

19780 Commits

Author SHA1 Message Date
Radim Krčmář b4a2d31da8 KVM: VMX: dynamise PLE window
Window is increased on every PLE exit and decreased on every sched_in.
The idea is that we don't want to PLE exit if there is no preemption
going on.
We do this with sched_in() because it does not hold rq lock.

There are two new kernel parameters for changing the window:
 ple_window_grow and ple_window_shrink
ple_window_grow affects the window on PLE exit and ple_window_shrink
does it on sched_in;  depending on their value, the window is modifier
like this: (ple_window is kvm_intel's global)

  ple_window_shrink/ |
  ple_window_grow    | PLE exit           | sched_in
  -------------------+--------------------+---------------------
  < 1                |  = ple_window      |  = ple_window
  < ple_window       | *= ple_window_grow | /= ple_window_shrink
  otherwise          | += ple_window_grow | -= ple_window_shrink

A third new parameter, ple_window_max, controls the maximal ple_window;
it is internally rounded down to a closest multiple of ple_window_grow.

VCPU's PLE window is never allowed below ple_window.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 18:45:23 +02:00
Radim Krčmář a7653ecdf3 KVM: VMX: make PLE window per-VCPU
Change PLE window into per-VCPU variable, seeded from module parameter,
to allow greater flexibility.

Brings in a small overhead on every vmentry.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 18:45:22 +02:00
Radim Krčmář ae97a3b818 KVM: x86: introduce sched_in to kvm_x86_ops
sched_in preempt notifier is available for x86, allow its use in
specific virtualization technlogies as well.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 18:45:22 +02:00
Radim Krčmář e790d9ef64 KVM: add kvm_arch_sched_in
Introduce preempt notifiers for architecture specific code.
Advantage over creating a new notifier in every arch is slightly simpler
code and guaranteed call order with respect to kvm_sched_in.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 18:45:21 +02:00
Nadav Amit 6689fbe3cf KVM: x86: Replace X86_FEATURE_NX offset with the definition
Replace reference to X86_FEATURE_NX using bit shift with the defined
X86_FEATURE_NX.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 13:50:23 +02:00
Paolo Bonzini e0ad0b477c KVM: emulate: warn on invalid or uninitialized exception numbers
These were reported when running Jailhouse on AMD processors.

Initialize ctxt->exception.vector with an invalid exception number,
and warn if it remained invalid even though the emulator got
an X86EMUL_PROPAGATE_FAULT return code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 13:01:26 +02:00
Paolo Bonzini 592f085847 KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitly
Always get it through emulate_exception or emulate_ts.  This
ensures that the ctxt->exception fields have been populated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 13:01:25 +02:00
Nadav Amit d27aa7f15c KVM: x86: Clarify PMU related features bit manipulation
kvm_pmu_cpuid_update makes a lot of bit manuiplation operations, when in fact
there are already unions that can be used instead. Changing the bit
manipulation to the union for clarity. This patch does not change the
functionality.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 13:01:25 +02:00
Wanpeng Li a32e84594d KVM: vmx: fix ept reserved bits for 1-GByte page
EPT misconfig handler in kvm will check which reason lead to EPT
misconfiguration after vmexit. One of the reasons is that an EPT
paging-structure entry is configured with settings reserved for
future functionality. However, the handler can't identify if
paging-structure entry of reserved bits for 1-GByte page are
configured, since PDPTE which point to 1-GByte page will reserve
bits 29:12 instead of bits 7:3 which are reserved for PDPTE that
references an EPT Page Directory. This patch fix it by reserve
bits 29:12 for 1-GByte page.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 10:13:40 +02:00
Nadav Amit 1e1b6c2644 KVM: x86: recalculate_apic_map after enabling apic
Currently, recalculate_apic_map ignores vcpus whose lapic is software disabled
through the spurious interrupt vector. However, once it is re-enabled, the map
is not recalculated. Therefore, if the guest OS configured DFR while lapic is
software-disabled, the map may be incorrect. This patch recalculates apic map
after software enabling the lapic.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:29 +02:00
Nadav Amit fae0ba2157 KVM: x86: Clear apic tsc-deadline after deadline
Intel SDM 10.5.4.1 says "When the timer generates an interrupt, it disarms
itself and clears the IA32_TSC_DEADLINE MSR".

This patch clears the MSR upon timer interrupt delivery which delivered on
deadline mode.  Since the MSR may be reconfigured while an interrupt is
pending, causing the new value to be overriden, pending timer interrupts are
checked before setting a new deadline.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:29 +02:00
Wanpeng Li d7a2a246a1 KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs
Section 11.11.2.3 of the SDM mentions "All other bits in the IA32_MTRR_PHYSBASEn
and IA32_MTRR_PHYSMASKn registers are reserved; the processor generates a
general-protection exception(#GP) if software attempts to write to them". This
patch do it in kvm.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:29 +02:00
Wanpeng Li adfb5d2746 KVM: x86: fix check legal type of Variable Range MTRRs
The first entry in each pair(IA32_MTRR_PHYSBASEn) defines the base
address and memory type for the range; the second entry(IA32_MTRR_PHYSMASKn)
contains a mask used to determine the address range. The legal values
for the type field of IA32_MTRR_PHYSBASEn are 0,1,4,5, and 6. However,
IA32_MTRR_PHYSMASKn don't have type field. This patch avoid check if
the type field is legal for IA32_MTRR_PHYSMASKn.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:29 +02:00
Monam Agarwal 3b63a43f1e arch/x86: Use RCU_INIT_POINTER(x, NULL) in kvm/vmx.c
Here rcu_assign_pointer() is ensuring that the
initialization of a structure is carried out before storing a pointer
to that structure.
So, rcu_assign_pointer(p, NULL) can always safely be converted to
RCU_INIT_POINTER(p, NULL).

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:29 +02:00
Paolo Bonzini 15fc075269 KVM: x86: raise invalid TSS exceptions during a task switch
Conditions that would usually trigger a general protection fault should
instead raise #TS.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Wanpeng Li 4473b570a7 KVM: x86: drop fpu_activate hook
The only user of the fpu_activate hook was dropped in commit
2d04a05bd7 (KVM: x86 emulator: emulate CLTS internally, 2011-04-20).
vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for
Intel CLTS), but never from common code; hence, there's no need for
a hook.

Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Wei Huang dc9b2d933a KVM: SVM: add rdmsr support for AMD event registers
Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_PERFCTR0
MSRs. Reading the rest MSRs will trigger KVM to inject #GP into
guest VM. This causes a warning message "Failed to access perfctr
msr (MSR c0010001 is ffffffffffffffff)" on AMD host. This patch
adds RDMSR support for all K7_EVNTSELn and K7_PERFCTRn registers
and thus supresses the warning message.

Signed-off-by: Wei Huang <wehuang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Paolo Bonzini 0d234daf7e Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
This reverts commit 682367c494,
which causes 32-bit SMP Windows 7 guests to panic.

SeaBIOS has a limit on the number of MTRRs that it can handle,
and this patch exceeded the limit.  Better revert it.
Thanks to Nadav Amit for debugging the cause.

Cc: stable@nongnu.org
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Paolo Bonzini 9a4cfb27f7 KVM: x86: do not check CS.DPL against RPL during task switch
This reverts the check added by commit 5045b46803 (KVM: x86: check CS.DPL
against RPL during task switch, 2014-05-15).  Although the CS.DPL=CS.RPL
check is mentioned in table 7-1 of the SDM as causing a #TSS exception,
it is not mentioned in table 6-6 that lists "invalid TSS conditions"
which cause #TSS exceptions. In fact it causes some tests to fail, which
pass on bare-metal.

Keep the rest of the commit, since we will find new uses for it in 3.18.

Reported-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Nadav Amit 3a6095a017 KVM: x86: Avoid emulating instructions on #UD mistakenly
Commit d40a6898e5 mistakenly caused instructions which are not marked as
EmulateOnUD to be emulated upon #UD exception. The commit caused the check of
whether the instruction flags include EmulateOnUD to never be evaluated. As a
result instructions whose emulation is broken may be emulated.  This fix moves
the evaluation of EmulateOnUD so it would be evaluated.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Tweak operand order in &&, remove EmulateOnUD where it's now superfluous.
 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Linus Torvalds 49899007b9 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull idle update from Len Brown:
 "Two Intel-platform-specific updates to intel_idle, and a cosmetic
  tweak to the turbostat utility"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: tweak whitespace in output format
  intel_idle: Broadwell support
  intel_idle: Disable Baytrail Core and Module C6 auto-demotion
2014-08-16 09:25:34 -06:00
Len Brown 8c058d53f6 intel_idle: Disable Baytrail Core and Module C6 auto-demotion
Power efficiency improves on Baytrail (Intel Atom Processor E3000)
when Linux disables C6 auto-demotion.

Based on work by Srinidhi Kasagar <srinidhi.kasagar@intel.com>.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2014-08-15 17:06:14 -04:00
Linus Torvalds a11c5c9ef6 PCI changes for the v3.17 merge window (part 2):
Miscellaneous
     - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT7PyAAAoJEFmIoMA60/r8kjQQALr/8oEfZoVcjgCb7waWOr25
 hUTnrI6GBIAh/50hoBiPq0ouPCAKVv66+CUhuhFkLP7oJz+rMU0B9hfUvdLfmCpH
 7ppaallkllT9nPFIr7h5RUWLXsoQyuHmCYmSrUCcnlT2LPgU0dN72YWElLisEM6Z
 Pldg3933xyIQaCWviHjGEjWb7NvC+JY4pTkV5iyqGgU8Ale/eFYtLLSfdBEjIbGv
 VDirYZmKELYeuncZPrTAsp4IENRMZn702wwDakMSODVMEWtJB5h4yrBawqQDlFP5
 9ztIX6n9p9zkdVKbYZlx/Xwv6SYEnYXLxauVQMSO3Nck7Z10R5Ud+5uuCg/6mWH8
 AQI4UV5bbJcg7zHgocTG9XLFLFPoPtD2JT6k6UT1LeUAiAOqcSzhRO+/qJBmJOWZ
 Zv+EHXPlxBrl0zNifut6ZQrY17teuItVtmha70a/9W3PjnIx3KecqLcTwdTvDsOY
 IAyH8WMZrBKpPpsczSmfE93i2Z1QRS91HEAOeSMxl/98dcDTdllYZS7spjoDll2f
 xmpGDbpriLSCu2XsGHfTC9RbqA7CyuFlHggJSQDkT/5Esli0sCs7eweTuK3RVvPu
 t6bUHK3yElb6x9qMZhb5q6l72wSMlGMishTdaxEHmqrEA8PtaIFodmVX2T/Zel5n
 GHN6bysPqDItNR2v/3JX
 =jJGu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas:
 "Part two of the PCI changes for v3.17:

    - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)

  It's a mechanical change that removes uses of the
  DEFINE_PCI_DEVICE_TABLE macro.  I waited until later in the merge
  window to reduce conflicts, but it's possible you'll still see a few"

* tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
2014-08-14 18:10:33 -06:00
Linus Torvalds 4dacb91c7d Xen bug fixes for 3.17-rc0
- Fix ARM build.
 - Fix boot crash with PVH guests.
 - Improve reliability of resume/migration.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJT6y4zAAoJEFxbo/MsZsTRqI4H/3c+TXRjeB5787wkm3Lya1JO
 5+DsbbtlM13PT+Oi0Zop7xUYMPya7Xkwix0FvFvJ/q0/NFMf6RDJREtRy7atRkkp
 IRKccsGA/OH0ETcmQJOGHErnK51cAjMrm+ilJHIIYd6btEVCuP212G9f8CVGMyIK
 r0U6sVUxTJ5TPOLjvv5e8Rz+iTtP+OXvxaf2+rcqAf1QAJJ6S6+ipJZjZemX8OR/
 IoUPiee5Ou/ogQekt0f473vJrxX5nP7wLdJ2F7YA/A6F/HQHNbYDgQXf2IXrqFmb
 DpkpXxaowZGKXwnwqHUr3pVDdeMpvYJKZDPZmTQq2htrXufHIiY/E3qr9wGjqxk=
 =iVy1
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen bugfixes from David Vrabel:
 - fix ARM build
 - fix boot crash with PVH guests
 - improve reliability of resume/migration

* tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: use vmap() to map grant table pages in PVH guests
  x86/xen: resume timer irqs early
  arm/xen: remove duplicate arch_gnttab_init() function
2014-08-14 09:40:44 -06:00
Linus Torvalds 81c02a21b2 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic updates from Thomas Gleixner:
 "This is a major overhaul to the x86 apic subsystem consisting of the
  following parts:

   - Remove obsolete APIC driver abstractions (David Rientjes)

   - Use the irqdomain facilities to dynamically allocate IRQs for
     IOAPICs.  This is a prerequisite to enable IOAPIC hotplug support,
     and it also frees up wasted vectors (Jiang Liu)

   - Misc fixlets.

  Despite the hickup in Ingos previous pull request - caused by the
  missing fixup for the suspend/resume issue reported by Borislav - I
  strongly recommend that this update finds its way into 3.17.  Some
  history for you:

  This is preparatory work for physical IOAPIC hotplug.  The first
  attempt to support this was done by Yinghai and I shot it down because
  it just added another layer of obscurity and complexity to the already
  existing mess without tackling the underlying shortcomings of the
  current implementation.

  After quite some on- and offlist discussions, I requested that the
  design of this functionality must use generic infrastructure, i.e.
  irq domains, which provide all the mechanisms to dynamically map linux
  interrupt numbers to physical interrupts.

  Jiang picked up the idea and did a great job of consolidating the
  existing interfaces to manage the x86 (IOAPIC) interrupt system by
  utilizing irq domains.

  The testing in tip, Linux-next and inside of Intel on various machines
  did not unearth any oddities until Borislav exposed it to one of his
  oddball machines.  The issue was resolved quickly, but unfortunately
  the fix fell through the cracks and did not hit the tip tree before
  Ingo sent the pull request.  Not entirely Ingos fault, I also assumed
  that the fix was already merged when Ingo asked me whether he could
  send it.

  Nevertheless this work has a proper design, has undergone several
  rounds of review and the final fallout after applying it to tip and
  integrating it into Linux-next has been more than moderate.  It's the
  ground work not only for IOAPIC hotplug, it will also allow us to move
  the lowlevel vector allocation into the irqdomain hierarchy, which
  will benefit other architectures as well.  Patches are posted already,
  but they are on hold for two weeks, see below.

  I really appreciate the competence and responsiveness Jiang has shown
  in course of this endavour.  So I'm sure that any fallout of this will
  be addressed in a timely manner.

  FYI, I'm vanishing for 2 weeks into my annual kids summer camp kitchen
  duty^Wvacation, while you folks are drooling at KS/LinuxCon :) But HPA
  will have a look at the hopefully zero fallout until I'm back"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation
  x86/apic/vsmp: Make is_vsmp_box() static
  x86, apic: Remove enable_apic_mode callback
  x86, apic: Remove setup_portio_remap callback
  x86, apic: Remove multi_timer_check callback
  x86, apic: Replace noop_check_apicid_used
  x86, apic: Remove check_apicid_present callback
  x86, apic: Remove mps_oem_check callback
  x86, apic: Remove smp_callin_clear_local_apic callback
  x86, apic: Replace trampoline physical addresses with defaults
  x86, apic: Remove x86_32_numa_cpu_node callback
  x86: intel-mid: Use the new io_apic interfaces
  x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box()
  x86, irq: Clean up irqdomain transition code
  x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled
  x86, irq, SFI: Release IOAPIC pin when PCI device is disabled
  x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled
  x86, irq, ACPI: Release IOAPIC pin when PCI device is disabled
  x86, irq: Introduce helper functions to release IOAPIC pin
  x86, irq: Simplify the way to handle ISA IRQ
  ...
2014-08-13 18:23:32 -06:00
Linus Torvalds d27c0d9018 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/efix fixes from Peter Anvin:
 "Two EFI-related Kconfig changes, which happen to touch immediately
  adjacent lines in Kconfig and thus collapse to a single patch"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
  x86/efi: Fix 3DNow optimization build failure in EFI stub
2014-08-13 18:21:35 -06:00
Linus Torvalds 7453f33b2e Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/xsave changes from Peter Anvin:
 "This is a patchset to support the XSAVES instruction required to
  support context switch of supervisor-only features in upcoming
  silicon.

  This patchset missed the 3.16 merge window, which is why it is based
  on 3.15-rc7"

* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, xsave: Add forgotten inline annotation
  x86/xsaves: Clean up code in xstate offsets computation in xsave area
  x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
  Define kernel API to get address of each state in xsave area
  x86/xsaves: Enable xsaves/xrstors
  x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
  x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
  x86/xsaves: Add xsaves and xrstors support for booting time
  x86/xsaves: Clear reserved bits in xsave header
  x86/xsaves: Use xsave/xrstor for saving and restoring user space context
  x86/xsaves: Use xsaves/xrstors for context switch
  x86/xsaves: Use xsaves/xrstors to save and restore xsave area
  x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
  x86/xsaves: Define macros for xsave instructions
  x86/xsaves: Change compacted format xsave area header
  x86/alternative: Add alternative_input_2 to support alternative with two features and input
  x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
2014-08-13 18:20:04 -06:00
Benoit Taine 9baa3c34ac PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to
meet kernel coding style guidelines.  This issue was reported by checkpatch.

A simplified version of the semantic patch that makes this change is as
follows (http://coccinelle.lip6.fr/):

// <smpl>

@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@

- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;

// </smpl>

[bhelgaas: add semantic patch]
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-08-12 12:15:14 -06:00
H. Peter Anvin 9e13bcf7e0 * Enforce CONFIG_RELOCATABLE for the x86 EFI boot stub, otherwise
it's possible to overwrite random pieces of unallocated memory during
    kernel decompression, leading to machine resets.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT4fcOAAoJEC84WcCNIz1Vh5gP/jDRY0hXYOGA319d4apbbQYJ
 8riMSSatbtuzAf+qipmI1IYNGY0YSDVeFO+dJ+lgCHLCp/spP0urdWNankkxIOH8
 n/I5XoXIlNWDhnqF7oYA72qbV4ntxNMs3LDyatnk2Hy9RJDj2xHZJecTjmxU5aAk
 /rl//aUkQ8LvYdkDCOgSZvn74da4de5LD4imt4YTU9tWTIJsIYYme6vjX5HzGwAH
 SqbQpOZED8lSbpKifOVv/W/iZGJV2hWeJRPDnqfPrRbJHHwBsmDL3usunqy9Z0Jf
 MA3mK3F7Y2pNF/ebOG2XIZFfmdbZzU9AfAz5zxbFhUQ3okwppdXz2OH6OYSgHTAP
 3x/PfYacnFDcTD2AXsXpfhnj39CyLqNbQriMkhbXBrT1FmOaNScJjIvxnJWWemcw
 iCZGno/XbPldbed9DlUVJUQW6gvdxdY5SZa7O/4jub+aIb2SYng2XxUuW1mXbV2N
 QmYSlhWWNxirIPVyg8SRDwUrF/jsXwtvmCgZ0nYoViVLR0+2a5d9LxlgzQuQjtA5
 gVRfCaq5VRN7ZtvxTIPRlM6+FUVGjz+VR7rKkoTwlSNcegX6PdYTxVxWicrNbsde
 Xt32Qh8kLOOhZwt89/c+3fiHDrzujBNYu0FSTlNEHK9EVL4kFW7njGUQWKOrmWyK
 2HpLM116572eOqaQoJhi
 =4YFS
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/efi

 * Enforce CONFIG_RELOCATABLE for the x86 EFI boot stub, otherwise
   it's possible to overwrite random pieces of unallocated memory during
   kernel decompression, leading to machine resets.

Resolved Conflicts:
	arch/x86/Kconfig

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-08-11 13:58:54 -07:00
David Vrabel 7d951f3ccb x86/xen: use vmap() to map grant table pages in PVH guests
Commit b7dd0e350e (x86/xen: safely map and unmap grant frames when
in atomic context) causes PVH guests to crash in
arch_gnttab_map_shared() when they attempted to map the pages for the
grant table.

This use of a PV-specific function during the PVH grant table setup is
non-obvious and not needed.  The standard vmap() function does the
right thing.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Cc: stable@vger.kernel.org
2014-08-11 11:59:35 +01:00
David Vrabel 8d5999df35 x86/xen: resume timer irqs early
If the timer irqs are resumed during device resume it is possible in
certain circumstances for the resume to hang early on, before device
interrupts are resumed.  For an Ubuntu 14.04 PVHVM guest this would
occur in ~0.5% of resume attempts.

It is not entirely clear what is occuring the point of the hang but I
think a task necessary for the resume calls schedule_timeout(),
waiting for a timer interrupt (which never arrives).  This failure may
require specific tasks to be running on the other VCPUs to trigger
(processes are not frozen during a suspend/resume if PREEMPT is
disabled).

Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
syscore_resume().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
2014-08-11 11:59:34 +01:00
Linus Torvalds 63b12bdb0d Merge branch 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc
Pull arch signal handling cleanup from Richard Weinberger:
 "This patch series moves all remaining archs to the get_signal(),
  signal_setup_done() and sigsp() functions.

  Currently these archs use open coded variants of the said functions.
  Further, unused parameters get removed from get_signal_to_deliver(),
  tracehook_signal_handler() and signal_delivered().

  At the end of the day we save around 500 lines of code."

* 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (43 commits)
  powerpc: Use sigsp()
  openrisc: Use sigsp()
  mn10300: Use sigsp()
  mips: Use sigsp()
  microblaze: Use sigsp()
  metag: Use sigsp()
  m68k: Use sigsp()
  m32r: Use sigsp()
  hexagon: Use sigsp()
  frv: Use sigsp()
  cris: Use sigsp()
  c6x: Use sigsp()
  blackfin: Use sigsp()
  avr32: Use sigsp()
  arm64: Use sigsp()
  arc: Use sigsp()
  sas_ss_flags: Remove nested ternary if
  Rip out get_signal_to_deliver()
  Clean up signal_delivered()
  tracehook_signal_handler: Remove sig, info, ka and regs
  ...
2014-08-09 09:58:12 -07:00
Linus Torvalds 8065be8d03 Merge branch 'akpm' (second patchbomb from Andrew Morton)
Merge more incoming from Andrew Morton:
 "Two new syscalls:

     memfd_create in "shm: add memfd_create() syscall"
     kexec_file_load in "kexec: implementation of new syscall kexec_file_load"

  And:

   - Most (all?) of the rest of MM

   - Lots of the usual misc bits

   - fs/autofs4

   - drivers/rtc

   - fs/nilfs

   - procfs

   - fork.c, exec.c

   - more in lib/

   - rapidio

   - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs,
     fs/cramfs, fs/romfs, fs/qnx6.

   - initrd/initramfs work

   - "file sealing" and the memfd_create() syscall, in tmpfs

   - add pci_zalloc_consistent, use it in lots of places

   - MAINTAINERS maintenance

   - kexec feature work"

* emailed patches from Andrew Morton <akpm@linux-foundation.org: (193 commits)
  MAINTAINERS: update nomadik patterns
  MAINTAINERS: update usb/gadget patterns
  MAINTAINERS: update DMA BUFFER SHARING patterns
  kexec: verify the signature of signed PE bzImage
  kexec: support kexec/kdump on EFI systems
  kexec: support for kexec on panic using new system call
  kexec-bzImage64: support for loading bzImage using 64bit entry
  kexec: load and relocate purgatory at kernel load time
  purgatory: core purgatory functionality
  purgatory/sha256: provide implementation of sha256 in purgaotory context
  kexec: implementation of new syscall kexec_file_load
  kexec: new syscall kexec_file_load() declaration
  kexec: make kexec_segment user buffer pointer a union
  resource: provide new functions to walk through resources
  kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc()
  kexec: move segment verification code in a separate function
  kexec: rename unusebale_pages to unusable_pages
  kernel: build bin2c based on config option CONFIG_BUILD_BIN2C
  bin2c: move bin2c in scripts/basic
  shm: wait for pins to be released when sealing
  ...
2014-08-08 15:57:47 -07:00
Vivek Goyal 8e7d838103 kexec: verify the signature of signed PE bzImage
This is the final piece of the puzzle of verifying kernel image signature
during kexec_file_load() syscall.

This patch calls into PE file routines to verify signature of bzImage.  If
signature are valid, kexec_file_load() succeeds otherwise it fails.

Two new config options have been introduced.  First one is
CONFIG_KEXEC_VERIFY_SIG.  This option enforces that kernel has to be
validly signed otherwise kernel load will fail.  If this option is not
set, no signature verification will be done.  Only exception will be when
secureboot is enabled.  In that case signature verification should be
automatically enforced when secureboot is enabled.  But that will happen
when secureboot patches are merged.

Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
enables signature verification support on bzImage.  If this option is not
set and previous one is set, kernel image loading will fail because kernel
does not have support to verify signature of bzImage.

I tested these patches with both "pesign" and "sbsign" signed bzImages.

I used signing_key.priv key and signing_key.x509 cert for signing as
generated during kernel build process (if module signing is enabled).

Used following method to sign bzImage.

pesign
======
- Convert DER format cert to PEM format cert
openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform
PEM

- Generate a .p12 file from existing cert and private key file
openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in
signing_key.x509.PEM

- Import .p12 file into pesign db
pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign

- Sign bzImage
pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign
-c "Glacier signing key - Magrathea" -s

sbsign
======
sbsign --key signing_key.priv --cert signing_key.x509.PEM --output
/boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+

Patch details:

Well all the hard work is done in previous patches.  Now bzImage loader
has just call into that code and verify whether bzImage signature are
valid or not.

Also create two config options.  First one is CONFIG_KEXEC_VERIFY_SIG.
This option enforces that kernel has to be validly signed otherwise kernel
load will fail.  If this option is not set, no signature verification will
be done.  Only exception will be when secureboot is enabled.  In that case
signature verification should be automatically enforced when secureboot is
enabled.  But that will happen when secureboot patches are merged.

Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
enables signature verification support on bzImage.  If this option is not
set and previous one is set, kernel image loading will fail because kernel
does not have support to verify signature of bzImage.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Fleming <matt@console-pimps.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:33 -07:00
Vivek Goyal 6a2c20e7d8 kexec: support kexec/kdump on EFI systems
This patch does two things.  It passes EFI run time mappings to second
kernel in bootparams efi_info.  Second kernel parse this info and create
new mappings in second kernel.  That means mappings in first and second
kernel will be same.  This paves the way to enable EFI in kexec kernel.

This patch also prepares and passes EFI setup data through bootparams.
This contains bunch of information about various tables and their
addresses.

These information gathering and passing has been written along the lines
of what current kexec-tools is doing to make kexec work with UEFI.

[akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt]
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:33 -07:00
Vivek Goyal dd5f726076 kexec: support for kexec on panic using new system call
This patch adds support for loading a kexec on panic (kdump) kernel usning
new system call.

It prepares ELF headers for memory areas to be dumped and for saved cpu
registers.  Also prepares the memory map for second kernel and limits its
boot to reserved areas only.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:33 -07:00
Vivek Goyal 27f48d3e63 kexec-bzImage64: support for loading bzImage using 64bit entry
This is loader specific code which can load bzImage and set it up for
64bit entry.  This does not take care of 32bit entry or real mode entry.

32bit mode entry can be implemented if somebody needs it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:33 -07:00
Vivek Goyal 12db5562e0 kexec: load and relocate purgatory at kernel load time
Load purgatory code in RAM and relocate it based on the location.
Relocation code has been inspired by module relocation code and purgatory
relocation code in kexec-tools.

Also compute the checksums of loaded kexec segments and store them in
purgatory.

Arch independent code provides this functionality so that arch dependent
bootloaders can make use of it.

Helper functions are provided to get/set symbol values in purgatory which
are used by bootloaders later to set things like stack and entry point of
second kernel etc.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:32 -07:00
Vivek Goyal 8fc5b4d412 purgatory: core purgatory functionality
Create a stand alone relocatable object purgatory which runs between two
kernels.  This name, concept and some code has been taken from
kexec-tools.  Idea is that this code runs after a crash and it runs in
minimal environment.  So keep it separate from rest of the kernel and in
long term we will have to practically do no maintenance of this code.

This code also has the logic to do verify sha256 hashes of various
segments which have been loaded into memory.  So first we verify that the
kernel we are jumping to is fine and has not been corrupted and make
progress only if checsums are verified.

This code also takes care of copying some memory contents to backup region.

[sfr@canb.auug.org.au: run host built programs from objtree]
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:32 -07:00
Vivek Goyal daeba0641a purgatory/sha256: provide implementation of sha256 in purgaotory context
Next two patches provide code for purgatory.  This is a code which does
not link against the kernel and runs stand alone.  This code runs between
two kernels.  One of the primary purpose of this code is to verify the
digest of newly loaded kernel and making sure it matches the digest
computed at kernel load time.

We use sha256 for calculating digest of kexec segmetns.  Purgatory can't
use stanard crypto API as that API is not available in purgatory context.

Hence, I have copied code from crypto/sha256_generic.c and compiled it
with purgaotry code so that it could be used.  I could not #include
sha256_generic.c file here as some of the function signature requiered
little tweaking.  Original functions work with crypto API but these ones
don't

So instead of doing #include on sha256_generic.c I just copied relevant
portions of code into arch/x86/purgatory/sha256.c.  Now we shouldn't have
to touch this code at all.  Do let me know if there are better ways to
handle it.

This patch does not enable compiling of this code.  That happens in next
patch.  I wanted to highlight this change in a separate patch for easy
review.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:32 -07:00
Vivek Goyal cb1052581e kexec: implementation of new syscall kexec_file_load
Previous patch provided the interface definition and this patch prvides
implementation of new syscall.

Previously segment list was prepared in user space.  Now user space just
passes kernel fd, initrd fd and command line and kernel will create a
segment list internally.

This patch contains generic part of the code.  Actual segment preparation
and loading is done by arch and image specific loader.  Which comes in
next patch.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:32 -07:00
Vivek Goyal f0895685c7 kexec: new syscall kexec_file_load() declaration
This is the new syscall kexec_file_load() declaration/interface.  I have
reserved the syscall number only for x86_64 so far.  Other architectures
(including i386) can reserve syscall number when they enable the support
for this new syscall.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:32 -07:00
Vivek Goyal de5b56ba51 kernel: build bin2c based on config option CONFIG_BUILD_BIN2C
currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be
used by kexec too.  So make it compilation dependent on CONFIG_BUILD_BIN2C
and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:32 -07:00
David Herrmann 9183df25fe shm: add memfd_create() syscall
memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
that you can pass to mmap().  It can support sealing and avoids any
connection to user-visible mount-points.  Thus, it's not subject to quotas
on mounted file-systems, but can be used like malloc()'ed memory, but with
a file-descriptor to it.

memfd_create() returns the raw shmem file, so calls like ftruncate() can
be used to modify the underlying inode.  Also calls like fstat() will
return proper information and mark the file as regular file.  If you want
sealing, you can specify MFD_ALLOW_SEALING.  Otherwise, sealing is not
supported (like on all other regular files).

Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
subject to a filesystem size limit.  It is still properly accounted to
memcg limits, though, and to the same overcommit or no-overcommit
accounting as all user memory.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:31 -07:00
Daniel Walter 164109e3cd arch/x86: replace strict_strto calls
Replace obsolete strict_strto calls with appropriate kstrto calls

Signed-off-by: Daniel Walter <dwalter@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:28 -07:00
Andy Lutomirski a6c19dfe39 arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).

This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.

This gets rid of the default and moves the code into ia64.

This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:27 -07:00
Laura Abbott 308c09f17d lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead.  At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.

[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>			[x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	[powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
Jiang Liu 3eec595235 x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation
Now IOAPIC driver dynamically allocates IRQ numbers for IOAPIC pins.
We need to keep IRQ assignment for PCI devices during suspend/hibernation,
otherwise it may cause failure of suspend/hibernation due to:
1) Device driver calls pci_enable_device() to allocate an IRQ number
   and register interrupt handler on the returned IRQ.
2) Device driver's suspend callback calls pci_disable_device() and
   release assigned IRQ in turn.
3) Device driver's resume callback calls pci_enable_device() to
   allocate IRQ number again. A different IRQ number may be assigned
   by IOAPIC driver this time.
4) Now the hardware delivers interrupt to the new IRQ but interrupt
   handler is still registered against the old IRQ, so it breaks
   suspend/hibernation.

To fix this issue, we keep IRQ assignment during suspend/hibernation.
Flag pci_dev.dev.power.is_prepared is used to detect that
pci_disable_device() is called during suspend/hibernation.

Reported-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1407478071-29399-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-08 11:14:45 +02:00
Linus Torvalds 66bb0aa077 Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation,
 and with 3.16-rc changes).  Since they were all within the subsystem,
 I took care of them.
 
 Stephen Rothwell reported some snags in PPC builds, but they are all
 fixed now; the latest linux-next report was clean.
 
 New features for ARM include:
 - KVM VGIC v2 emulation on GICv3 hardware
 - Big-Endian support for arm/arm64 (guest and host)
 - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
 
 And for PPC:
 - Book3S: Good number of LE host fixes, enable HV on LE
 - Book3S HV: Add in-guest debug support
 
 This release drops support for KVM on the PPC440.  As a result, the
 PPC merge removes more lines than it adds. :)
 
 I also included an x86 change, since Davidlohr tied it to an independent
 bug report and the reporter quickly provided a Tested-by; there was no
 reason to wait for -rc2.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT4iIJAAoJEBvWZb6bTYbyZqoP/3Wxy8NWPFJ8HGt81NHlGnDS
 a9UbL7EibcOEG+aaKqmtBglTD5YDiGBDNCxxiSJaDHt+grLN4fsWIliJob1nJFoO
 90f89EWN2XjeCrJXA5nUoeg5tpc5OoYKsiP6pTgzIwkP8vvs/H1+zpcTS/UmYsr/
 qipVMMsM+zZeHWZcSbqjW88z7YqIn1sr5282wJ85cbyv4KGizb/G4dyPuDqLb6np
 hkAD8Ah6VV2suQ2FSy7G2fg20R0vglUi60hkEHLoCBPVqJCl7SmC8MvxNbjBnP8S
 J36R0R0u1wHYKzAGooLJGVOZ/o/gSiVqKX+++L2EvJBN+kuA6u/7fxLyBT+LwDAE
 IF/Aln5rpg1fe+eywvhz86WljTVEQ8bO1zVsIQUPY+/ZOPedZHMwyvXft8ogbjSp
 2m9OJ/3e8Aggh0OeHpCDoeow+QDUXvX0YdCw+2Yh0p+7VMXqkyp0QEiBu38jrusC
 rB3VNifJbDSWLKdG9LfCAPHnxZD2XYEwv2WFBo6KQOGMGHfx0GXpCOL/jQihrhA6
 HtEG5Bs3lvnHQemdpUZ58xojiABbMaUPdcnPXQQEp23WhZzrfLMLzqVG0VYnhSsC
 9pi7MJj8c31rqx5WU2oRM28i/BvNxN0NCtkDpineO5s3f89Ws1xnwxqlm38AKP0J
 irJQTYFEqec+GM9JK1rG
 =hyQP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second round of KVM changes from Paolo Bonzini:
 "Here are the PPC and ARM changes for KVM, which I separated because
  they had small conflicts (respectively within KVM documentation, and
  with 3.16-rc changes).  Since they were all within the subsystem, I
  took care of them.

  Stephen Rothwell reported some snags in PPC builds, but they are all
  fixed now; the latest linux-next report was clean.

  New features for ARM include:
   - KVM VGIC v2 emulation on GICv3 hardware
   - Big-Endian support for arm/arm64 (guest and host)
   - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)

  And for PPC:
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S HV: Add in-guest debug support

  This release drops support for KVM on the PPC440.  As a result, the
  PPC merge removes more lines than it adds.  :)

  I also included an x86 change, since Davidlohr tied it to an
  independent bug report and the reporter quickly provided a Tested-by;
  there was no reason to wait for -rc2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
  KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
  KVM: PPC: Enable IRQFD support for the XICS interrupt controller
  KVM: Give IRQFD its own separate enabling Kconfig option
  KVM: Move irq notifier implementation into eventfd.c
  KVM: Move all accesses to kvm::irq_routing into irqchip.c
  KVM: irqchip: Provide and use accessors for irq routing table
  KVM: Don't keep reference to irq routing table in irqfd struct
  KVM: PPC: drop duplicate tracepoint
  arm64: KVM: fix 64bit CP15 VM access for 32bit guests
  KVM: arm64: GICv3: mandate page-aligned GICV region
  arm64: KVM: GICv3: move system register access to msr_s/mrs_s
  KVM: PPC: PR: Handle FSCR feature deselects
  KVM: PPC: HV: Remove generic instruction emulation
  KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
  KVM: PPC: Remove DCR handling
  KVM: PPC: Expose helper functions for data/inst faults
  KVM: PPC: Separate loadstore emulation from priv emulation
  KVM: PPC: Handle magic page in kvmppc_ld/st
  ...
2014-08-07 11:35:30 -07:00
Linus Torvalds e306e3be1c - Remove unused V2 grant table support.
- Note that Konrad is xen-blkkback/front maintainer.
 - Add 'xen_nopv' option to disable PV extentions for x86 HVM guests.
 - Misc. minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJT4N57AAoJEFxbo/MsZsTRtfsH/2GxmloKqMZqusnz5PR/x2hd
 M3aXtDw36rxv3hEciIs/NX6obMenRdofDKXVMafnU/gw+EOBQQQ2n/nDqcLOSN+0
 hVyrKHgByYQKaeAhAbrGiGIkuoe5JAURsaggx/YlYSx3hkE0za1XmcUjkPFEVP3l
 UeXXJ40H9hHgESsDwd1UQ08YNtvwdaWVHJAjio3jSxCBAHnAPhCqPhKVy/6LOr+U
 T6HgYsX9HLQRYBy34OOYfKBFnGOJpstnZJd3hMTYtrF4xaTl/Cnf+YxKxv/XJtGD
 YHukhQaEyws7RaDAXK1Uty1hlqgzDoVcFz1TixJIrF6YaO2QhhjMa/oYkbBW09s=
 =Ojrz
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.17-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen updates from David Vrabel:
 - remove unused V2 grant table support
 - note that Konrad is xen-blkkback/front maintainer
 - add 'xen_nopv' option to disable PV extentions for x86 HVM guests
 - misc minor cleanups

* tag 'stable/for-linus-3.17-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-pciback: Document the 'quirks' sysfs file
  xen/pciback: Fix error return code in xen_pcibk_attach()
  xen/events: drop negativity check of unsigned parameter
  xen/setup: Remove Identity Map Debug Message
  xen/events/fifo: remove a unecessary use of BM()
  xen/events/fifo: ensure all bitops are properly aligned even on x86
  xen/events/fifo: reset control block and local HEADs on resume
  xen/arm: use BUG_ON
  xen/grant-table: remove support for V2 tables
  x86/xen: safely map and unmap grant frames when in atomic context
  MAINTAINERS: Make me the Xen block subsystem (front and back) maintainer
  xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests.
2014-08-07 11:33:15 -07:00