Commit Graph

33482 Commits

Author SHA1 Message Date
leoliu-oc 917b905e6c x86/mce: Add Zhaoxin CMCI support
mainline inclusion
from mainline-v5.4-rc1
commit <5a3d56a034be9e8e87a6cb9ed3f2928184db1417>
category: feature

-------------------

All newer Zhaoxin CPUs support CMCI and are compatible with Intel's
Machine-Check Architecture. Add that support for Zhaoxin CPUs.

 [ bp: Massage comments and export intel_init_cmci(). ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: CooperYan@zhaoxin.com
Cc: DavidWang@zhaoxin.com
Cc: HerryYang@zhaoxin.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: QiyuanWang@zhaoxin.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1568787573-1297-4-git-send-email-TonyWWang-oc@zhaoxin.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 17:44:56 +08:00
leoliu-oc 48e4acdaec x86/mce: Add Zhaoxin MCE support
mainline inclusion
from mainline-v5.4-rc1
commit <6e898d2bf67a82df0aa0c955adc9278faba9a635>
category: feature

-------------------

All newer Zhaoxin CPUs are compatible with Intel's Machine-Check
Architecture, so add support for them.

 [ bp: Reflow comment in vendor_disable_error_reporting() and massage
   commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: CooperYan@zhaoxin.com
Cc: DavidWang@zhaoxin.com
Cc: HerryYang@zhaoxin.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: QiyuanWang@zhaoxin.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1568787573-1297-2-git-send-email-TonyWWang-oc@zhaoxin.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 17:44:54 +08:00
leoliu-oc 8beb9ef189 x86/cpu/centaur: Add Centaur family >=7 CPUs initialization support
mainline inclusion
from mainline-v5.9-rc1
commit <33b4711df4c1b3aec7c267c60fc24abccfadd40c>
category: feature

-------------------
Add Centaur family >=7 CPUs specific initialization support.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1599562666-31351-3-git-send-email-TonyWWang-oc@zhaoxin.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 16:54:08 +08:00
leoliu-oc 0e070b4bb8 x86/cpu/centaur: Replace two-condition switch-case with an if statement
mainline inclusion
from mainline-v5.9-rc1
commit <8687bdc04128b2bd16faaae11db10128ad0da7b8>
category: feature

-------------------

Use a normal if statements instead of a two-condition switch-case.

 [ bp: Massage commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1599562666-31351-2-git-send-email-TonyWWang-oc@zhaoxin.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 16:54:06 +08:00
leoliu-oc adbb2ac925 x86/cpu: Remove redundant cpu_detect_cache_sizes() call
mainline inclusion
from mainline-v5.5-rc1
commit <283bab9809786cf41798512f5c1e97f4b679ba96>
category: feature

-------------------

Both functions call init_intel_cacheinfo() which computes L2 and L3 cache
sizes from CPUID(4). But then they also call cpu_detect_cache_sizes() a
bit later which computes ->x86_tlbsize and L2 size from CPUID(80000006).

However, the latter call is not needed because

 - on these CPUs, CPUID(80000006).EBX for ->x86_tlbsize is reserved

 - CPUID(80000006).ECX for the L2 size has the same result as CPUID(4)

Therefore, remove the latter call to simplify the code.

 [ bp: Rewrite commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1579075257-6985-1-git-send-email-TonyWWang-oc@zhaoxin.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 16:42:42 +08:00
leoliu-oc 62982bda95 x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
mainline inclusion
from mainline-v5.5-rc1
commit <7d37953ba81121c8725f99356f7ee9762d4c3ed9>
category: feature

-------------------

Use the recently added IA32_FEAT_CTL MSR initialization sequence to
opportunistically enable VMX support when running on a Zhaoxin CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-8-sean.j.christopherson@intel.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 16:38:35 +08:00
leoliu-oc dd1d37a84b x86/centaur: Use common IA32_FEAT_CTL MSR initialization
mainline inclusion
from mainline-v5.5-rc1
commit <501444905fcb4166589fda99497c273ac5efc65e>
category: feature

-------------------

Use the recently added IA32_FEAT_CTL MSR initialization sequence to
opportunistically enable VMX support when running on a Centaur CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-7-sean.j.christopherson@intel.com

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-26 16:37:04 +08:00
Carrie.Cai 403acd3e81 add support for Montage Mont-TSSE driver
Signed-off-by: Carrie.Cai <carrie.cai@montage-tech.com>
2024-06-12 13:16:49 +08:00
Huang Cun bec7f2888b config: remove CONFIG_BT_SHARE_CFS_BANDWIDTH and CONFIG_HT_ISOLATE
The CI check will execute make dist-check-diff-config, and the code
controlled by these two macros has been deprecated, so remove it.

Signed-off-by: Huang Cun <cunhuang@tencent.com>
2024-06-12 13:16:47 +08:00
Jianping Liu 3154060704 tkernel: sync code to the same with tk4 pub/lts/0017-kabi
Sync code to the same with tk4 pub/lts/0017-kabi, except deleted rue
and wujing. Partners can submit pull requests to this branch, and we
can pick the commits to tk4 pub/lts/0017-kabi easly.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-06-12 13:13:20 +08:00
Jianping Liu c62d6b571d ock: sync codes to ock 5.4.119-20.0009.21
Gitee limit the repo's size to 3GB, to reduce the size of the code,
sync codes to ock 5.4.119-20.0009.21 in one commit.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-06-11 20:27:38 +08:00
Jianping Liu be16237b31 tkernel: add base tlinux kernel interfaces
Sync kernel codes to the same with 590eaf1fec ("Init Repo base on
linux 5.4.32 long term, and add base tlinux kernel interfaces."), which
is from tk4, and it is the base of tk4.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-06-11 20:09:33 +08:00
Linus Torvalds fe30021c36 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixes: disable unreliable HPET on Intel Coffe Lake platforms, and
  fix a lockdep splat in the resctrl code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix potential lockdep warning
  x86/quirks: Disable HPET on Intel Coffe Lake platforms
2019-11-16 16:10:59 -08:00
Sean Christopherson ed69a6cb70 KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast()
Acquire the per-VM slots_lock when zapping all shadow pages as part of
toggling nx_huge_pages.  The fast zap algorithm relies on exclusivity
(via slots_lock) to identify obsolete vs. valid shadow pages, because it
uses a single bit for its generation number. Holding slots_lock also
obviates the need to acquire a read lock on the VM's srcu.

Failing to take slots_lock when toggling nx_huge_pages allows multiple
instances of kvm_mmu_zap_all_fast() to run concurrently, as the other
user, KVM_SET_USER_MEMORY_REGION, does not take the global kvm_lock.
(kvm_mmu_zap_all_fast() does take kvm->mmu_lock, but it can be
temporarily dropped by kvm_zap_obsolete_pages(), so it is not enough
to enforce exclusivity).

Concurrent fast zap instances causes obsolete shadow pages to be
incorrectly identified as valid due to the single bit generation number
wrapping, which results in stale shadow pages being left in KVM's MMU
and leads to all sorts of undesirable behavior.
The bug is easily confirmed by running with CONFIG_PROVE_LOCKING and
toggling nx_huge_pages via its module param.

Note, until commit 4ae5acbc4936 ("KVM: x86/mmu: Take slots_lock when
using kvm_mmu_zap_all_fast()", 2019-11-13) the fast zap algorithm used
an ulong-sized generation instead of relying on exclusivity for
correctness, but all callers except the recently added set_nx_huge_pages()
needed to hold slots_lock anyways.  Therefore, this patch does not have
to be backported to stable kernels.

Given that toggling nx_huge_pages is by no means a fast path, force it
to conform to the current approach instead of reintroducing the previous
generation count.

Fixes: b8e8c8303f ("kvm: mmu: ITLB_MULTIHIT mitigation", but NOT FOR STABLE)
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-14 16:21:22 +01:00
Xiaoyao Li 6cbee2b9ec KVM: X86: Reset the three MSR list number variables to 0 in kvm_init_msr_list()
When applying commit 7a5ee6edb4 ("KVM: X86: Fix initialization of MSR
lists"), it forgot to reset the three MSR lists number varialbes to 0
while removing the useless conditionals.

Fixes: 7a5ee6edb4 (KVM: X86: Fix initialization of MSR lists)
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-13 15:50:39 +01:00
Paolo Bonzini 13fb59276b kvm: x86: disable shattered huge page recovery for PREEMPT_RT.
If a huge page is recovered (and becomes no executable) while another
thread is executing it, the resulting contention on mmu_lock can cause
latency spikes.  Disabling recovery for PREEMPT_RT kernels fixes this
issue.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-13 15:47:06 +01:00
Xiaochen Shen c8eafe1495 x86/resctrl: Fix potential lockdep warning
rdtgroup_cpus_write() and mkdir_rdt_prepare() call
rdtgroup_kn_lock_live() -> kernfs_to_rdtgroup() to get 'rdtgrp', and
then call the rdt_last_cmd_{clear,puts,...}() functions which will check
if rdtgroup_mutex is held/requires its caller to hold rdtgroup_mutex.

But if 'rdtgrp' returned from kernfs_to_rdtgroup() is NULL,
rdtgroup_mutex is not held and calling rdt_last_cmd_{clear,puts,...}()
will result in a self-incurred, potential lockdep warning.

Remove the rdt_last_cmd_{clear,puts,...}() calls in these two paths.
Just returning error should be sufficient to report to the user that the
entry doesn't exist any more.

 [ bp: Massage. ]

Fixes: 94457b36e8 ("x86/intel_rdt: Add diagnostics when writing the cpus file")
Fixes: cfd0f34e4c ("x86/intel_rdt: Add diagnostics when making directories")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: pei.p.jia@intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1573079796-11713-1-git-send-email-xiaochen.shen@intel.com
2019-11-13 12:34:44 +01:00
Linus Torvalds 8c5bd25bf4 Bugfixes: unwinding of KVM_CREATE_VM failure,
VT-d posted interrupts, DAX/ZONE_DEVICE,
 module unload/reload.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdyrEsAAoJEL/70l94x66DIOkH/Asqrh4o4pwfRHWE+9rnM6PI
 j8oFi7Q4eRXJnP4zEMnMbb6xD/BfSH1tWEcPcYgIxD/t0DFx8F92/xsETAJ/Qc5n
 CWpmnhMkJqERlV+GSRuBqnheMo0CEH1Ab1QZKhh5U3//pK3OtGY9WyydJHWcquTh
 bGh2pnxwVZOtIIEmclUUfKjyR2Fu8hJLnQwzWgYZ27UK7J2pLmiiTX0vwQG359Iq
 sDn9ND33pCBW5e/D2mzccRjOJEvzwrumewM1sRDsoAYLJzUjg9+xD83vZDa1d7R6
 gajCDFWVJbPoLvUY+DgsZBwMMlogElimJMT/Zft3ERbCsYJbFvcmwp4JzyxDxQ4=
 =J6KN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Fix unwinding of KVM_CREATE_VM failure, VT-d posted interrupts,
  DAX/ZONE_DEVICE, and module unload/reload"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
  KVM: VMX: Introduce pi_is_pir_empty() helper
  KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
  KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts
  KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON
  KVM: X86: Fix initialization of MSR lists
  KVM: fix placement of refcount initialization
  KVM: Fix NULL-ptr deref after kvm_create_vm fails
2019-11-12 13:19:15 -08:00
Linus Torvalds eb094f0696 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner:
 "The performance deterioration departement is not proud at all of
  presenting the seventh installment of speculation mitigations and
  hardware misfeature workarounds:

   1) TSX Async Abort (TAA) - 'The Annoying Affair'

      TAA is a hardware vulnerability that allows unprivileged
      speculative access to data which is available in various CPU
      internal buffers by using asynchronous aborts within an Intel TSX
      transactional region.

      The mitigation depends on a microcode update providing a new MSR
      which allows to disable TSX in the CPU. CPUs which have no
      microcode update can be mitigated by disabling TSX in the BIOS if
      the BIOS provides a tunable.

      Newer CPUs will have a bit set which indicates that the CPU is not
      vulnerable, but the MSR to disable TSX will be available
      nevertheless as it is an architected MSR. That means the kernel
      provides the ability to disable TSX on the kernel command line,
      which is useful as TSX is a truly useful mechanism to accelerate
      side channel attacks of all sorts.

   2) iITLB Multihit (NX) - 'No eXcuses'

      iTLB Multihit is an erratum where some Intel processors may incur
      a machine check error, possibly resulting in an unrecoverable CPU
      lockup, when an instruction fetch hits multiple entries in the
      instruction TLB. This can occur when the page size is changed
      along with either the physical address or cache type. A malicious
      guest running on a virtualized system can exploit this erratum to
      perform a denial of service attack.

      The workaround is that KVM marks huge pages in the extended page
      tables as not executable (NX). If the guest attempts to execute in
      such a page, the page is broken down into 4k pages which are
      marked executable. The workaround comes with a mechanism to
      recover these shattered huge pages over time.

  Both issues come with full documentation in the hardware
  vulnerabilities section of the Linux kernel user's and administrator's
  guide.

  Thanks to all patch authors and reviewers who had the extraordinary
  priviledge to be exposed to this nuisance.

  Special thanks to Borislav Petkov for polishing the final TAA patch
  set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and
  providing also the backports to stable kernels for those!"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
  Documentation: Add ITLB_MULTIHIT documentation
  kvm: x86: mmu: Recovery of shattered NX large pages
  kvm: Add helper function for creating VM worker threads
  kvm: mmu: ITLB_MULTIHIT mitigation
  cpu/speculation: Uninline and export CPU mitigations helpers
  x86/cpu: Add Tremont to the cpu vulnerability whitelist
  x86/bugs: Add ITLB_MULTIHIT bug infrastructure
  x86/tsx: Add config options to set tsx=on|off|auto
  x86/speculation/taa: Add documentation for TSX Async Abort
  x86/tsx: Add "auto" option to the tsx= cmdline parameter
  kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
  x86/speculation/taa: Add sysfs reporting for TSX Async Abort
  x86/speculation/taa: Add mitigation for TSX Async Abort
  x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
  x86/cpu: Add a helper function x86_read_arch_cap_msr()
  x86/msr: Add the IA32_TSX_CTRL MSR
2019-11-12 10:53:24 -08:00
Kai-Heng Feng fc5db58539 x86/quirks: Disable HPET on Intel Coffe Lake platforms
Some Coffee Lake platforms have a skewed HPET timer once the SoCs entered
PC10, which in consequence marks TSC as unstable because HPET is used as
watchdog clocksource for TSC.

Harry Pan tried to work around it in the clocksource watchdog code [1]
thereby creating a circular dependency between HPET and TSC. This also
ignores the fact, that HPET is not only unsuitable as watchdog clocksource
on these systems, it becomes unusable in general.

Disable HPET on affected platforms.

Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Link: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/ [1]
Link: https://lkml.kernel.org/r/20191016103816.30650-1-kai.heng.feng@canonical.com
2019-11-12 15:55:20 +01:00
Sean Christopherson a78986aae9 KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup().  But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()).  But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl

Reported-by: Adam Borowski <kilobyte@angband.pl>
Analyzed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12 10:17:42 +01:00
Joao Martins 29881b6ec6 KVM: VMX: Introduce pi_is_pir_empty() helper
Streamline the PID.PIR check and change its call sites to use
the newly added helper.

Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12 10:17:41 +01:00
Joao Martins 132194ffa1 KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
(wakeup_handler()) is responsible to kick (unblock) any vCPU on that
linked list that now has pending posted interrupts.

While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
PID.SN but will also *overwrite PID.NDST to this different pCPU*.
Instead of keeping it with original pCPU which vCPU had entered block
phase on.

This results in an issue because when a posted interrupt is delivered, as
the wakeup_handler() will be executed and fail to find blocked vCPU on
its per pCPU linked list of all vCPUs that are blocked on this pCPU.
Which is due to the vCPU being placed on a *different* per pCPU
linked list i.e. the original pCPU in which it entered block phase.

The regression is introduced by commit c112b5f502 ("KVM: x86:
Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
avoiding changing PID.NDST when loading a blocked vCPU.

Fixes: c112b5f502 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
Tested-by: Nathan Ni <nathan.ni@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12 10:17:41 +01:00
Joao Martins 9482ae458b KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts
Commit 17e433b543 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
introduced vmx_dy_apicv_has_pending_interrupt() in order to determine
if a vCPU have a pending posted interrupt. This routine is used by
kvm_vcpu_on_spin() when searching for a a new runnable vCPU to schedule
on pCPU instead of a vCPU doing busy loop.

vmx_dy_apicv_has_pending_interrupt() determines if a
vCPU has a pending posted interrupt solely based on PID.ON. However,
when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN which cause
raised posted interrupts to only set bit in PID.PIR without setting
PID.ON (and without sending notification vector), as depicted in VT-d
manual section 5.2.3 "Interrupt-Posting Hardware Operation".

Therefore, checking PID.ON is insufficient to determine if a vCPU has
pending posted interrupts and instead we should also check if there is
some bit set on PID.PIR if PID.SN=1.

Fixes: 17e433b543 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12 10:17:40 +01:00
Liran Alon d9ff2744ee KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON
The Outstanding Notification (ON) bit is part of the Posted Interrupt
Descriptor (PID) as opposed to the Posted Interrupts Register (PIR).
The latter is a bitmap for pending vectors.

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12 10:17:39 +01:00
Chenyi Qiang 7a5ee6edb4 KVM: X86: Fix initialization of MSR lists
The three MSR lists(msrs_to_save[], emulated_msrs[] and
msr_based_features[]) are global arrays of kvm.ko, which are
adjusted (copy supported MSRs forward to override the unsupported MSRs)
when insmod kvm-{intel,amd}.ko, but it doesn't reset these three arrays
to their initial value when rmmod kvm-{intel,amd}.ko. Thus, at the next
installation, kvm-{intel,amd}.ko will do operations on the modified
arrays with some MSRs lost and some MSRs duplicated.

So define three constant arrays to hold the initial MSR lists and
initialize msrs_to_save[], emulated_msrs[] and msr_based_features[]
based on the constant arrays.

Cc: stable@vger.kernel.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
[Remove now useless conditionals. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12 10:17:25 +01:00
Linus Torvalds 9805a68371 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes for x86:

   - Make the tsc=reliable/nowatchdog command line parameter work again.
     It was broken with the introduction of the early TSC clocksource.

   - Prevent the evaluation of exception stacks before they are set up.
     This causes a crash in dumpstack because the stack walk termination
     gets screwed up.

   - Prevent a NULL pointer dereference in the rescource control file
     system.

   - Avoid bogus warnings about APIC id mismatch related to the LDR
     which can happen when the LDR is not in use and therefore not
     initialized. Only evaluate that when the APIC is in logical
     destination mode"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early
  x86/dumpstack/64: Don't evaluate exception stacks before setup
  x86/apic/32: Avoid bogus LDR warnings
  x86/resctrl: Prevent NULL pointer dereference when reading mondata
2019-11-10 12:07:47 -08:00
Josh Poimboeuf 012206a822 x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
For new IBRS_ALL CPUs, the Enhanced IBRS check at the beginning of
cpu_bugs_smt_update() causes the function to return early, unintentionally
skipping the MDS and TAA logic.

This is not a problem for MDS, because there appears to be no overlap
between IBRS_ALL and MDS-affected CPUs.  So the MDS mitigation would be
disabled and nothing would need to be done in this function anyway.

But for TAA, the TAA_MSG_SMT string will never get printed on Cascade
Lake and newer.

The check is superfluous anyway: when 'spectre_v2_enabled' is
SPECTRE_V2_IBRS_ENHANCED, 'spectre_v2_user' is always
SPECTRE_V2_USER_NONE, and so the 'spectre_v2_user' switch statement
handles it appropriately by doing nothing.  So just remove the check.

Fixes: 1b42f01741 ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
2019-11-07 16:06:27 +01:00
Michael Zhivich 63ec58b44f x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early
The introduction of clocksource_tsc_early broke the functionality of
"tsc=reliable" and "tsc=nowatchdog" command line parameters, since
clocksource_tsc_early is unconditionally registered with
CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list.

This can cause the TSC to be declared unstable during boot:

  clocksource: timekeeping watchdog on CPU0: Marking clocksource
               'tsc-early' as unstable because the skew is too large:
  clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d
               mask: ffffffff
  clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6
               mask: ffffffffffffffff
  tsc: Marking TSC unstable due to clocksource watchdog

The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so
the watchdog differs from TSC by 0.84 seconds.

This happens when HPET is not available and jiffies are used as the TSC
watchdog instead and the jiffies update is not happening due to lost timer
interrupts in periodic mode, which can happen e.g. with expensive debug
mechanisms enabled or under massive overload conditions in virtualized
environments.

Before the introduction of the early TSC clocksource the command line
parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around
this issue.

Restore the behaviour by disabling the watchdog if requested on the kernel
command line.

[ tglx: Clarify changelog ]

Fixes: aa83c45762 ("x86/tsc: Introduce early tsc clocksource")
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
2019-11-05 01:24:56 +01:00
Thomas Gleixner e361362b08 x86/dumpstack/64: Don't evaluate exception stacks before setup
Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep->size;
     ==> end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info->next_sp   = (unsigned long *)regs->sp;
     ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad8 ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de
2019-11-05 00:51:35 +01:00
Jan Beulich fe6f85ca12 x86/apic/32: Avoid bogus LDR warnings
The removal of the LDR initialization in the bigsmp_32 APIC code unearthed
a problem in setup_local_APIC().

The code checks unconditionally for a mismatch of the logical APIC id by
comparing the early APIC id which was initialized in get_smp_config() with
the actual LDR value in the APIC.

Due to the removal of the bogus LDR initialization the check now can
trigger on bigsmp_32 APIC systems emitting a warning for every booting
CPU. This is of course a false positive because the APIC is not using
logical destination mode.

Restrict the check and the possibly resulting fixup to systems which are
actually using the APIC in logical destination mode.

[ tglx: Massaged changelog and added Cc stable ]

Fixes: bae3a8d330 ("x86/apic: Do not initialize LDR and DFR for bigsmp")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
2019-11-05 00:11:00 +01:00
Junaid Shahid 1aa9b9572b kvm: x86: mmu: Recovery of shattered NX large pages
The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
not longer being used for execution.  This removes the performance penalty
for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 20:26:00 +01:00
Paolo Bonzini b8e8c8303f kvm: mmu: ITLB_MULTIHIT mitigation
With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check resulting in a CPU lockup.

Unfortunately when EPT page tables use huge pages, it is possible for a
malicious guest to cause this situation.

Add a knob to mark huge pages as non-executable. When the nx_huge_pages
parameter is enabled (and we are using EPT), all huge pages are marked as
NX. If the guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.

This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen.  With nested EPT, again the nested guest can cause problems shadow
and direct EPT is treated in the same way.

[ tglx: Fixup default to auto and massage wording a bit ]

Originally-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 12:22:02 +01:00
Pawan Gupta cad14885a8 x86/cpu: Add Tremont to the cpu vulnerability whitelist
Add the new cpu family ATOM_TREMONT_D to the cpu vunerability
whitelist. ATOM_TREMONT_D is not affected by X86_BUG_ITLB_MULTIHIT.

ATOM_TREMONT_D might have mitigations against other issues as well, but
only the ITLB multihit mitigation is confirmed at this point.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 12:22:01 +01:00
Vineela Tummalapalli db4d30fbb7 x86/bugs: Add ITLB_MULTIHIT bug infrastructure
Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:

   https://bugzilla.kernel.org/show_bug.cgi?id=205195

There are other processors affected for which the erratum does not fully
disclose the impact.

This issue affects both bare-metal x86 page tables and EPT.

It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 12:22:01 +01:00
Thomas Gleixner ca8888d7ae Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
to pick up the KVM fix which is required for the NX series.
2019-11-04 11:32:04 +01:00
Xiaochen Shen 26467b0f84 x86/resctrl: Prevent NULL pointer dereference when reading mondata
When a mon group is being deleted, rdtgrp->flags is set to RDT_DELETED
in rdtgroup_rmdir_mon() firstly. The structure of rdtgrp will be freed
until rdtgrp->waitcount is dropped to 0 in rdtgroup_kn_unlock() later.

During the window of deleting a mon group, if an application calls
rdtgroup_mondata_show() to read mondata under this mon group,
'rdtgrp' returned from rdtgroup_kn_lock_live() is a NULL pointer when
rdtgrp->flags is RDT_DELETED. And then 'rdtgrp' is passed in this path:
rdtgroup_mondata_show() --> mon_event_read() --> mon_event_count().
Thus it results in NULL pointer dereference in mon_event_count().

Check 'rdtgrp' in rdtgroup_mondata_show(), and return -ENOENT
immediately when reading mondata during the window of deleting a mon
group.

Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: pei.p.jia@intel.com
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1572326702-27577-1-git-send-email-xiaochen.shen@intel.com
2019-11-03 17:51:22 +01:00
Linus Torvalds 355f83c1d0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel
  uncore PMU driver fix and a header typo fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/
  perf/x86/uncore: Fix event group support
  perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
  perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
  perf/core: Start rejecting the syscall with attr.__reserved_2 set
2019-11-01 11:40:47 -07:00
Linus Torvalds b2a18c25c7 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Various fixes all over the map: prevent boot crashes on HyperV,
  classify UEFI randomness as bootloader randomness, fix EFI boot for
  the Raspberry Pi2, fix efi_test permissions, etc"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
  x86, efi: Never relocate kernel below lowest acceptable address
  efi: libstub/arm: Account for firmware reserved memory at the base of RAM
  efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
  efi/tpm: Return -EINVAL when determining tpm final events log size fails
  efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
2019-11-01 11:32:50 -07:00
Linus Torvalds b88866b60d Generic: fix memory leak failure to create VM.
x86: fix MMU corner case with AMD nested paging disabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl27ZC8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNw/wf9EN2AQuyfkwMxLdh2VlyMg9SzRBgn
 SZt6EmC8Pbfqqri2pPlDh1e1rk5CdIKM3EtcnNYvbZddXuAiORLx62lQCKu9UWzJ
 mfVpahjFdx+6wFCI2IUB66Qat/E34uFP91h8cKxdwuTrgeT1V+HeFomwqmVQVkym
 urqEQSWJh6SLl3ZeeiuVaxQdcqq8dW+PChKwrLmgQ4Dz8aREHEbd9ukHAopcyE+T
 L8B29LtrBh2KCZeH7g91d5hGgGFeJCd1q+rXKOJLvpPJCZ0fnxtL9Va3h2y4TIbF
 utDAA17Hu5QmnAJw7Lh7nN0mHGPejbs763IaGrW2nrY70pVeO6bDMh8M2Q==
 =w2Y0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "generic:
   - fix memory leak on failure to create VM

  x86:
   - fix MMU corner case with AMD nested paging disabled"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
  kvm: call kvm_arch_destroy_vm if vm creation fails
  kvm: Allocate memslots and buses before calling kvm_arch_init_vm
2019-11-01 09:54:38 -07:00
Paolo Bonzini 9167ab7993 KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
VMX already does so if the host has SMEP, in order to support the combination of
CR0.WP=1 and CR4.SMEP=1.  However, it is perfectly safe to always do so, and in
fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
"load EFER" controls, because it may help avoiding a slow MSR write.  Removing
all the conditionals simplifies the code.

SVM does not have similar code, but it should since recent AMD processors do
support SMEP.  So this patch also makes the code for the two vendors more similar
while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.

Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-31 12:13:44 +01:00
Kairui Song 220dd7699c x86, efi: Never relocate kernel below lowest acceptable address
Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.

It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:

1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
   by the loader.
2. There isn't enough room to contain the kernel, starting from the
   default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
   starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
   kernel.

EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.

It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.

The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.

To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.

[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]

Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31 09:40:19 +01:00
Kan Liang 75be6f703a perf/x86/uncore: Fix event group support
The events in the same group don't start or stop simultaneously.
Here is the ftrace when enabling event group for uncore_iio_0:

  # perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}"

            <idle>-0     [000] d.h.  8959.064832: read_msr: a41, value
  b2b0b030		//Read counter reg of IIO unit0 counter0
            <idle>-0     [000] d.h.  8959.064835: write_msr: a48, value
  400001			//Write Ctrl reg of IIO unit0 counter0 to enable
  counter0. <------ Although counter0 is enabled, Unit Ctrl is still
  freezed. Nothing will count. We are still good here.
            <idle>-0     [000] d.h.  8959.064836: read_msr: a40, value
  30100                   //Read Unit Ctrl reg of IIO unit0
            <idle>-0     [000] d.h.  8959.064838: write_msr: a40, value
  30000			//Write Unit Ctrl reg of IIO unit0 to enable all
  counters in the unit by clear Freeze bit  <------Unit0 is un-freezed.
  Counter0 has been enabled. Now it starts counting. But counter1 has not
  been enabled yet. The issue starts here.
            <idle>-0     [000] d.h.  8959.064846: read_msr: a42, value 0
			//Read counter reg of IIO unit0 counter1
            <idle>-0     [000] d.h.  8959.064847: write_msr: a49, value
  40000e			//Write Ctrl reg of IIO unit0 counter1 to enable
  counter1.   <------ Now, counter1 just starts to count. Counter0 has
  been running for a while.

Current code un-freezes the Unit Ctrl right after the first counter is
enabled. The subsequent group events always loses some counter values.

Implement pmu_enable and pmu_disable support for uncore, which can help
to batch hardware accesses.

No one uses uncore_enable_box and uncore_disable_box. Remove them.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-drivers-review@eclists.intel.com
Cc: linux-perf@eclists.intel.com
Fixes: 087bfbb032 ("perf/x86: Add generic Intel uncore PMU support")
Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28 11:02:01 +01:00
Kim Phillips e431e79b60 perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
This saves us writing the IBS control MSR twice when disabling the
event.

I searched revision guides for all families since 10h, and did not
find occurrence of erratum #420, nor anything remotely similar:
so we isolate the secondary MSR write to family 10h only.

Also unconditionally update the count mask for IBS Op implementations
that have read & writeable current count (CurCnt) fields in addition
to the MaxCnt field.  These bits were reserved on prior
implementations, and therefore shouldn't have negative impact.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: c9574fe0bd ("perf/x86-ibs: Implement workaround for IBS erratum #420")
Link: https://lkml.kernel.org/r/20191023150955.30292-2-kim.phillips@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28 11:02:00 +01:00
Kim Phillips 317b96bb14 perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
The loop that reads all the IBS MSRs into *buf stopped one MSR short of
reading the IbsOpData register, which contains the RipInvalid status bit.

Fix the offset_max assignment so the MSR gets read, so the RIP invalid
evaluation is based on what the IBS h/w output, instead of what was
left in memory.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: d47e8238cd ("perf/x86-ibs: Take instruction pointer from ibs sample")
Link: https://lkml.kernel.org/r/20191023150955.30292-1-kim.phillips@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28 11:01:59 +01:00
Michal Hocko db616173d7 x86/tsx: Add config options to set tsx=on|off|auto
There is a general consensus that TSX usage is not largely spread while
the history shows there is a non trivial space for side channel attacks
possible. Therefore the tsx is disabled by default even on platforms
that might have a safe implementation of TSX according to the current
knowledge. This is a fair trade off to make.

There are, however, workloads that really do benefit from using TSX and
updating to a newer kernel with TSX disabled might introduce a
noticeable regressions. This would be especially a problem for Linux
distributions which will provide TAA mitigations.

Introduce config options X86_INTEL_TSX_MODE_OFF, X86_INTEL_TSX_MODE_ON
and X86_INTEL_TSX_MODE_AUTO to control the TSX feature. The config
setting can be overridden by the tsx cmdline options.

 [ bp: Text cleanups from Josh. ]

Suggested-by: Borislav Petkov <bpetkov@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 09:12:18 +01:00
Pawan Gupta 7531a3596e x86/tsx: Add "auto" option to the tsx= cmdline parameter
Platforms which are not affected by X86_BUG_TAA may want the TSX feature
enabled. Add "auto" option to the TSX cmdline parameter. When tsx=auto
disable TSX when X86_BUG_TAA is present, otherwise enable TSX.

More details on X86_BUG_TAA can be found here:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

 [ bp: Extend the arg buffer to accommodate "auto\0". ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:37:00 +01:00
Pawan Gupta e1d38b63ac kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
Export the IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0 to guests on TSX
Async Abort(TAA) affected hosts that have TSX enabled and updated
microcode. This is required so that the guests don't complain,

  "Vulnerable: Clear CPU buffers attempted, no microcode"

when the host has the updated microcode to clear CPU buffers.

Microcode update also adds support for MSR_IA32_TSX_CTRL which is
enumerated by the ARCH_CAP_TSX_CTRL bit in IA32_ARCH_CAPABILITIES MSR.
Guests can't do this check themselves when the ARCH_CAP_TSX_CTRL bit is
not exported to the guests.

In this case export MDS_NO=0 to the guests. When guests have
CPUID.MD_CLEAR=1, they deploy MDS mitigation which also mitigates TAA.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:59 +01:00
Pawan Gupta 6608b45ac5 x86/speculation/taa: Add sysfs reporting for TSX Async Abort
Add the sysfs reporting file for TSX Async Abort. It exposes the
vulnerability and the mitigation state similar to the existing files for
the other hardware vulnerabilities.

Sysfs file path is:
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:59 +01:00
Pawan Gupta 1b42f01741 x86/speculation/taa: Add mitigation for TSX Async Abort
TSX Async Abort (TAA) is a side channel vulnerability to the internal
buffers in some Intel processors similar to Microachitectural Data
Sampling (MDS). In this case, certain loads may speculatively pass
invalid data to dependent operations when an asynchronous abort
condition is pending in a TSX transaction.

This includes loads with no fault or assist condition. Such loads may
speculatively expose stale data from the uarch data structures as in
MDS. Scope of exposure is within the same-thread and cross-thread. This
issue affects all current processors that support TSX, but do not have
ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.

On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
using VERW or L1D_FLUSH, there is no additional mitigation needed for
TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
disabling the Transactional Synchronization Extensions (TSX) feature.

A new MSR IA32_TSX_CTRL in future and current processors after a
microcode update can be used to control the TSX feature. There are two
bits in that MSR:

* TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
Transactional Memory (RTM).

* TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
disabled with updated microcode but still enumerated as present by
CPUID(EAX=7).EBX{bit4}.

The second mitigation approach is similar to MDS which is clearing the
affected CPU buffers on return to user space and when entering a guest.
Relevant microcode update is required for the mitigation to work.  More
details on this approach can be found here:

  https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html

The TSX feature can be controlled by the "tsx" command line parameter.
If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
deployed. The effective mitigation state can be read from sysfs.

 [ bp:
   - massage + comments cleanup
   - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
   - remove partial TAA mitigation in update_mds_branch_idle() - Josh.
   - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
 ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:58 +01:00
Pawan Gupta 95c5824f75 x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
Add a kernel cmdline parameter "tsx" to control the Transactional
Synchronization Extensions (TSX) feature. On CPUs that support TSX
control, use "tsx=on|off" to enable or disable TSX. Not specifying this
option is equivalent to "tsx=off". This is because on certain processors
TSX may be used as a part of a speculative side channel attack.

Carve out the TSX controlling functionality into a separate compilation
unit because TSX is a CPU feature while the TSX async abort control
machinery will go to cpu/bugs.c.

 [ bp: - Massage, shorten and clear the arg buffer.
       - Clarifications of the tsx= possible options - Josh.
       - Expand on TSX_CTRL availability - Pawan. ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:58 +01:00
Pawan Gupta 286836a704 x86/cpu: Add a helper function x86_read_arch_cap_msr()
Add a helper function to read the IA32_ARCH_CAPABILITIES MSR.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:58 +01:00
Pawan Gupta c2955f270a x86/msr: Add the IA32_TSX_CTRL MSR
Transactional Synchronization Extensions (TSX) may be used on certain
processors as part of a speculative side channel attack.  A microcode
update for existing processors that are vulnerable to this attack will
add a new MSR - IA32_TSX_CTRL to allow the system administrator the
option to disable TSX as one of the possible mitigations.

The CPUs which get this new MSR after a microcode upgrade are the ones
which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those
CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all
CPU buffers takes care of the TAA case as well.

  [ Note that future processors that are not vulnerable will also
    support the IA32_TSX_CTRL MSR. ]

Add defines for the new IA32_TSX_CTRL MSR and its bits.

TSX has two sub-features:

1. Restricted Transactional Memory (RTM) is an explicitly-used feature
   where new instructions begin and end TSX transactions.
2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of
   "old" style locks are used by software.

Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the
IA32_TSX_CTRL MSR.

There are two control bits in IA32_TSX_CTRL MSR:

  Bit 0: When set, it disables the Restricted Transactional Memory (RTM)
         sub-feature of TSX (will force all transactions to abort on the
	 XBEGIN instruction).

  Bit 1: When set, it disables the enumeration of the RTM and HLE feature
         (i.e. it will make CPUID(EAX=7).EBX{bit4} and
	  CPUID(EAX=7).EBX{bit11} read as 0).

The other TSX sub-feature, Hardware Lock Elision (HLE), is
unconditionally disabled by the new microcode but still enumerated
as present by CPUID(EAX=7).EBX{bit4}, unless disabled by
IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:58 +01:00
Linus Torvalds 153a971ff5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Two fixes for the VMWare guest support:

   - Unbreak VMWare platform detection which got wreckaged by converting
     an integer constant to a string constant.

   - Fix the clang build of the VMWAre hypercall by explicitely
     specifying the ouput register for INL instead of using the short
     form"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
  x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
2019-10-27 07:14:40 -04:00
Linus Torvalds a8a31fdcca Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A set of perf fixes:

  kernel:

   - Unbreak the tracking of auxiliary buffer allocations which got
     imbalanced causing recource limit failures.

   - Fix the fallout of splitting of ToPA entries which missed to shift
     the base entry PA correctly.

   - Use the correct context to lookup the AUX event when unmapping the
     associated AUX buffer so the event can be stopped and the buffer
     reference dropped.

  tools:

   - Fix buildiid-cache mode setting in copyfile_mode_ns() when copying
     /proc/kcore

   - Fix freeing id arrays in the event list so the correct event is
     closed.

   - Sync sched.h anc kvm.h headers with the kernel sources.

   - Link jvmti against tools/lib/ctype.o to have weak strlcpy().

   - Fix multiple memory and file descriptor leaks, found by coverity in
     perf annotate.

   - Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
     by a static analysis tool"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/aux: Fix AUX output stopping
  perf/aux: Fix tracking of auxiliary trace buffer allocation
  perf/x86/intel/pt: Fix base for single entry topa
  perf kmem: Fix memory leak in compact_gfp_flags()
  tools headers UAPI: Sync sched.h with the kernel
  tools headers kvm: Sync kvm.h headers with the kernel sources
  tools headers kvm: Sync kvm headers with the kernel sources
  tools headers kvm: Sync kvm headers with the kernel sources
  perf c2c: Fix memory leak in build_cl_output()
  perf tools: Fix mode setting in copyfile_mode_ns()
  perf annotate: Fix multiple memory and file descriptor leaks
  perf tools: Fix resource leak of closedir() on the error paths
  perf evlist: Fix fix for freed id arrays
  perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
2019-10-27 06:59:34 -04:00
Linus Torvalds 4fac2407f8 xen: patch for 5.4-rc5
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXbQMcQAKCRCAXGG7T9hj
 vj3nAP9OLhgxeq1AeGEslLWv2nSEJzmMIQ2/Qv/pyZoQjQeZVgD+Jyl+pt8u0giG
 DaL/aw+i8P7aDM/3jnBpXB0PlYIw5go=
 =+GfI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixlet from Juergen Gross:
 "Just one patch for issuing a deprecation warning for 32-bit Xen pv
  guests"

* tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: issue deprecation warning for 32-bit pv guest
2019-10-26 06:32:12 -04:00
Juergen Gross 6ccae60d01 xen: issue deprecation warning for 32-bit pv guest
Support for the kernel as Xen 32-bit PV guest will soon be removed.
Issue a warning when booted as such.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-25 09:53:18 -04:00
Jim Mattson 671ddc700f KVM: nVMX: Don't leak L1 MMIO regions to L2
If the "virtualize APIC accesses" VM-execution control is set in the
VMCS, the APIC virtualization hardware is triggered when a page walk
in VMX non-root mode terminates at a PTE wherein the address of the 4k
page frame matches the APIC-access address specified in the VMCS. On
hardware, the APIC-access address may be any valid 4k-aligned physical
address.

KVM's nVMX implementation enforces the additional constraint that the
APIC-access address specified in the vmcs12 must be backed by
a "struct page" in L1. If not, L0 will simply clear the "virtualize
APIC accesses" VM-execution control in the vmcs02.

The problem with this approach is that the L1 guest has arranged the
vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
VM-execution control is clear in the vmcs12--so that the L2 guest
physical address(es)--or L2 guest linear address(es)--that reference
the L2 APIC map to the APIC-access address specified in the
vmcs12. Without the "virtualize APIC accesses" VM-execution control in
the vmcs02, the APIC accesses in the L2 guest will directly access the
APIC-access page in L1.

When there is no mapping whatsoever for the APIC-access address in L1,
the L2 VM just loses the intended APIC virtualization. However, when
the APIC-access address is mapped to an MMIO region in L1, the L2
guest gets direct access to the L1 MMIO device. For example, if the
APIC-access address specified in the vmcs12 is 0xfee00000, then L2
gets direct access to L1's APIC.

Since this vmcs12 configuration is something that KVM cannot
faithfully emulate, the appropriate response is to exit to userspace
with KVM_INTERNAL_ERROR_EMULATION.

Fixes: fe3ef05c75 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
Reported-by: Dan Cross <dcross@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 19:04:40 +02:00
Miaohe Lin 5c94ac5d0f KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case.
We may set the wrong physical id in avic_handle_ldr_update as we
always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page
instead.
Export and use kvm_xapic_id here and in avic_handle_apic_id_update
as suggested by Vitaly.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 18:47:50 +02:00
Paolo Bonzini 49dedf0dd0 kvm: clear kvmclock MSR on reset
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is
not enabled.  This can be confusing, so fix it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:31:22 +02:00
kbuild test robot b4fdcf6056 KVM: x86: fix bugon.cocci warnings
Use BUG_ON instead of a if condition followed by BUG.

Generated by: scripts/coccinelle/misc/bugon.cocci

Fixes: 4b526de50e ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()")
CC: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:31:21 +02:00
Liran Alon 1a8211c7d8 KVM: VMX: Remove specialized handling of unexpected exit-reasons
Commit bf653b78f9 ("KVM: vmx: Introduce handle_unexpected_vmexit
and handle WAITPKG vmexit") introduced specialized handling of
specific exit-reasons that should not be raised by CPU because
KVM configures VMCS such that they should never be raised.

However, since commit 7396d337cf ("KVM: x86: Return to userspace
with internal error on unexpected exit reason"), VMX & SVM
exit handlers were modified to generically handle all unexpected
exit-reasons by returning to userspace with internal error.

Therefore, there is no need for specialized handling of specific
unexpected exit-reasons (This specialized handling also introduced
inconsistency for these exit-reasons to silently skip guest instruction
instead of return to userspace on internal-error).

Fixes: bf653b78f9 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:31:20 +02:00
Jim Mattson 41cd02c6f7 kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
When the RDPID instruction is supported on the host, enumerate it in
KVM_GET_SUPPORTED_CPUID.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:31:12 +02:00
Thomas Hellstrom 6fee2a0be0 x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT
definition, but expects it to be an integer. However, when it was moved
to the new vmware.h include file, it was changed to be a string to better
fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the
platform detection VMWARE_PORT functionality.

Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB
definitions to be integers, and use __stringify() for their stringified
form when needed.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b4dd4f6e36 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22 00:51:44 +02:00
Thomas Hellstrom db633a4e0e x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
LLVM's assembler doesn't accept the short form INL instruction:

  inl (%%dx)

but instead insists on the output register to be explicitly specified.

This was previously fixed for the VMWARE_PORT macro. Fix it also for
the VMWARE_HYPERCALL macro.

Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: clang-built-linux@googlegroups.com
Fixes: b4dd4f6e36 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas_os@shipmail.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22 00:51:44 +02:00
Jiri Olsa 13301c6b16 perf/x86/intel/pt: Fix base for single entry topa
Jan reported failing ltp test for PT:

  https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/tracing/pt_test/pt_test.c

It looks like the reason is this new commit added in this v5.4 merge window:

  38bb8d77d0 ("perf/x86/intel/pt: Split ToPA metadata and page layout")

which did not keep the TOPA_SHIFT for entry base.

Add it back.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 38bb8d77d0 ("perf/x86/intel/pt: Split ToPA metadata and page layout")
Link: https://lkml.kernel.org/r/20191019220726.12213-1-jolsa@kernel.org
[ Minor changelog edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-20 14:42:28 +02:00
Linus Torvalds 4fe34d61a3 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - Prevent a NULL pointer dereference in the X2APIC code in case of a
     CPU hotplug failure.

   - Prevent boot failures on HP superdome machines by invalidating the
     level2 kernel pagetable entries outside of the kernel area as
     invalid so BIOS reserved space won't be touched unintentionally.

     Also ensure that memory holes are rounded up to the next PMD
     boundary correctly.

   - Enable X2APIC support on Hyper-V to prevent boot failures.

   - Set the paravirt name when running on Hyper-V for consistency

   - Move a function under the appropriate ifdef guard to prevent build
     warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
  x86/hyperv: Set pv_info.name to "Hyper-V"
  x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
  x86/hyperv: Make vapic support x2apic mode
  x86/boot/64: Round memory hole size up to next PMD page
  x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
2019-10-20 06:31:14 -04:00
Zhenzhong Duan 228d120051 x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
When building with "EXTRA_CFLAGS=-Wall" gcc warns:

arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function]

get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and
CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these
config options is disabled has this issue.

Move the function under the same ifdef guard as the call site.

[ tglx: Add context to the changelog so it becomes useful ]

Fixes: 41fa1ee9c6 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
2019-10-18 13:33:38 +02:00
Andrea Parri f7c0f50f18 x86/hyperv: Set pv_info.name to "Hyper-V"
Michael reported that the x86/hyperv initialization code prints the
following dmesg when running in a VM on Hyper-V:

  [    0.000738] Booting paravirtualized kernel on bare hardware

Let the x86/hyperv initialization code set pv_info.name to "Hyper-V" so
dmesg reports correctly:

  [    0.000172] Booting paravirtualized kernel on Hyper-V

[ tglx: Folded build fix provided by Yue ]

Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Link: https://lkml.kernel.org/r/20191015103502.13156-1-parri.andrea@gmail.com
2019-10-18 13:33:38 +02:00
Sean Christopherson 7a22e03b0c x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit.  The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE.  If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.

  smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:x2apic_dead_cpu+0x1a/0x30
  Call Trace:
   cpuhp_invoke_callback+0x9a/0x580
   _cpu_up+0x10d/0x140
   do_cpu_up+0x69/0xb0
   smp_init+0x63/0xa9
   kernel_init_freeable+0xd7/0x229
   ? rest_init+0xa0/0xa0
   kernel_init+0xa/0x100
   ret_from_fork+0x35/0x40

Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
2019-10-15 10:57:09 +02:00
Roman Kagan e211288b72 x86/hyperv: Make vapic support x2apic mode
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.

However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.

As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.

According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors.  The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.

Fixes: 29217a4746 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f83 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
2019-10-15 10:57:09 +02:00
Linus Torvalds 465a7e291f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also a couple of updates for new Intel
  models (which are technically hw-enablement, but to users it's a fix
  to perf behavior on those new CPUs - hope this is fine), an AUX
  inheritance fix, event time-sharing fix, and a fix for lost non-perf
  NMI events on AMD systems"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf/x86/cstate: Add Tiger Lake CPU support
  perf/x86/msr: Add Tiger Lake CPU support
  perf/x86/intel: Add Tiger Lake CPU support
  perf/x86/cstate: Update C-state counters for Ice Lake
  perf/x86/msr: Add new CPU model numbers for Ice Lake
  perf/x86/cstate: Add Comet Lake CPU support
  perf/x86/msr: Add Comet Lake CPU support
  perf/x86/intel: Add Comet Lake CPU support
  perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
  perf/core: Fix corner case in perf_rotate_context()
  perf/core: Rework memory accounting in perf_mmap()
  perf/core: Fix inheritance of aux_output groups
  perf annotate: Don't return -1 for error when doing BPF disassembly
  perf annotate: Return appropriate error code for allocation failures
  perf annotate: Fix arch specific ->init() failure errors
  perf annotate: Propagate the symbol__annotate() error return
  perf annotate: Fix the signedness of failure returns
  perf annotate: Propagate perf_env__arch() error
  perf evsel: Fall back to global 'perf_env' in perf_evsel__env()
  perf tools: Propagate get_cpuid() error
  ...
2019-10-12 15:15:17 -07:00
Linus Torvalds 9b4e40c8fe Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Misc EFI fixes all across the map: CPER error report fixes, fixes to
  TPM event log parsing, fix for a kexec hang, a Sparse fix and other
  fixes"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
  efi/x86: Do not clean dummy variable in kexec path
  efi: Make unexported efi_rci2_sysfs_init() static
  efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
  efi/tpm: Don't traverse an event log with no events
  efi/tpm: Don't access event->count when it isn't mapped
  efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
  efi/cper: Fix endianness of PCIe class code
2019-10-12 15:08:24 -07:00
Linus Torvalds fcb45a2848 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
  guest support fix when built under Clang, and new CPU model number
  definitions"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Comet Lake to the Intel CPU models header
  lib/string: Make memzero_explicit() inline instead of external
  x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
  x86/asm: Fix MWAITX C-state hint value
2019-10-12 14:46:14 -07:00
Linus Torvalds e9ec3588a9 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 license tag fixlets from Ingo Molnar:
 "Fix a couple of SPDX tags in x86 headers to follow the canonical
  pattern"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Use the correct SPDX License Identifier in headers
2019-10-12 14:37:55 -07:00
Linus Torvalds 680b5b3c5d xen: fixes for 5.4-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXaGvDgAKCRCAXGG7T9hj
 vjeDAPoDg9Gn0zDiej0jWWX6ZFXPQDdzWwhK9iEuwH9U7+GgOQEA6BeTPSIxvXUm
 gicOPWs6QGGQ0m3CGN2UGKc74pFh9ww=
 =wcya
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - correct panic handling when running as a Xen guest

 - cleanup the Xen grant driver to remove printing a pointer being
   always NULL

 - remove a soon to be wrong call of of_dma_configure()

* tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Stop abusing DT of_dma_configure API
  xen/grant-table: remove unnecessary printing
  x86/xen: Return from panic notifier
2019-10-12 14:11:21 -07:00
Kan Liang 52e92f409d perf/x86/cstate: Add Tiger Lake CPU support
Tiger Lake is the followon to Ice Lake. From the perspective of Intel
cstate residency counters, there is nothing changed compared with
Ice Lake.

Share icl_cstates with Ice Lake.
Update the comments for Tiger Lake.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-10-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:09 +02:00
Kan Liang 0917b95079 perf/x86/msr: Add Tiger Lake CPU support
Tiger Lake is the followon to Ice Lake. PPERF and SMI_COUNT MSRs are
also supported.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-9-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:09 +02:00
Kan Liang 23645a76ba perf/x86/intel: Add Tiger Lake CPU support
Tiger Lake is the followon to Ice Lake. From the perspective of Intel
core PMU, there is little changes compared with Ice Lake, e.g. small
changes in event list. But it doesn't impact on core PMU functionality.
Share the perf code with Ice Lake. The event list patch will be submitted
later separately.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-8-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:09 +02:00
Kan Liang f1857a2467 perf/x86/cstate: Update C-state counters for Ice Lake
There is no Core C3 C-State counter for Ice Lake.
Package C8/C9/C10 C-State counters are added for Ice Lake.

Introduce a new event list, icl_cstates, for Ice Lake.
Update the comments accordingly.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f08c47d1f8 ("perf/x86/intel/cstate: Add Icelake support")
Link: https://lkml.kernel.org/r/1570549810-25049-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:09 +02:00
Kan Liang 1a5da78d00 perf/x86/msr: Add new CPU model numbers for Ice Lake
PPERF and SMI_COUNT MSRs are also supported by Ice Lake desktop and
server.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-6-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:09 +02:00
Kan Liang 1ffa6c04da perf/x86/cstate: Add Comet Lake CPU support
Comet Lake is the new 10th Gen Intel processor. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Kaby Lake.

Share hswult_cstates with Kaby Lake.
Update the comments for Comet Lake.
Kaby Lake is missed in the comments for some Residency Counters. Update
the comments for Kaby Lake as well.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:08 +02:00
Kan Liang 9674b1cc0f perf/x86/msr: Add Comet Lake CPU support
Comet Lake is the new 10th Gen Intel processor. PPERF and SMI_COUNT MSRs
are also supported.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:08 +02:00
Kan Liang 9066288b2a perf/x86/intel: Add Comet Lake CPU support
Comet Lake is the new 10th Gen Intel processor. From the perspective
of Intel PMU, there is nothing changed compared with Sky Lake.
Share the perf code with Sky Lake.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-3-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:13:08 +02:00
Ingo Molnar 7a275fd7b9 Merge branch 'x86/urgent' into perf/urgent, to pick up new CPU model definitions
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12 15:12:21 +02:00
Steve Wahl 1869dbe87c x86/boot/64: Round memory hole size up to next PMD page
The kernel image map is created using PMD pages, which can include
some extra space beyond what's actually needed.  Round the size of the
memory hole we search for up to the next PMD boundary, to be certain
all of the space to be mapped is usable RAM and includes no reserved
areas.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/df4f49f05c0c27f108234eb93db5c613d09ea62e.1569358539.git.steve.wahl@hpe.com
2019-10-11 18:47:23 +02:00
Steve Wahl 2aa85f246c x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.

Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.

Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
 1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR.  But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses.  So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.

The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.

This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).

The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.

 [ bp: Touchups. ]

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
2019-10-11 18:38:15 +02:00
Tom Lendacky df4d29732f perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
It turns out that the NMI latency workaround from commit:

  6d3edaae16 ("x86/perf/amd: Resolve NMI latency issues for active PMCs")

ends up being too conservative and results in the perf NMI handler claiming
NMIs too easily on AMD hardware when the NMI watchdog is active.

This has an impact, for example, on the hpwdt (HPE watchdog timer) module.
This module can produce an NMI that is used to reset the system. It
registers an NMI handler for the NMI_UNKNOWN type and relies on the fact
that nothing has claimed an NMI so that its handler will be invoked when
the watchdog device produces an NMI. After the referenced commit, the
hpwdt module is unable to process its generated NMI if the NMI watchdog is
active, because the current NMI latency mitigation results in the NMI
being claimed by the perf NMI handler.

Update the AMD perf NMI latency mitigation workaround to, instead, use a
window of time. Whenever a PMC is handled in the perf NMI handler, set a
timestamp which will act as a perf NMI window. Any NMIs arriving within
that window will be claimed by perf. Anything outside that window will
not be claimed by perf. The value for the NMI window is set to 100 msecs.
This is a conservative value that easily covers any NMI latency in the
hardware. While this still results in a window in which the hpwdt module
will not receive its NMI, the window is now much, much smaller.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6d3edaae16 ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-09 12:44:14 +02:00
Kan Liang 8d7c6ac3b2 x86/cpu: Add Comet Lake to the Intel CPU models header
Comet Lake is the new 10th Gen Intel processor. Add two new CPU model
numbers to the Intel family list.

The CPU model numbers are not published in the SDM yet but they come
from an authoritative internal source.

 [ bp: Touch up commit message. ]

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: ak@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
2019-10-08 19:01:31 +02:00
Sami Tolvanen fbcfb8f027 x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
LLVM's assembler doesn't accept the short form INL instruction:

  inl (%%dx)

but instead insists on the output register to be explicitly specified:

  <inline asm>:1:7: error: invalid operand for instruction
          inl (%dx)
             ^
  LLVM ERROR: Error parsing inline asm

Use the full form of the instruction to fix the build.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: clang-built-linux@googlegroups.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: "VMware, Inc." <pv-drivers@vmware.com>
Cc: x86-ml <x86@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/734
Link: https://lkml.kernel.org/r/20191007192129.104336-1-samitolvanen@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-08 13:26:42 +02:00
Janakarajan Natarajan 454de1e7d9 x86/asm: Fix MWAITX C-state hint value
As per "AMD64 Architecture Programmer's Manual Volume 3: General-Purpose
and System Instructions", MWAITX EAX[7:4]+1 specifies the optional hint
of the optimized C-state. For C0 state, EAX[7:4] should be set to 0xf.

Currently, a value of 0xf is set for EAX[3:0] instead of EAX[7:4]. Fix
this by changing MWAITX_DISABLE_CSTATES from 0xf to 0xf0.

This hasn't had any implications so far because setting reserved bits in
EAX is simply ignored by the CPU.

 [ bp: Fixup comment in delay_mwaitx() and massage. ]

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191007190011.4859-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-08 13:25:24 +02:00
Boris Ostrovsky c6875f3aac x86/xen: Return from panic notifier
Currently execution of panic() continues until Xen's panic notifier
(xen_panic_event()) is called at which point we make a hypercall that
never returns.

This means that any notifier that is supposed to be called later as
well as significant part of panic() code (such as pstore writes from
kmsg_dump()) is never executed.

There is no reason for xen_panic_event() to be this last point in
execution since panic()'s emergency_restart() will call into
xen_emergency_restart() from where we can perform our hypercall.

Nevertheless, we will provide xen_legacy_crash boot option that will
preserve original behavior during crash. This option could be used,
for example, if running kernel dumper (which happens after panic
notifiers) is undesirable.

Reported-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
2019-10-07 17:53:30 -04:00
Linus Torvalds c512c69187 uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to it
In commit 9f79b78ef7 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I made filldir() use unsafe_put_user(), which
improves code generation on x86 enormously.

But because we didn't have a "unsafe_copy_to_user()", the dirent name
copy was also done by hand with unsafe_put_user() in a loop, and it
turns out that a lot of other architectures didn't like that, because
unlike x86, they have various alignment issues.

Most non-x86 architectures trap and fix it up, and some (like xtensa)
will just fail unaligned put_user() accesses unconditionally.  Which
makes that "copy using put_user() in a loop" not work for them at all.

I could make that code do explicit alignment etc, but the architectures
that don't like unaligned accesses also don't really use the fancy
"user_access_begin/end()" model, so they might just use the regular old
__copy_to_user() interface.

So this commit takes that looping implementation, turns it into the x86
version of "unsafe_copy_to_user()", and makes other architectures
implement the unsafe copy version as __copy_to_user() (the same way they
do for the other unsafe_xyz() accessor functions).

Note that it only does this for the copying _to_ user space, and we
still don't have a unsafe version of copy_from_user().

That's partly because we have no current users of it, but also partly
because the copy_from_user() case is slightly different and cannot
efficiently be implemented in terms of a unsafe_get_user() loop (because
gcc can't do asm goto with outputs).

It would be trivial to do this using "rep movsb", which would work
really nicely on newer x86 cores, but really badly on some older ones.

Al Viro is looking at cleaning up all our user copy routines to make
this all a non-issue, but for now we have this simple-but-stupid version
for x86 that works fine for the dirent name copy case because those
names are short strings and we simply don't need anything fancier.

Fixes: 9f79b78ef7 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07 12:56:48 -07:00
Dave Young 2ecb7402cf efi/x86: Do not clean dummy variable in kexec path
kexec reboot fails randomly in UEFI based KVM guest.  The firmware
just resets while calling efi_delete_dummy_variable();  Unfortunately
I don't know how to debug the firmware, it is also possible a potential
problem on real hardware as well although nobody reproduced it.

The intention of the efi_delete_dummy_variable is to trigger garbage collection
when entering virtual mode.  But SetVirtualAddressMap can only run once
for each physical reboot, thus kexec_enter_virtual_mode() is not necessarily
a good place to clean a dummy object.

Drop the efi_delete_dummy_variable so that kexec reboot can work.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Matthew Garrett <mjg59@google.com>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Link: https://lkml.kernel.org/r/20191002165904.8819-8-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 15:24:36 +02:00
Linus Torvalds b145b0eb20 ARM and x86 bugfixes of all kinds. The most visible one is that migrating
a nested hypervisor has always been busted on Broadwell and newer processors,
 and that has finally been fixed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdlzTRAAoJEL/70l94x66DElcH/Rvhn5VQE/n2J+tKEXAICxQu
 FqcTBJ5x2mp04aFe7xD3kWoKRJmz2lmHdw2ahFd4sqqLfGEFF/KW24ADI33vzLx/
 UmT78O0Je3PX77TRnEXy+napbJny0iT6ikTAQKPbyQ151JlqlbPvatpDXXLPWQHv
 jj6nKHCvMBrhV3kgaXO3cTFl8swX1hvR9lo9PcA2gRNt+HMN0heUmpfKughPoOes
 JH+UNjsEr7MYlXYlIIc9o71EYH+kgPObwlLejy0ture+dvvZEJUJjZJE8H/XG5f2
 ryXG9favaCOTAvaGf0R5Es+47A3crqUr6gHS0N28QKPn7x4hehIkKpA9dXQnWIw=
 =1/LN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM and x86 bugfixes of all kinds.

  The most visible one is that migrating a nested hypervisor has always
  been busted on Broadwell and newer processors, and that has finally
  been fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: x86: omit "impossible" pmu MSRs from MSR list
  KVM: nVMX: Fix consistency check on injected exception error code
  KVM: x86: omit absent pmu MSRs from MSR list
  selftests: kvm: Fix libkvm build error
  kvm: vmx: Limit guest PMCs to those supported on the host
  kvm: x86, powerpc: do not allow clearing largepages debugfs entry
  KVM: selftests: x86: clarify what is reported on KVM_GET_MSRS failure
  KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF
  selftests: kvm: add test for dirty logging inside nested guests
  KVM: x86: fix nested guest live migration with PML
  KVM: x86: assign two bits to track SPTE kinds
  KVM: x86: Expose XSAVEERPTR to the guest
  kvm: x86: Enumerate support for CLZERO instruction
  kvm: x86: Use AMD CPUID semantics for AMD vCPUs
  kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH
  KVM: X86: Fix userspace set invalid CR4
  kvm: x86: Fix a spurious -E2BIG in __do_cpuid_func
  KVM: LAPIC: Loosen filter for adaptive tuning of lapic_timer_advance_ns
  KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATH
  arm64: KVM: Kill hyp_alternate_select()
  ...
2019-10-04 11:17:51 -07:00
Linus Torvalds 50dfd03d95 xen: fixes and cleanups for 5.4-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXZbQhwAKCRCAXGG7T9hj
 vh03AP9mOLNY8r16u6a+Iy0YVccTaeiQiquG6HgFVEGX2Ki38gD/Xf5u6bPRYBts
 uSRL/eYDvtfU4YGGMjogn20Fdzhc5Ak=
 =EkVp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes and cleanups from Juergen Gross:

 - a fix in the Xen balloon driver avoiding hitting a BUG_ON() in some
   cases, plus a follow-on cleanup series for that driver

 - a patch for introducing non-blocking EFI callbacks in Xen's EFI
   driver, plu a cleanup patch for Xen EFI handling merging the x86 and
   ARM arch specific initialization into the Xen EFI driver

 - a fix of the Xen xenbus driver avoiding a self-deadlock when cleaning
   up after a user process has died

 - a fix for Xen on ARM after removal of ZONE_DMA

 - a cleanup patch for avoiding build warnings for Xen on ARM

* tag 'for-linus-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: fix self-deadlock after killing user process
  xen/efi: have a common runtime setup function
  arm: xen: mm: use __GPF_DMA32 for arm64
  xen/balloon: Clear PG_offline in balloon_retrieve()
  xen/balloon: Mark pages PG_offline in balloon_append()
  xen/balloon: Drop __balloon_append()
  xen/balloon: Set pages PageOffline() in balloon_add_region()
  ARM: xen: unexport HYPERVISOR_platform_op function
  xen/efi: Set nonblocking callbacks
2019-10-04 11:13:09 -07:00
Paolo Bonzini cf05a67b68 KVM: x86: omit "impossible" pmu MSRs from MSR list
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
contiguous MSR indices reserved by Intel for event selectors.
Since some machines actually have MSRs past the reserved range,
filtering them against x86_pmu.num_counters_gp may have false
positives.  Cut the list to 18 entries to avoid this.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jamttson@google.com>
Fixes: e2ada66ec4 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04 14:01:28 +02:00
Sean Christopherson 567926cca9 KVM: nVMX: Fix consistency check on injected exception error code
Current versions of Intel's SDM incorrectly state that "bits 31:15 of
the VM-Entry exception error-code field" must be zero.  In reality, bits
31:16 must be zero, i.e. error codes are 16-bit values.

The bogus error code check manifests as an unexpected VM-Entry failure
due to an invalid code field (error number 7) in L1, e.g. when injecting
a #GP with error_code=0x9f00.

Nadav previously reported the bug[*], both to KVM and Intel, and fixed
the associated kvm-unit-test.

[*] https://patchwork.kernel.org/patch/11124749/

Reported-by: Nadav Amit <namit@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03 12:32:44 +02:00
Paolo Bonzini 24c29b7ac0 KVM: x86: omit absent pmu MSRs from MSR list
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous
MSR indices reserved by Intel for event selectors.  Since some machines
actually have MSRs past the reserved range, these may survive the
filtering of msrs_to_save array and would be rejected by KVM_GET/SET_MSR.
To avoid this, cut the list to whatever CPUID reports for the host's
architectural PMU.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Fixes: e2ada66ec4 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03 12:07:59 +02:00
Juergen Gross 0951570685 xen/efi: have a common runtime setup function
Today the EFI runtime functions are setup in architecture specific
code (x86 and arm), with the functions themselves living in drivers/xen
as they are not architecture dependent.

As the setup is exactly the same for arm and x86 move the setup to
drivers/xen, too. This at once removes the need to make the single
functions global visible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[boris: "Dropped EXPORT_SYMBOL_GPL(xen_efi_runtime_setup)"]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-02 10:31:07 -04:00