Commit Graph

34327 Commits

Author SHA1 Message Date
Al Viro 119cd59fcf x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
Straightforward, except for save_altstack_ex() stuck in those.
Replace that thing with an analogue that would use unsafe_put_user()
instead of put_user_ex() (called compat_save_altstack()) and be done
with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-26 14:41:22 -04:00
Al Viro 57d563c829 x86: ia32_setup_rt_frame(): consolidate uaccess areas
__copy_siginfo_to_user32() call reordered a bit.  The rest folds
nicely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-26 14:41:10 -04:00
Al Viro e239074105 x86: ia32_setup_frame(): consolidate uaccess areas
Currently we have user_access block, followed by __put_user(),
deciding what the restorer will be and finally a put_user_try
block.

Moving the calculation of restorer first allows the rest
(actual copyout work) to coalesce into a single user_access block.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-26 14:39:38 -04:00
Al Viro 44a1d99632 x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
What's left is just a sequence of stores to userland addresses, with all
error handling, etc. done out of line.  Calling that from user_access block
is safe, but rather than teaching objtool to recognize it as such we can
just make it always_inline - it is small enough and has few enough callers,
for the space savings not to be an issue.

	Rename the sucker to __unsafe_setup_sigcontext32() and provide
unsafe_put_sigcontext32() with usual kind of semantics.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-26 14:35:43 -04:00
Al Viro 39f16c1c0f x86: get rid of put_user_try in {ia32,x32}_setup_rt_frame()
Straightforward, except for compat_save_altstack_ex() stuck in those.
Replace that thing with an analogue that would use unsafe_put_user()
instead of put_user_ex() (called unsafe_compat_save_altstack()) and
be done with that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-19 00:37:49 -04:00
Al Viro d2d2728d16 x86: switch ia32_setup_sigcontext() to unsafe_put_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:40:56 -04:00
Al Viro 9f855c085f x86: switch setup_sigcontext() to unsafe_put_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:39:02 -04:00
Al Viro a37d01ead4 x86: switch save_v86_state() to unsafe_put_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:36:01 -04:00
Al Viro 77f3c6166d x86: kill get_user_{try,catch,ex}
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:35:35 -04:00
Al Viro 3add42c29c x86: get rid of get_user_ex() in restore_sigcontext()
Just do copyin into a local struct and be done with that - we are
on a shallow stack here.

[reworked by tglx, removing the macro horrors while we are touching that]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:22:40 -04:00
Al Viro 978727ca33 x86: get rid of get_user_ex() in ia32_restore_sigcontext()
Just do copyin into a local struct and be done with that - we are
on a shallow stack here.

[reworked by tglx, removing the macro horrors while we are touching that]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:10:27 -04:00
Al Viro c63aad695d vm86: get rid of get_user_ex() use
Just do a copyin of what we want into a local variable and
be done with that.  We are guaranteed to be on shallow stack
here...

Note that conditional expression for range passed to access_ok()
in mainline had been pointless all along - the only difference
between vm86plus_struct and vm86_struct is that the former has
one extra field in the end and when we get to copyin of that
field (conditional upon 'plus' argument), we use copy_from_user().
Moreover, all fields starting with ->int_revectored are copied
that way, so we only need that check (be it done by access_ok()
or by user_access_begin()) only on the beginning of the structure -
the fields that used to be covered by that get_user_try() block.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:04:00 -04:00
Al Viro 4b842e4e25 x86: get rid of small constant size cases in raw_copy_{to,from}_user()
Very few call sites where that would be triggered remain, and none
of those is anywhere near hot enough to bother.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 15:53:25 -04:00
Al Viro 71c3313a38 x86: switch sigframe sigset handling to explict __get_user()/__put_user()
... and consolidate the definition of sigframe_ia32->extramask - it's
always a 1-element array of 32bit unsigned.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 15:29:54 -04:00
Al Viro a481444399 x86 kvm page table walks: switch to explicit __get_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-15 17:26:26 -05:00
Al Viro c8e3dd8660 x86 user stack frame reads: switch to explicit __get_user()
rather than relying upon the magic in raw_copy_from_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-15 17:26:26 -05:00
Linus Torvalds 89a47dd1af Kbuild updates for v5.6 (2nd)
- fix randconfig to generate a sane .config
 
  - rename hostprogs-y / always to hostprogs / always-y, which are
    more natual syntax.
 
  - optimize scripts/kallsyms
 
  - fix yes2modconfig and mod2yesconfig
 
  - make multiple directory targets ('make foo/ bar/') work
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl47NfMVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGRGwP/3AHO8P0wGEeFKs3ziSMjs2W7/Pj
 lN08Kuxm0u3LnyEEcHVUveoi+xBYqvrw0RsGgYf5S8q0Mpep7MPqbfkDUxV/0Zkj
 QP2CsvOTbjdBjH7q3ojkwLcDl0Pxu9mg3eZMRXZ2WQeNXuMRw6Bicoh7ElvB1Bv/
 HC+j30i2Me3cf/riQGSAsstvlXyIR8RaerR8PfRGESTysiiN76+JcHTatJHhOJL9
 O6XKkzo8/CXMYKKVF4Ae4NP+WFg6E96/pAPx0Rf47RbPX9UG35L9rkzTDnk70Ms6
 OhKiu3hXsRX7mkqApuoTqjge4+iiQcKZxYmMXU1vGlIRzjwg19/4YFP6pDSCcnIu
 kKb8KN4o4N41N7MFS3OLZWwISA8Vw6RbtwDZ3AghDWb7EHb9oNW42mGfcAPr1+wZ
 /KH6RHTzaz+5q2MgyMY1NhADFrhIT9CvDM+UJECgbokblnw7PHAnPmbsuVak9ZOH
 u9ojO1HpTTuIYO6N6v4K5zQBZF1N+RvkmBnhHd8j6SksppsCoC/G62QxgXhF2YK3
 FQMpATCpuyengLxWAmPEjsyyPOlrrdu9UxqNsXVy5ol40+7zpxuHwKcQKCa9urJR
 rcpbIwLaBcLhHU4BmvBxUk5aZxxGV2F0O0gXTOAbT2xhd6BipZSMhUmN49SErhQm
 NC/coUmQX7McxMXh
 =sv4U
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - fix randconfig to generate a sane .config

 - rename hostprogs-y / always to hostprogs / always-y, which are more
   natual syntax.

 - optimize scripts/kallsyms

 - fix yes2modconfig and mod2yesconfig

 - make multiple directory targets ('make foo/ bar/') work

* tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: make multiple directory targets work
  kconfig: Invalidate all symbols after changing to y or m.
  kallsyms: fix type of kallsyms_token_table[]
  scripts/kallsyms: change table to store (strcut sym_entry *)
  scripts/kallsyms: rename local variables in read_symbol()
  kbuild: rename hostprogs-y/always to hostprogs/always-y
  kbuild: fix the document to use extra-y for vmlinux.lds
  kconfig: fix broken dependency in randconfig-generated .config
2020-02-09 16:05:50 -08:00
Linus Torvalds 1a2a76c268 A set of fixes for X86:
- Ensure that the PIT is set up when the local APIC is disable or
    configured in legacy mode. This is caused by an ordering issue
    introduced in the recent changes which skip PIT initialization when the
    TSC and APIC frequencies are already known.
 
  - Handle malformed SRAT tables during early ACPI parsing which caused an
    infinite loop anda boot hang.
 
  - Fix a long standing race in the affinity setting code which affects PCI
    devices with non-maskable MSI interrupts. The problem is caused by the
    non-atomic writes of the MSI address (destination APIC id) and data
    (vector) fields which the device uses to construct the MSI message. The
    non-atomic writes are mandated by PCI.
 
    If both fields change and the device raises an interrupt after writing
    address and before writing data, then the MSI block constructs a
    inconsistent message which causes interrupts to be lost and subsequent
    malfunction of the device.
 
    The fix is to redirect the interrupt to the new vector on the current
    CPU first and then switch it over to the new target CPU. This allows to
    observe an eventually raised interrupt in the transitional stage (old
    CPU, new vector) to be observed in the APIC IRR and retriggered on the
    new target CPU and the new vector. The potential spurious interrupts
    caused by this are harmless and can in the worst case expose a buggy
    driver (all handlers have to be able to deal with spurious interrupts as
    they can and do happen for various reasons).
 
  - Add the missing suspend/resume mechanism for the HYPERV hypercall page
    which prevents resume hibernation on HYPERV guests. This change got
    lost before the merge window.
 
  - Mask the IOAPIC before disabling the local APIC to prevent potentially
    stale IOAPIC remote IRR bits which cause stale interrupt lines after
    resume.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5AEJwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWY2D/47ur9gsVQGryKzneVAr0SCsq4Un11e
 uifX4ldu4gCEBRTYhpgcpiFKeLvY/QJ6uOD+gQUHyy/s+lCf6yzE6UhXEqSCtcT7
 LkSxD8jAFf6KhMA6iqYBfyxUsPMXBetLjjHWsyc/kf15O/vbYm7qf05timmNZkDS
 S7C+yr3KRqRjLR7G7t4twlgC9aLcNUQihUdsH2qyTvjnlkYHJLDa0/Js7bFYYKVx
 9GdUDLvPFB1mZ76g012De4R3kJsWitiyLlQ38DP5VysKulnszUCdiXlgCEFrgxvQ
 OQhLafQzOAzvxQmP+1alODR0dmJZA8k0zsDeeTB/vTpRvv6+Pe2qUswLSpauBzuq
 TpDsrv8/5pwZh28+91f/Unk+tH8NaVNtGe/Uf+ePxIkn1nbqL84o4NHGplM6R97d
 HAWdZQZ1cGRLf6YRRJ+57oM/5xE3vBbF1Wn0+QDTFwdsk2vcxuQ4eB3M/8E1V7Zk
 upp8ty50bZ5+rxQ8XTq/eb8epSRnfLoBYpi4ux6MIOWRdmKDl40cDeZCzA2kNP7m
 qY1haaRN3ksqvhzc0Yf6cL+CgvC4ur8gRHezfOqmBzVoaLyVEFIVjgjR/ojf0bq8
 /v+L9D5+IdIv4jEZruRRs0gOXNDzoBbvf0qKGaO0tUTWiDsv7c5AGixp8aozniHS
 HXsv1lIpRuC7WQ==
 =WxKD
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Ensure that the PIT is set up when the local APIC is disable or
     configured in legacy mode. This is caused by an ordering issue
     introduced in the recent changes which skip PIT initialization when
     the TSC and APIC frequencies are already known.

   - Handle malformed SRAT tables during early ACPI parsing which caused
     an infinite loop anda boot hang.

   - Fix a long standing race in the affinity setting code which affects
     PCI devices with non-maskable MSI interrupts. The problem is caused
     by the non-atomic writes of the MSI address (destination APIC id)
     and data (vector) fields which the device uses to construct the MSI
     message. The non-atomic writes are mandated by PCI.

     If both fields change and the device raises an interrupt after
     writing address and before writing data, then the MSI block
     constructs a inconsistent message which causes interrupts to be
     lost and subsequent malfunction of the device.

     The fix is to redirect the interrupt to the new vector on the
     current CPU first and then switch it over to the new target CPU.
     This allows to observe an eventually raised interrupt in the
     transitional stage (old CPU, new vector) to be observed in the APIC
     IRR and retriggered on the new target CPU and the new vector.

     The potential spurious interrupts caused by this are harmless and
     can in the worst case expose a buggy driver (all handlers have to
     be able to deal with spurious interrupts as they can and do happen
     for various reasons).

   - Add the missing suspend/resume mechanism for the HYPERV hypercall
     page which prevents resume hibernation on HYPERV guests. This
     change got lost before the merge window.

   - Mask the IOAPIC before disabling the local APIC to prevent
     potentially stale IOAPIC remote IRR bits which cause stale
     interrupt lines after resume"

* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Mask IOAPIC entries when disabling the local APIC
  x86/hyperv: Suspend/resume the hypercall page for hibernation
  x86/apic/msi: Plug non-maskable MSI affinity race
  x86/boot: Handle malformed SRAT tables during early ACPI parsing
  x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-09 12:11:12 -08:00
Linus Torvalds 6ff90aa2cf A single fix for a EFI boot regression on X86 which was caused by the
recent rework of the EFI memory map parsing. On systems with invalid memmap
 entries the cleanup function uses an value which cannot be relied on in
 this stage. Use the actual EFI memmap entry instead.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5ABcYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXBWEACURHTBuiOHWFzO4tZtC8MzfVbLgNR2
 aemHNslI0h/tJz+q8hEBnLnwNVS9xjwGaTl1B2Ij7Z0dJjLvcC9CrI1o7ibf0e9v
 Ba4ybAs5zAUiXyLsgcckDVXbBeU9LuzeKfNe1hPPIJgFeRrftwZpLRC6GUuSXODj
 cI13DF9sXxaWLZfktC5CZMcFZh5EtFYvfedAp/qMhb1ot2DdvQ9fBMexcSHKtCgT
 QIZU1DTO9JtOgbyEu+PM30/ID3WYJVXHNtyLFDNgIPK80EORD2Ujlw1LmEc/BrEt
 upN8sPiFMTtYDhDjgSc7tYuzF2n/on0xNM0JG8cXr2oGTb9unDxNiDK6VQ8hfDy3
 8T0xK1DY6ithNSDznaLix03uNVkgh2MVCEZlb3429aM7APyn7+/O5lDo7sNVTUt5
 QK/OOcf+2OvEr1VhKellze85yCIU4wOr4kiruDZXsBuu4PVyCQwnxz9JXeHb6Cg6
 oWKdNT5AwdGENKmOPmjx+NBQPrWShYcYaKkA8hYMiFcewfHaL8m0FTdmcpp9XcwM
 /a2nNL4STS2O1hd/60fNVAEtFVXY821OhAG6qW3mehJGJlXGijfAxCUFHGR32279
 9omyu33ii/wmZ+m1rp5meAY7qHBvBWwSrRCR3UWU3h5icOaBp0zU/FQoUlhqo0pT
 W9IhBcbPVy5kPA==
 =AU47
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fix from Thomas Gleixner:
 "A single fix for a EFI boot regression on X86 which was caused by the
  recent rework of the EFI memory map parsing. On systems with invalid
  memmap entries the cleanup function uses an value which cannot be
  relied on in this stage. Use the actual EFI memmap entry instead"

* tag 'efi-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Fix boot regression on systems with invalid memmap entries
2020-02-09 11:54:50 -08:00
Linus Torvalds c9d35ee049 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context->log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
2020-02-08 13:26:41 -08:00
Al Viro d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen 96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Tony W Wang-oc 0f378d73d4 x86/apic: Mask IOAPIC entries when disabling the local APIC
When a system suspends, the local APIC is disabled in the suspend sequence,
but the IOAPIC is left in the current state. This means unmasked interrupt
lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the
ACPI interrupt is connected.

That means that in suspended state the IOAPIC can respond to an external
interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message
cannot be handled by the disabled local APIC. As a consequence the Remote
IRR bit is set, but the local APIC does not send an EOI to acknowledge
it. This causes the affected interrupt line to become stale and the stale
Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked
for that interrupt line.

To prevent this, mask all IOAPIC entries before disabling the local
APIC. The resume code already has the unmask operation inside.

[ tglx: Massaged changelog ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579076539-7267-1-git-send-email-TonyWWang-oc@zhaoxin.com
2020-02-07 15:32:16 +01:00
Linus Torvalds 90568ecf56 s390:
* fix register corruption
 * ENOTSUPP/EOPNOTSUPP mixed
 * reset cleanups/fixes
 * selftests
 
 x86:
 * Bug fixes and cleanups
 * AMD support for APIC virtualization even in combination with
   in-kernel PIT or IOAPIC.
 
 MIPS:
 * Compilation fix.
 
 Generic:
 * Fix refcount overflow for zero page.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeOuf7AAoJEL/70l94x66DOBQH/j1W9lUpbDgr9aWbrZT+O/yP
 FWzUDrRlCZCjV1FQKbGPa4YLeDRTG5n+RIQTjmCGRqiMqeoELSJ1+iK99e97nG/u
 L28zf/90Nf0R+wwHL4AOFeploTYfG4WP8EVnlr3CG2UCJrNjxN1KU7yRZoWmWa2d
 ckLJ8ouwNvx6VZd233LVmT38EP4352d1LyqIL8/+oXDIyAcRJLFQu1gRCwagsh3w
 1v1czowFpWnRQ/z9zU7YD+PA4v85/Ge8sVVHlpi1X5NgV/khk4U6B0crAw6M+la+
 mTmpz9g56oAh9m9NUdtv4zDCz1EWGH0v8+ZkAajUKtrM0ftJMn57P6p8PH4VVlE=
 =5+Wl
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "s390:
   - fix register corruption
   - ENOTSUPP/EOPNOTSUPP mixed
   - reset cleanups/fixes
   - selftests

  x86:
   - Bug fixes and cleanups
   - AMD support for APIC virtualization even in combination with
     in-kernel PIT or IOAPIC.

  MIPS:
   - Compilation fix.

  Generic:
   - Fix refcount overflow for zero page"

* tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
  KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
  x86: vmxfeatures: rename features for consistency with KVM and manual
  KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
  KVM: x86: Fix perfctr WRMSR for running counters
  x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  kvm: mmu: Separate generating and setting mmio ptes
  kvm: mmu: Replace unsigned with unsigned int for PTE access
  KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
  KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
  KVM: MIPS: Fix a build error due to referencing not-yet-defined function
  x86/kvm: do not setup pv tlb flush when not paravirtualized
  KVM: fix overflow of zero page refcount with ksm running
  KVM: x86: Take a u64 when checking for a valid dr7 value
  KVM: x86: use raw clock values consistently
  KVM: x86: reorganize pvclock_gtod_data members
  KVM: nVMX: delete meaningless nested_vmx_run() declaration
  KVM: SVM: allow AVIC without split irqchip
  kvm: ioapic: Lazy update IOAPIC EOI
  ...
2020-02-06 09:07:45 -08:00
Linus Torvalds 9e6c535c64 pci-v5.6-fixes-1
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl48HHIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyJkhAAu3GpkS0sOWNmQhmvr2v9nn/QD8qN
 geXF7WmEbl+3CbWEEg/UEag4KPaNnQEqdPodX84ZF2wJ+vWQvQmqhYDd4s+a1IiX
 QEIFLx0LqcjaWSvrrhJnTTp2aFLKZQdP6mkhX1ddzsseDburKcq15mVzkfgibNcf
 76fbzWyvlNw9uv4CQBeXHJ2l/Ga8pLQN+ZreG0xKHtVsXZ0J2WT6Qp+KDLU+twW+
 BRoN8rzah9I2vbsPNKo0cikjGiTPpJWFzuHiEWZJQplJZ3kpa4/QTSlXewpLruiM
 /k1gKGeHyYlQVWvD1FcwxmSWle3pvihjwKfzGHCsw6cWTtIeEFlD4ehg4LIqjYuW
 aIktqDLj9dHcEh4t5fIVkGgGzc1dRZyaqXnxxC0f5sGezZSFFrCpHTLXDQpznnbC
 xj6kejxwfyypvMtIWJLK0SwdCWikcL2x0I/x1JgV4iXEdr4LjEhtg6rxCewENQ9Q
 xZgEbVTi+hHSCZYqfu68/qrWWrapt4dcmZhxPXkk2bj2A5xsgyou4gaTG6jhMQab
 WmgIwyzd3Bk+o3+JJbRt1/qhDUx3413ja6f2oX8XNV6b0Fa9XIDwk4Qw47tLA8Kj
 yWWt4w3iccghhg2d/eWz2+fxfbNB0qGGR4v+Y+lpy/jvFORQpS44uTKgcr0q1OvR
 kye5UT+mW+b1z1U=
 =tcy+
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Define to_pci_sysdata() always to fix build breakage when !CONFIG_PCI
   (Jason A. Donenfeld)

 - Use PF PASID for VFs to fix VF IOMMU bind failures (Kuppuswamy
   Sathyanarayanan)

* tag 'pci-v5.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/ATS: Use PF PASID for VFs
  x86/PCI: Define to_pci_sysdata() even when !CONFIG_PCI
2020-02-06 14:17:38 +00:00
Miaohe Lin a8be1ad01b KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
The function vmx_decache_cr0_guest_bits() is only called below its
implementation. So this is meaningless and should be removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:44:06 +01:00
Sean Christopherson d76c7fbc01 KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
Re-add code to mark CR4.UMIP as reserved if UMIP is not supported by the
host.  The UMIP handling was unintentionally dropped during a recent
refactoring.

Not flagging CR4.UMIP allows the guest to set its CR4.UMIP regardless of
host support or userspace desires.  On CPUs with UMIP support, including
emulated UMIP, this allows the guest to enable UMIP against the wishes
of the userspace VMM.  On CPUs without any form of UMIP, this results in
a failed VM-Enter due to invalid guest state.

Fixes: 345599f9a2 ("KVM: x86: Add macro to ensure reserved cr4 bits checks stay in sync")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:30:19 +01:00
Paolo Bonzini bcfcff640c x86: vmxfeatures: rename features for consistency with KVM and manual
Three of the feature bits in vmxfeatures.h have names that are different
from the Intel SDM.  The names have been adjusted recently in KVM but they
were using the old name in the tip tree's x86/cpu branch.  Adjust for
consistency.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:22:59 +01:00
Paolo Bonzini df7e881892 KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
Userspace that does not know about the AMD_IBRS bit might still
allow the guest to protect itself with MSR_IA32_SPEC_CTRL using
the Intel SPEC_CTRL bit.  However, svm.c disallows this and will
cause a #GP in the guest when writing to the MSR.  Fix this by
loosening the test and allowing the Intel CPUID bit, and in fact
allow the AMD_STIBP bit as well since it allows writing to
MSR_IA32_SPEC_CTRL too.

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:12:57 +01:00
Eric Hankland 4400cf546b KVM: x86: Fix perfctr WRMSR for running counters
Correct the logic in intel_pmu_set_msr() for fixed and general purpose
counters. This was recently changed to set pmc->counter without taking
in to account the value of pmc_read_counter() which will be incorrect if
the counter is currently running and non-zero; this changes back to the
old logic which accounted for the value of currently running counters.

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:01:15 +01:00
Vitaly Kuznetsov a83502314c x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
controls on for its guests and nested_vmx_check_controls() checks for
that. This is, however, not the case for the controls which are supported
on the host but are missing in enlightened VMCS and when eVMCS is in use.

It would certainly be possible to add these missing checks to
nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
non-eVMCS guests by doing less unrelated checks. Create a separate
nested_evmcs_check_controls() for this purpose.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:55:26 +01:00
Vitaly Kuznetsov 31de3d2500 x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails.

It would be possible to drop VMX feature filtering completely and make
this a guest's responsibility: if it decides to use eVMCS it should know
which fields are available and which are not. Hyper-V mostly complies to
this, however, there are some problematic controls:
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL

which Hyper-V enables. As there are no corresponding fields in eVMCS, we
can't handle this properly in KVM. This is a Hyper-V issue.

Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
and do the bare minimum (only clear controls which are known to cause issues).
This allows userspace to keep setting controls it wants and at the same
time hides them from the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:55:06 +01:00
Ben Gardon 8f79b06495 kvm: mmu: Separate generating and setting mmio ptes
Separate the functions for generating MMIO page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU sychronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:54:07 +01:00
Ben Gardon 0a2b64c50d kvm: mmu: Replace unsigned with unsigned int for PTE access
There are several functions which pass an access permission mask for
SPTEs as an unsigned. This works, but checkpatch complains about it.
Switch the occurrences of unsigned to unsigned int to satisfy checkpatch.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:54:00 +01:00
Sean Christopherson ea79a75092 KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
The blurb pertaining to the return value of nested_vmx_load_cr3() no
longer matches reality, remove it entirely as the behavior it is
attempting to document is quite obvious when reading the actual code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:31:25 +01:00
Thadeu Lima de Souza Cascardo 64b38bd190 x86/kvm: do not setup pv tlb flush when not paravirtualized
kvm_setup_pv_tlb_flush will waste memory and print a misguiding message
when KVM paravirtualization is not available.

Intel SDM says that the when cpuid is used with EAX higher than the
maximum supported value for basic of extended function, the data for the
highest supported basic function will be returned.

So, in some systems, kvm_arch_para_features will return bogus data,
causing kvm_setup_pv_tlb_flush to detect support for pv tlb flush.

Testing for kvm_para_available will work as it checks for the hypervisor
signature.

Besides, when the "nopv" command line parameter is used, it should not
continue as well, as kvm_guest_init will no be called in that case.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:28:07 +01:00
Sean Christopherson 9b5e85320f KVM: x86: Take a u64 when checking for a valid dr7 value
Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build
warning on i386 due to right-shifting a 32-bit value by 32 when checking
for bits being set in dr7[63:32].

Alternatively, the warning could be resolved by rewriting the check to
use an i386-friendly method, but taking a u64 fixes another oddity on
32-bit KVM.  Beause KVM implements natural width VMCS fields as u64s to
avoid layout issues between 32-bit and 64-bit, a devious guest can stuff
vmcs12->guest_dr7 with a 64-bit value even when both the guest and host
are 32-bit kernels.  KVM eventually drops vmcs12->guest_dr7[63:32] when
propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely
on that behavior for correctness.

Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Paolo Bonzini 8171cd6880 KVM: x86: use raw clock values consistently
Commit 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to monotonic raw
clock") changed kvmclock to use tkr_raw instead of tkr_mono.  However,
the default kvmclock_offset for the VM was still based on the monotonic
clock and, if the raw clock drifted enough from the monotonic clock,
this could cause a negative system_time to be written to the guest's
struct pvclock.  RHEL5 does not like it and (if it boots fast enough to
observe a negative time value) it hangs.

There is another thing to be careful about: getboottime64 returns the
host boot time with tkr_mono frequency, and subtracting the tkr_raw-based
kvmclock value will cause the wallclock to be off if tkr_raw drifts
from tkr_mono.  To avoid this, compute the wallclock delta from the
current time instead of being clever and using getboottime64.

Fixes: 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Paolo Bonzini 917f9475c0 KVM: x86: reorganize pvclock_gtod_data members
We will need a copy of tk->offs_boot in the next patch.  Store it and
cleanup the struct: instead of storing tk->tkr_xxx.base with the tk->offs_boot
included, store the raw value in struct pvclock_clock and sum it in
do_monotonic_raw and do_realtime.   tk->tkr_xxx.xtime_nsec also moves
to struct pvclock_clock.

While at it, fix a (usually harmless) typo in do_monotonic_raw, which
was using gtod->clock.shift instead of gtod->raw_clock.shift.

Fixes: 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Miaohe Lin 33aabd029f KVM: nVMX: delete meaningless nested_vmx_run() declaration
The function nested_vmx_run() declaration is below its implementation. So
this is meaningless and should be removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Paolo Bonzini e8ef2a19a0 KVM: SVM: allow AVIC without split irqchip
SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets
up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT
are in use.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit f458d039db kvm: ioapic: Lazy update IOAPIC EOI
In-kernel IOAPIC does not receive EOI with AMD SVM AVIC
since the processor accelerate write to APIC EOI register and
does not trap if the interrupt is edge-triggered.

Workaround this by lazy check for pending APIC EOI at the time when
setting new IOPIC irq, and update IOAPIC EOI if no pending APIC EOI.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit 1ec2405c7c kvm: ioapic: Refactor kvm_ioapic_update_eoi()
Refactor code for handling IOAPIC EOI for subsequent patch.
There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit e2ed4078a6 kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.
AMD SVM AVIC accelerates EOI write and does not trap. This causes
in-kernel PIT re-injection mode to fail since it relies on irq-ack
notifier mechanism. So, APICv is activated only when in-kernel PIT
is in discard mode e.g. w/ qemu option:

  -global kvm-pit.lost_tick_policy=discard

Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this
reason.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit f3515dc3be svm: Temporarily deactivate AVIC during ExtINT handling
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary
deactivated and fall back to using legacy interrupt injection via vINTR
and interrupt window.

Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit 9a0bf05430 svm: Deactivate AVIC when launching guest with nested SVM support
Since AVIC does not currently work w/ nested virtualization,
deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM]
(i.e. indicate support for SVM, which is needed for nested virtualization).
Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for
this reason.

Suggested-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit f4fdc0a2ed kvm: x86: hyperv: Use APICv update request interface
Since disabling APICv has to be done for all vcpus on AMD-based
system, adopt the newly introduced kvm_request_apicv_update()
interface, and introduce a new APICV_INHIBIT_REASON_HYPERV.

Also, remove the kvm_vcpu_deactivate_apicv() since no longer used.

Cc: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit 6c3e4422dd svm: Add support for dynamic APICv
Add necessary logics to support (de)activate AVIC at runtime.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit 2de9d0ccd0 kvm: x86: Introduce x86 ops hook for pre-update APICv
AMD SVM AVIC needs to update APIC backing page mapping before changing
APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl
function hook to be called prior KVM APICv update request to each vcpu.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit ef8efd7a15 kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasons
Inibit reason bits are used to determine if APICv deactivation is
applicable for a particular hardware virtualization architecture.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00