system/xen: Updated for version 4.15.0.

Signed-off-by: Mario Preksavec <mario@slackware.hr>
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>

parent 25d63828e9
commit b0768026fe
@@ -14,7 +14,6 @@ This script has a few optional dependencies:

Linking with the stock libraries:

bluez - enable with USE_BLUEZ=yes
gtk - enable with USE_GTK=yes
audio - enable with USE_AUDIO=yes
(or a comma-delimited list: oss alsa sdl pa)
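For reference, these switches are environment variables read when the build script is invoked; a minimal sketch (the chosen values are illustrative, not defaults):

  USE_GTK=yes USE_AUDIO=yes ./xen.SlackBuild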
@@ -1,77 +1,89 @@
kernel-xen.sh: This script builds the Linux Kernel for a Xen Hypervisor.

Kernel configuration files included here are based on generic Slackware config
files. For 32bit systems, SMP config was used. To run "make menuconfig" before
compiling Xen kernel, use:
* Kernel config files found here are based on generic Slackware ones with
some Xen settings enabled to get it going. Only x86_64 architecture is now
supported because Xen no longer builds a 32-bit VMM image. This readme is
by no means complete or a replacement for Linux Kernel and Xen docs.

* To run "make menuconfig" before compiling Xen kernel, use:

MENUCONFIG=yes ./kernel-xen.sh

Originally, booting Xen kernel with LILO bootloader is not supported, and GRUB
has to be used. With mbootpack this has changed, and LILO can be used as well.
Basically, mbootpack takes Linux kernel, initrd and Xen VMM, and packages them
up into a file that looks like a bzImage Linux kernel. This script will select
LILO by default, changing to GRUB is easy:
* This script will also create an initrd image, with the following defaults:

BOOTLOADER=grub ./kernel-xen.sh
ROOTMOD=ext4 ROOTFS=ext4 ROOTDEV=/dev/sda2 ./kernel-xen.sh

Slackware generic kernel requires initrd image, this script assumes root is on
/dev/sda2 and filesystem is ext4, changes are made with:
* Booting LILO with mbootpack has proven to be unreliable, and the easiest
method is to use EXTLINUX from the Syslinux package. In this example, device
/dev/sda1 would have an ext2 filesystem mounted to /boot.

ROOTMOD=ext3 ROOTFS=ext3 ROOTDEV=/dev/sda5 ./kernel-xen.sh
!!! Make sure to understand what you are doing at this point, you could
easily lose your data. Always create backups !!!

When using LILO bootloader, this is what the lilo.conf should have:
* To check and set the legacy BIOS bootable flag (bit 2 attribute):

image = /boot/vmlinuz-xen
root = /dev/sda2
label = XenLinux
append="dom0_mem=512M -- nomodeset"
read-only
sgdisk /dev/sda --attributes=1:show
sgdisk /dev/sda --attributes=1:set:2

Everything on the left side of "--" is passed to Xen kernel, and what's on the
right, goes to Linux kernel.
* Install the binary:

When using GRUB, /boot/grub/menu.lst should have these:
mkdir /boot/extlinux
extlinux --install /boot/extlinux
dd if=/usr/share/syslinux/gptmbr.bin of=/dev/sda
cp -a /usr/share/syslinux/mboot.c32 /boot/extlinux/

title Slackware XenLinux 14.2
root (hd0,0)
kernel /boot/xen.gz dom0_mem=524288 console=vga
module /boot/vmlinuz-xen root=/dev/sda2 ro console=tty0 nomodeset
module /boot/initrd-xen.gz
* Edit the /boot/extlinux/extlinux.conf file:

Booting Xen on a native EFI system is also an option, but the only clean
solution at this time requires a modified binutils package. More experienced
user can add "x86_64-pep" to the list of enabled targets and build/replace
binutils on their system. Subsequently, building Xen will now also create a
Xen EFI binary.
default XenLinux
prompt 1
timeout 50
label XenLinux
kernel mboot.c32
append /xen.gz --- /vmlinuz-xen root=/dev/sda2 nomodeset --- /initrd-xen.gz

To make things a bit easier, a copy of Xen EFI binary can be found here:
* When using GRUB, /boot/grub/menu.lst should look something like this:

http://slackware.hr/~mario/xen/xen-4.13.1.efi.gz
title Slackware XenLinux 15.0
root (hd0,0)
kernel /boot/xen.gz dom0_mem=524288 console=vga
module /boot/vmlinuz-xen root=/dev/sda2 ro console=tty0 nomodeset
module /boot/initrd-xen.gz

If an automatic boot to Xen kernel is desired, the binary should be renamed and
copied to the following location: /boot/efi/EFI/BOOT/bootx64.efi
Downloaded binary should be unpacked first, and the config file should be
present in the same directory (same file name, minus the suffix).
For example: "xen.cfg" or "bootx64.cfg", and its contents:
* Booting Xen on a native EFI system (non-BIOS legacy mode) is probably the
best option, but the only clean solution at this time requires a modified
binutils package. More experienced users can add "x86_64-pep" to the list of
enabled targets and build/replace binutils on their system. Subsequently,
building Xen will then also create a Xen EFI binary.

[global]
default=xen
* To make things a bit easier, a copy of Xen EFI binary can be found here:

[xen]
options=dom0_mem=min:512M,max:512M,512M
kernel=vmlinuz-xen root=/dev/sda2 ro console=tty0 nomodeset
ramdisk=initrd-xen.gz
http://slackware.hr/~mario/xen/xen-4.15.0.efi.gz

There are some other EFI bootloaders, for example ELILO comes with support
for VMM images, but their x86 support is lacking. GRUB2 apparently supports
only the chainloader method; however, the stock Slackware version is too old
for this task. rEFInd should work, but the Xen EFI method was satisfactory to
the author :-)
!!! Make sure to understand what you are doing at this point, you could
easily lose your data. Always create backups !!!

Troubleshooting dom0 crashes, freezes, blank screen and such:
* In this example, partition /dev/sda1 with EF or EF00 type, and do:

* Use /proc/fb to find an out of range device id, for example this can be
added to Linux kernel: fbcon=map:9
* Look in dmesg/lsmod for potential framebuffer devices to blacklist
mkfs.vfat /dev/sda1
mkdir /boot/efi
mount /dev/sda1 /boot/efi

* Copy/unpack EFI binary to /boot/efi/EFI/BOOT/bootx64.efi and edit
/boot/efi/EFI/BOOT/bootx64.cfg file to add these:

[global]
default=XenLinux
[XenLinux]
options=dom0_mem=min:512M,max:512M,512M
kernel=vmlinuz-xen root=/dev/sda2 ro console=tty0 nomodeset
ramdisk=initrd-xen.gz

* Many more boot options are supported, this readme covers only some examples!

* Troubleshooting dom0 crashes, freezes, blank screen at boot, etc:

* Set an out-of-range device id, eg. fbcon=map:9 (Look for more in /proc/fb)
* Blacklist framebuffer devices (Look in dmesg/lsmod)
* Compile Linux kernel with CONFIG_FB=n
* Use a serial cable to see early boot messages
* Use another VGA card :-)
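For the framebuffer blacklist item above, the usual mechanism is a one-line file under /etc/modprobe.d/; a minimal sketch, assuming dmesg/lsmod pointed at a module named viafb (the module name is purely illustrative):

  # /etc/modprobe.d/blacklist-framebuffer.conf
  blacklist viafb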
(Two file diffs suppressed because they are too large to display.)
@@ -5,9 +5,8 @@
# Written by Chris Abela <chris.abela@maltats.com>, 20100515
# Modified by Mario Preksavec <mario@slackware.hr>

KERNEL=${KERNEL:-4.4.240}
XEN=${XEN:-4.13.1}
BOOTLOADER=${BOOTLOADER:-lilo}
KERNEL=${KERNEL:-5.13.8}
XEN=${XEN:-4.15.0}

ROOTMOD=${ROOTMOD:-ext4}
ROOTFS=${ROOTFS:-ext4}

@@ -15,18 +14,11 @@ ROOTDEV=${ROOTDEV:-/dev/sda2}

if [ -z "$ARCH" ]; then
  case "$( uname -m )" in
    i?86) ARCH=i686 ;;
    x86_64) ARCH=x86_64 ;;
    *) echo "Unsupported architecture detected ($ARCH)"; exit ;;
  esac
fi

if [ "$BOOTLOADER" = lilo ] && [ ! -x /usr/bin/mbootpack ]; then
  echo "LILO bootloader requires mbootpack."
  echo "Get it from slackbuilds.org and rerun this script."
  exit
fi

if [ ! -d /usr/src/linux-$KERNEL ]; then
  echo "Missing kernel source in /usr/src/linux-$KERNEL"
  echo "Get it from kernel.org and rerun this script."

@@ -78,15 +70,7 @@ cp -a $TMP/lib/modules/$KERNEL-xen /lib/modules
mkinitrd -c -k $KERNEL-xen -m $ROOTMOD -f $ROOTFS -r $ROOTDEV \
  -o /boot/initrd-$KERNEL-xen.gz

# For lilo we use mbootpack
if [ "$BOOTLOADER" = lilo ]; then
  gzip -d -c /boot/xen-$XEN.gz > xen-$XEN
  mbootpack -m arch/x86/boot/bzImage -m /boot/initrd-$KERNEL-xen.gz xen-$XEN \
    -o /boot/vmlinuz-$KERNEL-xen
else
  cp arch/x86/boot/bzImage /boot/vmlinuz-$KERNEL-xen
fi

cp arch/x86/boot/bzImage /boot/vmlinuz-$KERNEL-xen
cp System.map /boot/System.map-$KERNEL-xen
cp .config /boot/config-$KERNEL-xen
@@ -7,7 +7,7 @@

set -e

KERNEL=${KERNEL:-4.4.240}
KERNEL=${KERNEL:-5.13.8}

# Build an image for the root file system and another for the swap
# Default values : 8GB and 500MB respectively.
@@ -2,9 +2,9 @@ kernel = "/boot/vmlinuz-xen"
ramdisk = "/boot/initrd-xen.gz"
memory = 128
name = "Slackware"
vif = [ 'mac=00:16:3e:00:00:01']
disk = [ 'file:/full_path_to/slackware.img,xvda1,w',
         'file:/full_path_to/swap_file,xvda2,w' ]
vif = [ "mac=00:16:3e:00:00:01" ]
disk = [ "file:/full_path_to/slackware.img,xvda1,w",
         "file:/full_path_to/swap_file,xvda2,w" ]
root = "/dev/xvda1 ro"
extra = "3"
extra = "console=hvc0 elevator=noop"
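A guest defined like the above is normally started with the xl toolstack; a minimal sketch, assuming the config was saved as /etc/xen/slackware.cfg (the path is illustrative):

  # Create the domain and attach to its console (hvc0, as set in "extra")
  xl create -c /etc/xen/slackware.cfg

  # Check that it is running, then shut it down by the name set in the config
  xl list
  xl shutdown Slackware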
|
@ -0,0 +1,49 @@
|
|||
From a32df3463befa04913fd1934ed264038a9eae00f Mon Sep 17 00:00:00 2001
|
||||
Message-Id: <a32df3463befa04913fd1934ed264038a9eae00f.1596577611.git.crobinso@redhat.com>
|
||||
From: Cole Robinson <crobinso@redhat.com>
|
||||
Date: Tue, 4 Aug 2020 17:04:50 -0400
|
||||
Subject: [PATCH 1/2] BaseTools: fix ucs-2 lookup on python 3.9
|
||||
|
||||
python3.9 changed/fixed codec.register behavior to always replace
|
||||
hyphen with underscore for passed in codec names:
|
||||
|
||||
https://bugs.python.org/issue37751
|
||||
|
||||
So the custom Ucs2Search needs to be adapted to handle 'ucs_2' in
|
||||
addition to existing 'ucs-2' for back compat.
|
||||
|
||||
This fixes test failures on python3.9, example:
|
||||
|
||||
======================================================================
|
||||
FAIL: testUtf16InUniFile (CheckUnicodeSourceFiles.Tests)
|
||||
----------------------------------------------------------------------
|
||||
Traceback (most recent call last):
|
||||
File "/builddir/build/BUILD/edk2-edk2-stable202002/BaseTools/Source/Python/AutoGen/UniClassObject.py", line 375, in PreProcess
|
||||
FileIn = UniFileClassObject.OpenUniFile(LongFilePath(File.Path))
|
||||
File "/builddir/build/BUILD/edk2-edk2-stable202002/BaseTools/Source/Python/AutoGen/UniClassObject.py", line 303, in OpenUniFile
|
||||
UniFileClassObject.VerifyUcs2Data(FileIn, FileName, Encoding)
|
||||
File "/builddir/build/BUILD/edk2-edk2-stable202002/BaseTools/Source/Python/AutoGen/UniClassObject.py", line 312, in VerifyUcs2Data
|
||||
Ucs2Info = codecs.lookup('ucs-2')
|
||||
LookupError: unknown encoding: ucs-2
|
||||
|
||||
Signed-off-by: Cole Robinson <crobinso@redhat.com>
|
||||
---
|
||||
BaseTools/Source/Python/AutoGen/UniClassObject.py | 2 +-
|
||||
1 file changed, 1 insertion(+), 1 deletion(-)
|
||||
|
||||
diff --git a/BaseTools/Source/Python/AutoGen/UniClassObject.py b/BaseTools/Source/Python/AutoGen/UniClassObject.py
|
||||
index b2895f7e5c..883c2356e0 100644
|
||||
--- a/BaseTools/Source/Python/AutoGen/UniClassObject.py
|
||||
+++ b/BaseTools/Source/Python/AutoGen/UniClassObject.py
|
||||
@@ -152,7 +152,7 @@ class Ucs2Codec(codecs.Codec):
|
||||
|
||||
TheUcs2Codec = Ucs2Codec()
|
||||
def Ucs2Search(name):
|
||||
- if name == 'ucs-2':
|
||||
+ if name in ['ucs-2', 'ucs_2']:
|
||||
return codecs.CodecInfo(
|
||||
name=name,
|
||||
encode=TheUcs2Codec.encode,
|
||||
--
|
||||
2.26.2
|
||||
|
|
@ -0,0 +1,51 @@
|
|||
From f6e649b25150c1417ebcd595004da6d788d7c9c5 Mon Sep 17 00:00:00 2001
|
||||
Message-Id: <f6e649b25150c1417ebcd595004da6d788d7c9c5.1596577611.git.crobinso@redhat.com>
|
||||
In-Reply-To: <a32df3463befa04913fd1934ed264038a9eae00f.1596577611.git.crobinso@redhat.com>
|
||||
References: <a32df3463befa04913fd1934ed264038a9eae00f.1596577611.git.crobinso@redhat.com>
|
||||
From: Cole Robinson <crobinso@redhat.com>
|
||||
Date: Tue, 4 Aug 2020 17:24:32 -0400
|
||||
Subject: [PATCH 2/2] BaseTools: Work around array.array.tostring() removal in
|
||||
python 3.9
|
||||
|
||||
In python3, array.array.tostring() was a compat alias for tobytes().
|
||||
tostring() was removed in python 3.9.
|
||||
|
||||
Convert this to use tolist() which should be valid for all python
|
||||
versions.
|
||||
|
||||
This fixes this build error on python3.9:
|
||||
|
||||
(Python 3.9.0b5 on linux) Traceback (most recent call last):
|
||||
File "/root/edk2/edk2-edk2-stable202002/BaseTools/BinWrappers/PosixLike/../../Source/Python/Trim/Trim.py", line 593, in Main
|
||||
GenerateVfrBinSec(CommandOptions.ModuleName, CommandOptions.DebugDir, CommandOptions.OutputFile)
|
||||
File "/root/edk2/edk2-edk2-stable202002/BaseTools/BinWrappers/PosixLike/../../Source/Python/Trim/Trim.py", line 449, in GenerateVfrBinSec
|
||||
VfrUniOffsetList = GetVariableOffset(MapFileName, EfiFileName, VfrNameList)
|
||||
File "/root/edk2/edk2-edk2-stable202002/BaseTools/Source/Python/Common/Misc.py", line 88, in GetVariableOffset
|
||||
return _parseForGCC(lines, efifilepath, varnames)
|
||||
File "/root/edk2/edk2-edk2-stable202002/BaseTools/Source/Python/Common/Misc.py", line 151, in _parseForGCC
|
||||
efisecs = PeImageClass(efifilepath).SectionHeaderList
|
||||
File "/root/edk2/edk2-edk2-stable202002/BaseTools/Source/Python/Common/Misc.py", line 1638, in __init__
|
||||
if ByteArray.tostring() != b'PE\0\0':
|
||||
AttributeError: 'array.array' object has no attribute 'tostring'
|
||||
|
||||
Signed-off-by: Cole Robinson <crobinso@redhat.com>
|
||||
---
|
||||
BaseTools/Source/Python/Common/Misc.py | 2 +-
|
||||
1 file changed, 1 insertion(+), 1 deletion(-)
|
||||
|
||||
diff --git a/BaseTools/Source/Python/Common/Misc.py b/BaseTools/Source/Python/Common/Misc.py
|
||||
index da5fb380f0..751b2c24f0 100755
|
||||
--- a/BaseTools/Source/Python/Common/Misc.py
|
||||
+++ b/BaseTools/Source/Python/Common/Misc.py
|
||||
@@ -1635,7 +1635,7 @@ class PeImageClass():
|
||||
ByteArray = array.array('B')
|
||||
ByteArray.fromfile(PeObject, 4)
|
||||
# PE signature should be 'PE\0\0'
|
||||
- if ByteArray.tostring() != b'PE\0\0':
|
||||
+ if ByteArray.tolist() != [ord('P'), ord('E'), 0, 0]:
|
||||
self.ErrorInfo = self.FileName + ' has no valid PE signature PE00'
|
||||
return
|
||||
|
||||
--
|
||||
2.26.2
|
||||
|
|
@ -0,0 +1,60 @@
|
|||
--- xen-ovmf-20190606_20d2e5a125/BaseTools/Source/Python/Eot/EotMain.py.orig 2019-06-06 06:51:42.000000000 +0200
|
||||
+++ xen-ovmf-20190606_20d2e5a125/BaseTools/Source/Python/Eot/EotMain.py 2020-12-25 20:10:44.332843625 +0100
|
||||
@@ -152,11 +152,11 @@
|
||||
try:
|
||||
TmpData = DeCompress('Efi', self[self._HEADER_SIZE_:])
|
||||
DecData = array('B')
|
||||
- DecData.fromstring(TmpData)
|
||||
+ list(map(lambda str: DecData.fromlist([ord(str), 0]), TmpData))
|
||||
except:
|
||||
TmpData = DeCompress('Framework', self[self._HEADER_SIZE_:])
|
||||
DecData = array('B')
|
||||
- DecData.fromstring(TmpData)
|
||||
+ list(map(lambda str: DecData.fromlist([ord(str), 0]), TmpData))
|
||||
|
||||
SectionList = []
|
||||
Offset = 0
|
||||
@@ -196,7 +196,7 @@
|
||||
return len(self)
|
||||
|
||||
def _GetUiString(self):
|
||||
- return codecs.utf_16_decode(self[0:-2].tostring())[0]
|
||||
+ return codecs.utf_16_decode(self[0:-2].tobytes())[0]
|
||||
|
||||
String = property(_GetUiString)
|
||||
|
||||
@@ -738,7 +738,7 @@
|
||||
Offset = self.DataOffset - 4
|
||||
TmpData = DeCompress('Framework', self[self.Offset:])
|
||||
DecData = array('B')
|
||||
- DecData.fromstring(TmpData)
|
||||
+ list(map(lambda str: DecData.fromlist([ord(str), 0]), TmpData))
|
||||
Offset = 0
|
||||
while Offset < len(DecData):
|
||||
Sec = Section()
|
||||
@@ -759,7 +759,7 @@
|
||||
|
||||
TmpData = DeCompress('Lzma', self[self.Offset:])
|
||||
DecData = array('B')
|
||||
- DecData.fromstring(TmpData)
|
||||
+ list(map(lambda str: DecData.fromlist([ord(str), 0]), TmpData))
|
||||
Offset = 0
|
||||
while Offset < len(DecData):
|
||||
Sec = Section()
|
||||
--- xen-ovmf-20190606_20d2e5a125/BaseTools/Source/Python/GenFds/GenFdsGlobalVariable.py.orig 2019-06-06 06:51:42.000000000 +0200
|
||||
+++ xen-ovmf-20190606_20d2e5a125/BaseTools/Source/Python/GenFds/GenFdsGlobalVariable.py 2020-12-25 20:10:39.188843812 +0100
|
||||
@@ -469,12 +469,12 @@
|
||||
GenFdsGlobalVariable.SecCmdList.append(' '.join(Cmd).strip())
|
||||
else:
|
||||
SectionData = array('B', [0, 0, 0, 0])
|
||||
- SectionData.fromstring(Ui.encode("utf_16_le"))
|
||||
+ list(map(lambda str: SectionData.fromlist([ord(str), 0]), Ui))
|
||||
SectionData.append(0)
|
||||
SectionData.append(0)
|
||||
Len = len(SectionData)
|
||||
GenFdsGlobalVariable.SectionHeader.pack_into(SectionData, 0, Len & 0xff, (Len >> 8) & 0xff, (Len >> 16) & 0xff, 0x15)
|
||||
- SaveFileOnChange(Output, SectionData.tostring())
|
||||
+ SaveFileOnChange(Output, SectionData.tobytes())
|
||||
|
||||
elif Ver:
|
||||
Cmd += ("-n", Ver)
|
|
@ -0,0 +1,72 @@
|
|||
From ac9d413015d3bcf1e8f31cda764590b3ee949bc1 Mon Sep 17 00:00:00 2001
|
||||
From: Olaf Hering <olaf@aepfle.de>
|
||||
Date: Wed, 17 Jun 2020 08:13:49 +0200
|
||||
Subject: [PATCH] stubdom/vtpmmgr: simplify handling of hardware_version
|
||||
|
||||
Remove complicated code which deals with a simple boolean, to make gcc10 happy.
|
||||
|
||||
ld: /home/abuild/rpmbuild/BUILD/xen-4.14.20200616T103126.3625b04991/non-dbg/stubdom/vtpmmgr/vtpmmgr.a(vtpm_cmd_handler.o):(.bss+0x0): multiple definition of `tpm_version'; /home/abuild/rpmbuild/BUILD/xen-4.14.20200616T103126.3625b04991/non-dbg/stubdom/vtpmmgr/vtpmmgr.a(vtpmmgr.o):(.bss+0x0): first defined here
|
||||
|
||||
Signed-off-by: Olaf Hering <olaf@aepfle.de>
|
||||
Message-Id: <20200617061349.7623-1-olaf@aepfle.de>
|
||||
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
|
||||
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
|
||||
---
|
||||
stubdom/vtpmmgr/vtpmmgr.c | 8 +++-----
|
||||
stubdom/vtpmmgr/vtpmmgr.h | 9 ---------
|
||||
2 files changed, 3 insertions(+), 14 deletions(-)
|
||||
|
||||
diff --git a/stubdom/vtpmmgr/vtpmmgr.c b/stubdom/vtpmmgr/vtpmmgr.c
|
||||
index 9fddaa24f818..94578adbffdd 100644
|
||||
--- a/stubdom/vtpmmgr/vtpmmgr.c
|
||||
+++ b/stubdom/vtpmmgr/vtpmmgr.c
|
||||
@@ -45,9 +45,7 @@
|
||||
#include "vtpmmgr.h"
|
||||
#include "tcg.h"
|
||||
|
||||
-struct tpm_hardware_version hardware_version = {
|
||||
- .hw_version = TPM1_HARDWARE,
|
||||
-};
|
||||
+static int hardware_version;
|
||||
|
||||
int parse_cmdline_hw(int argc, char** argv)
|
||||
{
|
||||
@@ -55,7 +53,7 @@ int parse_cmdline_hw(int argc, char** argv)
|
||||
|
||||
for (i = 1; i < argc; ++i) {
|
||||
if (!strcmp(argv[i], TPM2_EXTRA_OPT)) {
|
||||
- hardware_version.hw_version = TPM2_HARDWARE;
|
||||
+ hardware_version = 2;
|
||||
break;
|
||||
}
|
||||
}
|
||||
@@ -64,7 +62,7 @@ int parse_cmdline_hw(int argc, char** argv)
|
||||
|
||||
int hw_is_tpm2(void)
|
||||
{
|
||||
- return (hardware_version.hw_version == TPM2_HARDWARE) ? 1 : 0;
|
||||
+ return hardware_version == 2 ? 1 : 0;
|
||||
}
|
||||
|
||||
void main_loop(void) {
|
||||
diff --git a/stubdom/vtpmmgr/vtpmmgr.h b/stubdom/vtpmmgr/vtpmmgr.h
|
||||
index 2e6f8de9e435..6523604bdcf2 100644
|
||||
--- a/stubdom/vtpmmgr/vtpmmgr.h
|
||||
+++ b/stubdom/vtpmmgr/vtpmmgr.h
|
||||
@@ -50,16 +50,7 @@
|
||||
#define RSA_KEY_SIZE 0x0800
|
||||
#define RSA_CIPHER_SIZE (RSA_KEY_SIZE / 8)
|
||||
|
||||
-enum {
|
||||
- TPM1_HARDWARE = 1,
|
||||
- TPM2_HARDWARE,
|
||||
-} tpm_version;
|
||||
|
||||
-struct tpm_hardware_version {
|
||||
- int hw_version;
|
||||
-};
|
||||
-
|
||||
-extern struct tpm_hardware_version hardware_version;
|
||||
|
||||
struct vtpm_globals {
|
||||
int tpm_fd;
|
|
@ -0,0 +1,26 @@
|
|||
--- xen-4.15.0/tools/qemu-xen/configure.orig 2020-11-06 16:30:18.000000000 +0100
|
||||
+++ xen-4.15.0/tools/qemu-xen/configure 2021-04-10 01:32:39.533566877 +0200
|
||||
@@ -2184,7 +2184,6 @@
|
||||
# Check we support --no-pie first; we will need this for building ROMs.
|
||||
if compile_prog "-Werror -fno-pie" "-no-pie"; then
|
||||
CFLAGS_NOPIE="-fno-pie"
|
||||
- LDFLAGS_NOPIE="-no-pie"
|
||||
fi
|
||||
|
||||
if test "$static" = "yes"; then
|
||||
@@ -2200,7 +2199,6 @@
|
||||
fi
|
||||
elif test "$pie" = "no"; then
|
||||
QEMU_CFLAGS="$CFLAGS_NOPIE $QEMU_CFLAGS"
|
||||
- QEMU_LDFLAGS="$LDFLAGS_NOPIE $QEMU_LDFLAGS"
|
||||
elif compile_prog "-Werror -fPIE -DPIE" "-pie"; then
|
||||
QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
|
||||
QEMU_LDFLAGS="-pie $QEMU_LDFLAGS"
|
||||
@@ -7996,7 +7994,6 @@
|
||||
echo "QEMU_CFLAGS += -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-non-pointer-null" >> $config_host_mak
|
||||
fi
|
||||
echo "QEMU_LDFLAGS=$QEMU_LDFLAGS" >> $config_host_mak
|
||||
-echo "LDFLAGS_NOPIE=$LDFLAGS_NOPIE" >> $config_host_mak
|
||||
echo "LD_REL_FLAGS=$LD_REL_FLAGS" >> $config_host_mak
|
||||
echo "LD_I386_EMULATION=$ld_i386_emulation" >> $config_host_mak
|
||||
echo "LIBS+=$LIBS" >> $config_host_mak
|
|
@ -1,24 +1,24 @@
|
|||
--- xen-4.6.1/tools/xenstore/Makefile.orig 2016-02-09 15:44:19.000000000 +0100
|
||||
+++ xen-4.6.1/tools/xenstore/Makefile 2016-02-20 22:54:11.877906517 +0100
|
||||
@@ -84,7 +84,7 @@
|
||||
--- xen-4.15.0/tools/xenstore/Makefile.orig 2021-04-06 19:14:18.000000000 +0200
|
||||
+++ xen-4.15.0/tools/xenstore/Makefile 2021-04-09 20:43:12.613910598 +0200
|
||||
@@ -76,7 +76,7 @@
|
||||
$(AR) cr $@ $^
|
||||
|
||||
$(CLIENTS): xenstore
|
||||
- ln -f xenstore $@
|
||||
+ ln -sf xenstore $@
|
||||
|
||||
xenstore: xenstore_client.o $(LIBXENSTORE)
|
||||
$(CC) $< $(LDFLAGS) $(LDLIBS_libxenstore) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
|
||||
@@ -140,7 +140,7 @@
|
||||
xenstore: xenstore_client.o
|
||||
$(CC) $< $(LDFLAGS) $(LDLIBS_libxenstore) $(LDLIBS_libxentoolcore) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
|
||||
@@ -117,7 +117,7 @@
|
||||
$(INSTALL_PROG) xenstore-control $(DESTDIR)$(bindir)
|
||||
$(INSTALL_PROG) xenstore $(DESTDIR)$(bindir)
|
||||
set -e ; for c in $(CLIENTS) ; do \
|
||||
- ln -f $(DESTDIR)$(bindir)/xenstore $(DESTDIR)$(bindir)/$${c} ; \
|
||||
+ ln -sf xenstore $(DESTDIR)$(bindir)/$${c} ; \
|
||||
done
|
||||
$(INSTALL_DIR) $(DESTDIR)$(libdir)
|
||||
$(INSTALL_SHLIB) libxenstore.so.$(MAJOR).$(MINOR) $(DESTDIR)$(libdir)
|
||||
@@ -159,7 +159,7 @@
|
||||
|
||||
.PHONY: uninstall
|
||||
@@ -144,7 +144,7 @@
|
||||
$(INSTALL_DIR) $(DESTDIR)$(bindir)
|
||||
$(INSTALL_PROG) xenstore $(DESTDIR)$(bindir)
|
||||
set -e ; for c in $(CLIENTS) ; do \
|
||||
|
@ -26,4 +26,4 @@
|
|||
+ ln -sf xenstore $(DESTDIR)$(bindir)/$${c} ; \
|
||||
done
|
||||
|
||||
-include $(DEPS)
|
||||
-include $(DEPS_INCLUDE)
|
||||
|
|
|
@@ -2,7 +2,7 @@

# Slackware build script for xen

# Copyright 2010, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Mario Preksavec, Zagreb, Croatia
# Copyright 2010, 2021 Mario Preksavec, Zagreb, Croatia
# All rights reserved.
#
# Redistribution and use of this script, with or without modification, is

@@ -25,14 +25,14 @@
cd $(dirname $0) ; CWD=$(pwd)

PRGNAM=xen
VERSION=${VERSION:-4.13.1}
BUILD=${BUILD:-3}
VERSION=${VERSION:-4.15.0}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}
PKGTYPE=${PKGTYPE:-tgz}

SEABIOS=${SEABIOS:-1.12.1}
OVMF=${OVMF:-20190606_20d2e5a125}
IPXE=${IPXE:-1dd56dbd11082fb622c2ed21cfaced4f47d798a6}
SEABIOS=${SEABIOS:-1.14.0}
OVMF=${OVMF:-20200807_a3741780fe}
IPXE=${IPXE:-988d2c13cdf0f0b4140685af35ced70ac5b3283c}

if [ -z "$ARCH" ]; then
  case "$( uname -m )" in

@@ -55,11 +55,12 @@ PKG=$TMP/package-$PRGNAM
OUTPUT=${OUTPUT:-/tmp}

if [ "$ARCH" = "i586" ]; then
  SLKCFLAGS="-O2 -march=i586 -mtune=i686"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "i686" ]; then
  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
  LIBDIRSUFFIX=""
  cat << EOF

*** Xen x86/32 target no longer supported!

EOF
  exit
elif [ "$ARCH" = "x86_64" ]; then
  SLKCFLAGS="-O2 -fPIC"
  LIBDIRSUFFIX="64"

@@ -105,11 +106,6 @@ case "${USE_LIBSSH:-no}" in
  *) CONF_QEMUU+=" --disable-libssh" ;;
esac

case "${USE_BLUEZ:-no}" in
  yes) CONF_QEMUU+=" --enable-bluez" ;;
  *) CONF_QEMUU+=" --disable-bluez" ;;
esac

case "${USE_GTK:-no}" in
  yes) CONF_QEMUU+=" --enable-gtk" ;;
  *) CONF_QEMUU+=" --disable-gtk" ;;

@@ -179,6 +175,16 @@ if [ "$(ldd --version | awk '{print $NF; exit}')" = "2.27" ]; then
  ( cd tools/qemu-xen && patch -p1 <$CWD/patches/glibc-memfd_fix_configure_test.patch )
fi

# Fix ovmf firmware build
( cd tools/firmware/ovmf-dir-remote && \
  patch -p1 <$CWD/patches/0001-BaseTools-fix-ucs-2-lookup-on-python-3.9.patch
  patch -p1 <$CWD/patches/0002-BaseTools-Work-around-array.array.tostring-removal-i.patch
  patch -p1 <$CWD/patches/0003-BaseTools-replace-deprecated-fromstring-and-tostring.diff
)

# Fix binutils-2.36 build
patch -p1 <$CWD/patches/qemu-xen-no-pie.diff

CFLAGS="$SLKCFLAGS" \
CXXFLAGS="$SLKCFLAGS" \
./configure \

@@ -200,6 +206,9 @@ make install-xen \
  MANDIR=/usr/man \
  DESTDIR=$PKG

echo CONFIG_GOLANG=n >> xen/.config
echo CONFIG_GOLANG=n > tools/.config

make install-tools \
  docdir=/usr/doc/$PRGNAM-$VERSION \
  DOCDIR=/usr/doc/$PRGNAM-$VERSION \
@@ -1,8 +1,8 @@
PRGNAM="xen"
VERSION="4.13.1"
VERSION="4.15.0"
HOMEPAGE="http://www.xenproject.org/"
DOWNLOAD="http://mirror.slackware.hr/sources/xen/xen-4.13.1.tar.gz \
          http://mirror.slackware.hr/sources/xen-extfiles/ipxe-git-1dd56dbd11082fb622c2ed21cfaced4f47d798a6.tar.gz \
DOWNLOAD="http://mirror.slackware.hr/sources/xen/xen-4.15.0.tar.gz \
          http://mirror.slackware.hr/sources/xen-extfiles/ipxe-git-988d2c13cdf0f0b4140685af35ced4f47d798a6.tar.gz \
          http://mirror.slackware.hr/sources/xen-extfiles/lwip-1.3.0.tar.gz \
          http://mirror.slackware.hr/sources/xen-extfiles/zlib-1.2.3.tar.gz \
          http://mirror.slackware.hr/sources/xen-extfiles/newlib-1.16.0.tar.gz \
@@ -11,10 +11,10 @@ DOWNLOAD="http://mirror.slackware.hr/sources/xen/xen-4.13.1.tar.gz \
          http://mirror.slackware.hr/sources/xen-extfiles/polarssl-1.1.4-gpl.tgz \
          http://mirror.slackware.hr/sources/xen-extfiles/gmp-4.3.2.tar.bz2 \
          http://mirror.slackware.hr/sources/xen-extfiles/tpm_emulator-0.7.4.tar.gz \
          http://mirror.slackware.hr/sources/xen-seabios/seabios-1.12.1.tar.gz \
          http://mirror.slackware.hr/sources/xen-ovmf/xen-ovmf-20190606_20d2e5a125.tar.bz2"
MD5SUM="e26fe8f9ce39463734e6ede45c6e11b8 \
        b3ab0488a989a089207302111d12e1a0 \
          http://mirror.slackware.hr/sources/xen-seabios/seabios-1.14.0.tar.gz \
          http://mirror.slackware.hr/sources/xen-ovmf/xen-ovmf-20200807_a3741780fe.tar.bz2"
MD5SUM="899d5b9dd6725543cf3b224de9a5d27a \
        1c3f5c0d6d824697361481aa7004fc5b \
        36cc57650cffda9a0269493be2a169bb \
        debc62758716a169df9f62e6ab2bc634 \
        bf8f1f9e3ca83d732c00a79a6ef29bc4 \
@@ -23,8 +23,8 @@ MD5SUM="e26fe8f9ce39463734e6ede45c6e11b8 \
        7b72caf22b01464ee7d6165f2fd85f44 \
        dd60683d7057917e34630b4a787932e8 \
        e26becb8a6a2b6695f6b3e8097593db8 \
        6cb6cba431fd725126ddb5ec529ab85c \
        a6063a0d3d45e6f77deea8c80569653e"
        9df3b7de6376850d09161137e7a9b61f \
        b5a9f9870e147106cd917afba83011e2"
DOWNLOAD_x86_64=""
MD5SUM_x86_64=""
REQUIRES="acpica yajl"
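As a quick sanity check before building, the fetched tarballs can be compared against the MD5SUM list above; a sketch with a few of the files:

  md5sum xen-4.15.0.tar.gz seabios-1.14.0.tar.gz xen-ovmf-20200807_a3741780fe.tar.bz2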
@ -1,50 +0,0 @@
|
|||
From aeb46e92f915f19a61d5a8a1f4b696793f64e6fb Mon Sep 17 00:00:00 2001
|
||||
From: Julien Grall <jgrall@amazon.com>
|
||||
Date: Thu, 19 Mar 2020 13:17:31 +0000
|
||||
Subject: [PATCH] xen/common: event_channel: Don't ignore error in
|
||||
get_free_port()
|
||||
|
||||
Currently, get_free_port() is assuming that the port has been allocated
|
||||
when evtchn_allocate_port() is not return -EBUSY.
|
||||
|
||||
However, the function may return an error when:
|
||||
- We exhausted all the event channels. This can happen if the limit
|
||||
configured by the administrator for the guest ('max_event_channels'
|
||||
in xl cfg) is higher than the ABI used by the guest. For instance,
|
||||
if the guest is using 2L, the limit should not be higher than 4095.
|
||||
- We cannot allocate memory (e.g Xen has not more memory).
|
||||
|
||||
Users of get_free_port() (such as EVTCHNOP_alloc_unbound) will validly
|
||||
assuming the port was valid and will next call evtchn_from_port(). This
|
||||
will result to a crash as the memory backing the event channel structure
|
||||
is not present.
|
||||
|
||||
Fixes: 368ae9a05fe ("xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU")
|
||||
Signed-off-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
---
|
||||
xen/common/event_channel.c | 8 ++++----
|
||||
1 file changed, 4 insertions(+), 4 deletions(-)
|
||||
|
||||
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
|
||||
index e86e2bfab0..a8d182b584 100644
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -195,10 +195,10 @@ static int get_free_port(struct domain *d)
|
||||
{
|
||||
int rc = evtchn_allocate_port(d, port);
|
||||
|
||||
- if ( rc == -EBUSY )
|
||||
- continue;
|
||||
-
|
||||
- return port;
|
||||
+ if ( rc == 0 )
|
||||
+ return port;
|
||||
+ else if ( rc != -EBUSY )
|
||||
+ return rc;
|
||||
}
|
||||
|
||||
return -ENOSPC;
|
||||
--
|
||||
2.17.1
|
||||
|
|
@ -1,27 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: x86/shadow: correct an inverted conditional in dirty VRAM tracking
|
||||
|
||||
This originally was "mfn_x(mfn) == INVALID_MFN". Make it like this
|
||||
again, taking the opportunity to also drop the unnecessary nearby
|
||||
braces.
|
||||
|
||||
This is XSA-319.
|
||||
|
||||
Fixes: 246a5a3377c2 ("xen: Use a typesafe to define INVALID_MFN")
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
|
||||
--- a/xen/arch/x86/mm/shadow/common.c
|
||||
+++ b/xen/arch/x86/mm/shadow/common.c
|
||||
@@ -3252,10 +3252,8 @@ int shadow_track_dirty_vram(struct domai
|
||||
int dirty = 0;
|
||||
paddr_t sl1ma = dirty_vram->sl1ma[i];
|
||||
|
||||
- if ( !mfn_eq(mfn, INVALID_MFN) )
|
||||
- {
|
||||
+ if ( mfn_eq(mfn, INVALID_MFN) )
|
||||
dirty = 1;
|
||||
- }
|
||||
else
|
||||
{
|
||||
page = mfn_to_page(mfn);
|
|
@ -1,117 +0,0 @@
|
|||
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/spec-ctrl: CPUID/MSR definitions for Special Register Buffer Data Sampling
|
||||
|
||||
This is part of XSA-320 / CVE-2020-0543
|
||||
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
Acked-by: Wei Liu <wl@xen.org>
|
||||
|
||||
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
|
||||
index 1d9d816622..9268454297 100644
|
||||
--- a/docs/misc/xen-command-line.pandoc
|
||||
+++ b/docs/misc/xen-command-line.pandoc
|
||||
@@ -483,10 +483,10 @@ accounting for hardware capabilities as enumerated via CPUID.
|
||||
|
||||
Currently accepted:
|
||||
|
||||
-The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
|
||||
-`l1d-flush` and `ssbd` are used by default if available and applicable. They can
|
||||
-be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
|
||||
-won't offer them to guests.
|
||||
+The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`,
|
||||
+`stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and
|
||||
+applicable. They can be ignored, e.g. `no-ibrsb`, at which point Xen won't
|
||||
+use them itself, and won't offer them to guests.
|
||||
|
||||
### cpuid_mask_cpu
|
||||
> `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
|
||||
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
|
||||
index 6cea4227ba..a78f08b927 100644
|
||||
--- a/tools/libxl/libxl_cpuid.c
|
||||
+++ b/tools/libxl/libxl_cpuid.c
|
||||
@@ -213,6 +213,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
|
||||
|
||||
{"avx512-4vnniw",0x00000007, 0, CPUID_REG_EDX, 2, 1},
|
||||
{"avx512-4fmaps",0x00000007, 0, CPUID_REG_EDX, 3, 1},
|
||||
+ {"srbds-ctrl", 0x00000007, 0, CPUID_REG_EDX, 9, 1},
|
||||
{"md-clear", 0x00000007, 0, CPUID_REG_EDX, 10, 1},
|
||||
{"cet-ibt", 0x00000007, 0, CPUID_REG_EDX, 20, 1},
|
||||
{"ibrsb", 0x00000007, 0, CPUID_REG_EDX, 26, 1},
|
||||
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
|
||||
index 603e1d65fd..a09440813b 100644
|
||||
--- a/tools/misc/xen-cpuid.c
|
||||
+++ b/tools/misc/xen-cpuid.c
|
||||
@@ -157,6 +157,7 @@ static const char *const str_7d0[32] =
|
||||
[ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
|
||||
[ 4] = "fsrm",
|
||||
|
||||
+ /* 8 */ [ 9] = "srbds-ctrl",
|
||||
[10] = "md-clear",
|
||||
/* 12 */ [13] = "tsx-force-abort",
|
||||
|
||||
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
|
||||
index 4b12103482..0cded3c0ad 100644
|
||||
--- a/xen/arch/x86/msr.c
|
||||
+++ b/xen/arch/x86/msr.c
|
||||
@@ -134,6 +134,7 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
|
||||
/* Write-only */
|
||||
case MSR_TSX_FORCE_ABORT:
|
||||
case MSR_TSX_CTRL:
|
||||
+ case MSR_MCU_OPT_CTRL:
|
||||
case MSR_U_CET:
|
||||
case MSR_S_CET:
|
||||
case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE:
|
||||
@@ -288,6 +289,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
|
||||
/* Read-only */
|
||||
case MSR_TSX_FORCE_ABORT:
|
||||
case MSR_TSX_CTRL:
|
||||
+ case MSR_MCU_OPT_CTRL:
|
||||
case MSR_U_CET:
|
||||
case MSR_S_CET:
|
||||
case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE:
|
||||
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
|
||||
index 6656c44aec..5fc1c6827e 100644
|
||||
--- a/xen/arch/x86/spec_ctrl.c
|
||||
+++ b/xen/arch/x86/spec_ctrl.c
|
||||
@@ -312,12 +312,13 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
|
||||
printk("Speculative mitigation facilities:\n");
|
||||
|
||||
/* Hardware features which pertain to speculative mitigations. */
|
||||
- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
|
||||
+ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
|
||||
(_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
|
||||
(_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP" : "",
|
||||
(_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
|
||||
(_7d0 & cpufeat_mask(X86_FEATURE_SSBD)) ? " SSBD" : "",
|
||||
(_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
|
||||
+ (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL)) ? " SRBDS_CTRL" : "",
|
||||
(e8b & cpufeat_mask(X86_FEATURE_IBPB)) ? " IBPB" : "",
|
||||
(caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "",
|
||||
(caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
|
||||
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
|
||||
index 7693c4a71a..91994669e1 100644
|
||||
--- a/xen/include/asm-x86/msr-index.h
|
||||
+++ b/xen/include/asm-x86/msr-index.h
|
||||
@@ -179,6 +179,9 @@
|
||||
#define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x490
|
||||
#define MSR_IA32_VMX_VMFUNC 0x491
|
||||
|
||||
+#define MSR_MCU_OPT_CTRL 0x00000123
|
||||
+#define MCU_OPT_CTRL_RNGDS_MITG_DIS (_AC(1, ULL) << 0)
|
||||
+
|
||||
#define MSR_U_CET 0x000006a0
|
||||
#define MSR_S_CET 0x000006a2
|
||||
#define MSR_PL0_SSP 0x000006a4
|
||||
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
|
||||
index 2835688f1c..a2482c3627 100644
|
||||
--- a/xen/include/public/arch-x86/cpufeatureset.h
|
||||
+++ b/xen/include/public/arch-x86/cpufeatureset.h
|
||||
@@ -252,6 +252,7 @@ XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by
|
||||
/* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
|
||||
XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */
|
||||
XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A AVX512 Multiply Accumulation Single Precision */
|
||||
+XEN_CPUFEATURE(SRBDS_CTRL, 9*32+ 9) /* MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS. */
|
||||
XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*A VERW clears microarchitectural buffers */
|
||||
XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
|
||||
XEN_CPUFEATURE(CET_IBT, 9*32+20) /* CET - Indirect Branch Tracking */
|
|
@ -1,179 +0,0 @@
|
|||
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/spec-ctrl: Mitigate the Special Register Buffer Data Sampling sidechannel
|
||||
|
||||
See patch documentation and comments.
|
||||
|
||||
This is part of XSA-320 / CVE-2020-0543
|
||||
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
|
||||
index 9268454297..c780312531 100644
|
||||
--- a/docs/misc/xen-command-line.pandoc
|
||||
+++ b/docs/misc/xen-command-line.pandoc
|
||||
@@ -1991,7 +1991,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
|
||||
### spec-ctrl (x86)
|
||||
> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
|
||||
> bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
|
||||
-> l1d-flush,branch-harden}=<bool> ]`
|
||||
+> l1d-flush,branch-harden,srb-lock}=<bool> ]`
|
||||
|
||||
Controls for speculative execution sidechannel mitigations. By default, Xen
|
||||
will pick the most appropriate mitigations based on compiled in support,
|
||||
@@ -2068,6 +2068,12 @@ If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_BRANCH`, the
|
||||
speculation barriers to protect selected conditional branches. By default,
|
||||
Xen will enable this mitigation.
|
||||
|
||||
+On hardware supporting SRBDS_CTRL, the `srb-lock=` option can be used to force
|
||||
+or prevent Xen from protect the Special Register Buffer from leaking stale
|
||||
+data. By default, Xen will enable this mitigation, except on parts where MDS
|
||||
+is fixed and TAA is fixed/mitigated (in which case, there is believed to be no
|
||||
+way for an attacker to obtain the stale data).
|
||||
+
|
||||
### sync_console
|
||||
> `= <boolean>`
|
||||
|
||||
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
|
||||
index feb0f6ce20..75c6e34164 100644
|
||||
--- a/xen/arch/x86/acpi/power.c
|
||||
+++ b/xen/arch/x86/acpi/power.c
|
||||
@@ -295,6 +295,9 @@ static int enter_state(u32 state)
|
||||
ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
|
||||
spec_ctrl_exit_idle(ci);
|
||||
|
||||
+ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
|
||||
+ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl);
|
||||
+
|
||||
done:
|
||||
spin_debug_enable();
|
||||
local_irq_restore(flags);
|
||||
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
|
||||
index dc8fdac1a1..b1e51b3aff 100644
|
||||
--- a/xen/arch/x86/smpboot.c
|
||||
+++ b/xen/arch/x86/smpboot.c
|
||||
@@ -361,12 +361,14 @@ void start_secondary(void *unused)
|
||||
microcode_update_one(false);
|
||||
|
||||
/*
|
||||
- * If MSR_SPEC_CTRL is available, apply Xen's default setting and discard
|
||||
- * any firmware settings. Note: MSR_SPEC_CTRL may only become available
|
||||
- * after loading microcode.
|
||||
+ * If any speculative control MSRs are available, apply Xen's default
|
||||
+ * settings. Note: These MSRs may only become available after loading
|
||||
+ * microcode.
|
||||
*/
|
||||
if ( boot_cpu_has(X86_FEATURE_IBRSB) )
|
||||
wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl);
|
||||
+ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
|
||||
+ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl);
|
||||
|
||||
tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */
|
||||
|
||||
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
|
||||
index 5fc1c6827e..33343062a7 100644
|
||||
--- a/xen/arch/x86/spec_ctrl.c
|
||||
+++ b/xen/arch/x86/spec_ctrl.c
|
||||
@@ -65,6 +65,9 @@ static unsigned int __initdata l1d_maxphysaddr;
|
||||
static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
|
||||
static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
|
||||
|
||||
+static int8_t __initdata opt_srb_lock = -1;
|
||||
+uint64_t __read_mostly default_xen_mcu_opt_ctrl;
|
||||
+
|
||||
static int __init parse_spec_ctrl(const char *s)
|
||||
{
|
||||
const char *ss;
|
||||
@@ -112,6 +115,7 @@ static int __init parse_spec_ctrl(const char *s)
|
||||
opt_ssbd = false;
|
||||
opt_l1d_flush = 0;
|
||||
opt_branch_harden = false;
|
||||
+ opt_srb_lock = 0;
|
||||
}
|
||||
else if ( val > 0 )
|
||||
rc = -EINVAL;
|
||||
@@ -178,6 +182,8 @@ static int __init parse_spec_ctrl(const char *s)
|
||||
opt_l1d_flush = val;
|
||||
else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
|
||||
opt_branch_harden = val;
|
||||
+ else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
|
||||
+ opt_srb_lock = val;
|
||||
else
|
||||
rc = -EINVAL;
|
||||
|
||||
@@ -341,7 +347,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
|
||||
"\n");
|
||||
|
||||
/* Settings for Xen's protection, irrespective of guests. */
|
||||
- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s%s\n",
|
||||
+ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s%s%s\n",
|
||||
thunk == THUNK_NONE ? "N/A" :
|
||||
thunk == THUNK_RETPOLINE ? "RETPOLINE" :
|
||||
thunk == THUNK_LFENCE ? "LFENCE" :
|
||||
@@ -352,6 +358,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
|
||||
(default_xen_spec_ctrl & SPEC_CTRL_SSBD) ? " SSBD+" : " SSBD-",
|
||||
!(caps & ARCH_CAPS_TSX_CTRL) ? "" :
|
||||
(opt_tsx & 1) ? " TSX+" : " TSX-",
|
||||
+ !boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ? "" :
|
||||
+ opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-",
|
||||
opt_ibpb ? " IBPB" : "",
|
||||
opt_l1d_flush ? " L1D_FLUSH" : "",
|
||||
opt_md_clear_pv || opt_md_clear_hvm ? " VERW" : "",
|
||||
@@ -1149,6 +1157,34 @@ void __init init_speculation_mitigations(void)
|
||||
tsx_init();
|
||||
}
|
||||
|
||||
+ /* Calculate suitable defaults for MSR_MCU_OPT_CTRL */
|
||||
+ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
|
||||
+ {
|
||||
+ uint64_t val;
|
||||
+
|
||||
+ rdmsrl(MSR_MCU_OPT_CTRL, val);
|
||||
+
|
||||
+ /*
|
||||
+ * On some SRBDS-affected hardware, it may be safe to relax srb-lock
|
||||
+ * by default.
|
||||
+ *
|
||||
+ * On parts which enumerate MDS_NO and not TAA_NO, TSX is the only way
|
||||
+ * to access the Fill Buffer. If TSX isn't available (inc. SKU
|
||||
+ * reasons on some models), or TSX is explicitly disabled, then there
|
||||
+ * is no need for the extra overhead to protect RDRAND/RDSEED.
|
||||
+ */
|
||||
+ if ( opt_srb_lock == -1 &&
|
||||
+ (caps & (ARCH_CAPS_MDS_NO|ARCH_CAPS_TAA_NO)) == ARCH_CAPS_MDS_NO &&
|
||||
+ (!cpu_has_hle || ((caps & ARCH_CAPS_TSX_CTRL) && opt_tsx == 0)) )
|
||||
+ opt_srb_lock = 0;
|
||||
+
|
||||
+ val &= ~MCU_OPT_CTRL_RNGDS_MITG_DIS;
|
||||
+ if ( !opt_srb_lock )
|
||||
+ val |= MCU_OPT_CTRL_RNGDS_MITG_DIS;
|
||||
+
|
||||
+ default_xen_mcu_opt_ctrl = val;
|
||||
+ }
|
||||
+
|
||||
print_details(thunk, caps);
|
||||
|
||||
/*
|
||||
@@ -1180,6 +1216,9 @@ void __init init_speculation_mitigations(void)
|
||||
|
||||
wrmsrl(MSR_SPEC_CTRL, bsp_delay_spec_ctrl ? 0 : default_xen_spec_ctrl);
|
||||
}
|
||||
+
|
||||
+ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
|
||||
+ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl);
|
||||
}
|
||||
|
||||
static void __init __maybe_unused build_assertions(void)
|
||||
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
|
||||
index 9caecddfec..b252bb8631 100644
|
||||
--- a/xen/include/asm-x86/spec_ctrl.h
|
||||
+++ b/xen/include/asm-x86/spec_ctrl.h
|
||||
@@ -54,6 +54,8 @@ extern int8_t opt_pv_l1tf_hwdom, opt_pv_l1tf_domu;
|
||||
*/
|
||||
extern paddr_t l1tf_addr_mask, l1tf_safe_maddr;
|
||||
|
||||
+extern uint64_t default_xen_mcu_opt_ctrl;
|
||||
+
|
||||
static inline void init_shadow_spec_ctrl_state(void)
|
||||
{
|
||||
struct cpu_info *info = get_cpu_info();
|
|
@ -1,36 +0,0 @@
|
|||
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/spec-ctrl: Update docs with SRBDS workaround
|
||||
|
||||
RDRAND/RDSEED can be hidden using cpuid= to mitigate SRBDS if microcode
|
||||
isn't available.
|
||||
|
||||
This is part of XSA-320 / CVE-2020-0543.
|
||||
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
|
||||
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
|
||||
index c780312531..81e12d053c 100644
|
||||
--- a/docs/misc/xen-command-line.pandoc
|
||||
+++ b/docs/misc/xen-command-line.pandoc
|
||||
@@ -481,12 +481,18 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
|
||||
This option allows for fine tuning of the facilities Xen will use, after
|
||||
accounting for hardware capabilities as enumerated via CPUID.
|
||||
|
||||
+Unless otherwise noted, options only have any effect in their negative form,
|
||||
+to hide the named feature(s). Ignoring a feature using this mechanism will
|
||||
+cause Xen not to use the feature, nor offer them as usable to guests.
|
||||
+
|
||||
Currently accepted:
|
||||
|
||||
The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`,
|
||||
`stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and
|
||||
-applicable. They can be ignored, e.g. `no-ibrsb`, at which point Xen won't
|
||||
-use them itself, and won't offer them to guests.
|
||||
+applicable. They can all be ignored.
|
||||
+
|
||||
+`rdrand` and `rdseed` can be ignored, as a mitigation to XSA-320 /
|
||||
+CVE-2020-0543.
|
||||
|
||||
### cpuid_mask_cpu
|
||||
> `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
|
|
@ -1,63 +0,0 @@
|
|||
From 030300ebbb86c40c12db038714479d746167c767 Mon Sep 17 00:00:00 2001
|
||||
From: Julien Grall <jgrall@amazon.com>
|
||||
Date: Tue, 26 May 2020 18:31:33 +0100
|
||||
Subject: [PATCH] xen: Check the alignment of the offset pased via
|
||||
VCPUOP_register_vcpu_info
|
||||
|
||||
Currently a guest is able to register any guest physical address to use
|
||||
for the vcpu_info structure as long as the structure can fits in the
|
||||
rest of the frame.
|
||||
|
||||
This means a guest can provide an address that is not aligned to the
|
||||
natural alignment of the structure.
|
||||
|
||||
On Arm 32-bit, unaligned access are completely forbidden by the
|
||||
hypervisor. This will result to a data abort which is fatal.
|
||||
|
||||
On Arm 64-bit, unaligned access are only forbidden when used for atomic
|
||||
access. As the structure contains fields (such as evtchn_pending_self)
|
||||
that are updated using atomic operations, any unaligned access will be
|
||||
fatal as well.
|
||||
|
||||
While the misalignment is only fatal on Arm, a generic check is added
|
||||
as an x86 guest shouldn't sensibly pass an unaligned address (this
|
||||
would result to a split lock).
|
||||
|
||||
This is XSA-327.
|
||||
|
||||
Reported-by: Julien Grall <jgrall@amazon.com>
|
||||
Signed-off-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
---
|
||||
xen/common/domain.c | 10 ++++++++++
|
||||
1 file changed, 10 insertions(+)
|
||||
|
||||
diff --git a/xen/common/domain.c b/xen/common/domain.c
|
||||
index 7cc9526139a6..e9be05f1d05f 100644
|
||||
--- a/xen/common/domain.c
|
||||
+++ b/xen/common/domain.c
|
||||
@@ -1227,10 +1227,20 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
|
||||
void *mapping;
|
||||
vcpu_info_t *new_info;
|
||||
struct page_info *page;
|
||||
+ unsigned int align;
|
||||
|
||||
if ( offset > (PAGE_SIZE - sizeof(vcpu_info_t)) )
|
||||
return -EINVAL;
|
||||
|
||||
+#ifdef CONFIG_COMPAT
|
||||
+ if ( has_32bit_shinfo(d) )
|
||||
+ align = alignof(new_info->compat);
|
||||
+ else
|
||||
+#endif
|
||||
+ align = alignof(*new_info);
|
||||
+ if ( offset & (align - 1) )
|
||||
+ return -EINVAL;
|
||||
+
|
||||
if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) )
|
||||
return -EINVAL;
|
||||
|
||||
--
|
||||
2.17.1
|
||||
|
|
@ -1,118 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: x86/EPT: ept_set_middle_entry() related adjustments
|
||||
|
||||
ept_split_super_page() wants to further modify the newly allocated
|
||||
table, so have ept_set_middle_entry() return the mapped pointer rather
|
||||
than tearing it down and then getting re-established right again.
|
||||
|
||||
Similarly ept_next_level() wants to hand back a mapped pointer of
|
||||
the next level page, so re-use the one established by
|
||||
ept_set_middle_entry() in case that path was taken.
|
||||
|
||||
Pull the setting of suppress_ve ahead of insertion into the higher level
|
||||
table, and don't have ept_split_super_page() set the field a 2nd time.
|
||||
|
||||
This is part of XSA-328.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/arch/x86/mm/p2m-ept.c
|
||||
+++ b/xen/arch/x86/mm/p2m-ept.c
|
||||
@@ -187,8 +187,9 @@ static void ept_p2m_type_to_flags(struct
|
||||
#define GUEST_TABLE_SUPER_PAGE 2
|
||||
#define GUEST_TABLE_POD_PAGE 3
|
||||
|
||||
-/* Fill in middle levels of ept table */
|
||||
-static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
|
||||
+/* Fill in middle level of ept table; return pointer to mapped new table. */
|
||||
+static ept_entry_t *ept_set_middle_entry(struct p2m_domain *p2m,
|
||||
+ ept_entry_t *ept_entry)
|
||||
{
|
||||
mfn_t mfn;
|
||||
ept_entry_t *table;
|
||||
@@ -196,7 +197,12 @@ static int ept_set_middle_entry(struct p
|
||||
|
||||
mfn = p2m_alloc_ptp(p2m, 0);
|
||||
if ( mfn_eq(mfn, INVALID_MFN) )
|
||||
- return 0;
|
||||
+ return NULL;
|
||||
+
|
||||
+ table = map_domain_page(mfn);
|
||||
+
|
||||
+ for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
|
||||
+ table[i].suppress_ve = 1;
|
||||
|
||||
ept_entry->epte = 0;
|
||||
ept_entry->mfn = mfn_x(mfn);
|
||||
@@ -208,14 +214,7 @@ static int ept_set_middle_entry(struct p
|
||||
|
||||
ept_entry->suppress_ve = 1;
|
||||
|
||||
- table = map_domain_page(mfn);
|
||||
-
|
||||
- for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
|
||||
- table[i].suppress_ve = 1;
|
||||
-
|
||||
- unmap_domain_page(table);
|
||||
-
|
||||
- return 1;
|
||||
+ return table;
|
||||
}
|
||||
|
||||
/* free ept sub tree behind an entry */
|
||||
@@ -253,10 +252,10 @@ static bool_t ept_split_super_page(struc
|
||||
|
||||
ASSERT(is_epte_superpage(ept_entry));
|
||||
|
||||
- if ( !ept_set_middle_entry(p2m, &new_ept) )
|
||||
+ table = ept_set_middle_entry(p2m, &new_ept);
|
||||
+ if ( !table )
|
||||
return 0;
|
||||
|
||||
- table = map_domain_page(_mfn(new_ept.mfn));
|
||||
trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER);
|
||||
|
||||
for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
|
||||
@@ -267,7 +266,6 @@ static bool_t ept_split_super_page(struc
|
||||
epte->sp = (level > 1);
|
||||
epte->mfn += i * trunk;
|
||||
epte->snp = is_iommu_enabled(p2m->domain) && iommu_snoop;
|
||||
- epte->suppress_ve = 1;
|
||||
|
||||
ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
|
||||
|
||||
@@ -306,8 +304,7 @@ static int ept_next_level(struct p2m_dom
|
||||
ept_entry_t **table, unsigned long *gfn_remainder,
|
||||
int next_level)
|
||||
{
|
||||
- unsigned long mfn;
|
||||
- ept_entry_t *ept_entry, e;
|
||||
+ ept_entry_t *ept_entry, *next = NULL, e;
|
||||
u32 shift, index;
|
||||
|
||||
shift = next_level * EPT_TABLE_ORDER;
|
||||
@@ -332,19 +329,17 @@ static int ept_next_level(struct p2m_dom
|
||||
if ( read_only )
|
||||
return GUEST_TABLE_MAP_FAILED;
|
||||
|
||||
- if ( !ept_set_middle_entry(p2m, ept_entry) )
|
||||
+ next = ept_set_middle_entry(p2m, ept_entry);
|
||||
+ if ( !next )
|
||||
return GUEST_TABLE_MAP_FAILED;
|
||||
- else
|
||||
- e = atomic_read_ept_entry(ept_entry); /* Refresh */
|
||||
+ /* e is now stale and hence may not be used anymore below. */
|
||||
}
|
||||
-
|
||||
/* The only time sp would be set here is if we had hit a superpage */
|
||||
- if ( is_epte_superpage(&e) )
|
||||
+ else if ( is_epte_superpage(&e) )
|
||||
return GUEST_TABLE_SUPER_PAGE;
|
||||
|
||||
- mfn = e.mfn;
|
||||
unmap_domain_page(*table);
|
||||
- *table = map_domain_page(_mfn(mfn));
|
||||
+ *table = next ?: map_domain_page(_mfn(e.mfn));
|
||||
*gfn_remainder &= (1UL << shift) - 1;
|
||||
return GUEST_TABLE_NORMAL_PAGE;
|
||||
}
|
|
@ -1,48 +0,0 @@
|
|||
From: <security@xenproject.org>
|
||||
Subject: x86/ept: atomically modify entries in ept_next_level
|
||||
|
||||
ept_next_level was passing a live PTE pointer to ept_set_middle_entry,
|
||||
which was then modified without taking into account that the PTE could
|
||||
be part of a live EPT table. This wasn't a security issue because the
|
||||
pages returned by p2m_alloc_ptp are zeroed, so adding such an entry
|
||||
before actually initializing it didn't allow a guest to access
|
||||
physical memory addresses it wasn't supposed to access.
|
||||
|
||||
This is part of XSA-328.
|
||||
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/arch/x86/mm/p2m-ept.c
|
||||
+++ b/xen/arch/x86/mm/p2m-ept.c
|
||||
@@ -307,6 +307,8 @@ static int ept_next_level(struct p2m_dom
|
||||
ept_entry_t *ept_entry, *next = NULL, e;
|
||||
u32 shift, index;
|
||||
|
||||
+ ASSERT(next_level);
|
||||
+
|
||||
shift = next_level * EPT_TABLE_ORDER;
|
||||
|
||||
index = *gfn_remainder >> shift;
|
||||
@@ -323,16 +325,20 @@ static int ept_next_level(struct p2m_dom
|
||||
|
||||
if ( !is_epte_present(&e) )
|
||||
{
|
||||
+ int rc;
|
||||
+
|
||||
if ( e.sa_p2mt == p2m_populate_on_demand )
|
||||
return GUEST_TABLE_POD_PAGE;
|
||||
|
||||
if ( read_only )
|
||||
return GUEST_TABLE_MAP_FAILED;
|
||||
|
||||
- next = ept_set_middle_entry(p2m, ept_entry);
|
||||
+ next = ept_set_middle_entry(p2m, &e);
|
||||
if ( !next )
|
||||
return GUEST_TABLE_MAP_FAILED;
|
||||
- /* e is now stale and hence may not be used anymore below. */
|
||||
+
|
||||
+ rc = atomic_write_ept_entry(p2m, ept_entry, e, next_level);
|
||||
+ ASSERT(rc == 0);
|
||||
}
|
||||
/* The only time sp would be set here is if we had hit a superpage */
|
||||
else if ( is_epte_superpage(&e) )
|
|
@ -1,31 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: vtd: improve IOMMU TLB flush
|
||||
|
||||
Do not limit PSI flushes to order 0 pages, in order to avoid doing a
|
||||
full TLB flush if the passed in page has an order greater than 0 and
|
||||
is aligned. Should increase the performance of IOMMU TLB flushes when
|
||||
dealing with page orders greater than 0.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -570,13 +570,14 @@ static int __must_check iommu_flush_iotl
|
||||
if ( iommu_domid == -1 )
|
||||
continue;
|
||||
|
||||
- if ( page_count != 1 || dfn_eq(dfn, INVALID_DFN) )
|
||||
+ if ( !page_count || (page_count & (page_count - 1)) ||
|
||||
+ dfn_eq(dfn, INVALID_DFN) || !IS_ALIGNED(dfn_x(dfn), page_count) )
|
||||
rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
|
||||
0, flush_dev_iotlb);
|
||||
else
|
||||
rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
|
||||
dfn_to_daddr(dfn),
|
||||
- PAGE_ORDER_4K,
|
||||
+ get_order_from_pages(page_count),
|
||||
!dma_old_pte_present,
|
||||
flush_dev_iotlb);
|
||||
|
@@ -1,175 +0,0 @@
From: <security@xenproject.org>
|
||||
Subject: vtd: prune (and rename) cache flush functions
|
||||
|
||||
Rename __iommu_flush_cache to iommu_sync_cache and remove
|
||||
iommu_flush_cache_page. Also remove the iommu_flush_cache_entry
|
||||
wrapper and just use iommu_sync_cache instead. Note the _entry suffix
|
||||
was meaningless as the wrapper was already taking a size parameter in
|
||||
bytes. While there also constify the addr parameter.
|
||||
|
||||
No functional change intended.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/extern.h
|
||||
+++ b/xen/drivers/passthrough/vtd/extern.h
|
||||
@@ -43,8 +43,7 @@ void disable_qinval(struct vtd_iommu *io
|
||||
int enable_intremap(struct vtd_iommu *iommu, int eim);
|
||||
void disable_intremap(struct vtd_iommu *iommu);
|
||||
|
||||
-void iommu_flush_cache_entry(void *addr, unsigned int size);
|
||||
-void iommu_flush_cache_page(void *addr, unsigned long npages);
|
||||
+void iommu_sync_cache(const void *addr, unsigned int size);
|
||||
int iommu_alloc(struct acpi_drhd_unit *drhd);
|
||||
void iommu_free(struct acpi_drhd_unit *drhd);
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/intremap.c
|
||||
+++ b/xen/drivers/passthrough/vtd/intremap.c
|
||||
@@ -230,7 +230,7 @@ static void free_remap_entry(struct vtd_
|
||||
iremap_entries, iremap_entry);
|
||||
|
||||
update_irte(iommu, iremap_entry, &new_ire, false);
|
||||
- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
|
||||
+ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
|
||||
iommu_flush_iec_index(iommu, 0, index);
|
||||
|
||||
unmap_vtd_domain_page(iremap_entries);
|
||||
@@ -406,7 +406,7 @@ static int ioapic_rte_to_remap_entry(str
|
||||
}
|
||||
|
||||
update_irte(iommu, iremap_entry, &new_ire, !init);
|
||||
- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
|
||||
+ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
|
||||
iommu_flush_iec_index(iommu, 0, index);
|
||||
|
||||
unmap_vtd_domain_page(iremap_entries);
|
||||
@@ -695,7 +695,7 @@ static int msi_msg_to_remap_entry(
|
||||
update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized);
|
||||
msi_desc->irte_initialized = true;
|
||||
|
||||
- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
|
||||
+ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
|
||||
iommu_flush_iec_index(iommu, 0, index);
|
||||
|
||||
unmap_vtd_domain_page(iremap_entries);
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -140,7 +140,8 @@ static int context_get_domain_id(struct
|
||||
}
|
||||
|
||||
static int iommus_incoherent;
|
||||
-static void __iommu_flush_cache(void *addr, unsigned int size)
|
||||
+
|
||||
+void iommu_sync_cache(const void *addr, unsigned int size)
|
||||
{
|
||||
int i;
|
||||
static unsigned int clflush_size = 0;
|
||||
@@ -155,16 +156,6 @@ static void __iommu_flush_cache(void *ad
|
||||
cacheline_flush((char *)addr + i);
|
||||
}
|
||||
|
||||
-void iommu_flush_cache_entry(void *addr, unsigned int size)
|
||||
-{
|
||||
- __iommu_flush_cache(addr, size);
|
||||
-}
|
||||
-
|
||||
-void iommu_flush_cache_page(void *addr, unsigned long npages)
|
||||
-{
|
||||
- __iommu_flush_cache(addr, PAGE_SIZE * npages);
|
||||
-}
|
||||
-
|
||||
/* Allocate page table, return its machine address */
|
||||
uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node)
|
||||
{
|
||||
@@ -183,7 +174,7 @@ uint64_t alloc_pgtable_maddr(unsigned lo
|
||||
vaddr = __map_domain_page(cur_pg);
|
||||
memset(vaddr, 0, PAGE_SIZE);
|
||||
|
||||
- iommu_flush_cache_page(vaddr, 1);
|
||||
+ iommu_sync_cache(vaddr, PAGE_SIZE);
|
||||
unmap_domain_page(vaddr);
|
||||
cur_pg++;
|
||||
}
|
||||
@@ -216,7 +207,7 @@ static u64 bus_to_context_maddr(struct v
|
||||
}
|
||||
set_root_value(*root, maddr);
|
||||
set_root_present(*root);
|
||||
- iommu_flush_cache_entry(root, sizeof(struct root_entry));
|
||||
+ iommu_sync_cache(root, sizeof(struct root_entry));
|
||||
}
|
||||
maddr = (u64) get_context_addr(*root);
|
||||
unmap_vtd_domain_page(root_entries);
|
||||
@@ -263,7 +254,7 @@ static u64 addr_to_dma_page_maddr(struct
|
||||
*/
|
||||
dma_set_pte_readable(*pte);
|
||||
dma_set_pte_writable(*pte);
|
||||
- iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
|
||||
+ iommu_sync_cache(pte, sizeof(struct dma_pte));
|
||||
}
|
||||
|
||||
if ( level == 2 )
|
||||
@@ -640,7 +631,7 @@ static int __must_check dma_pte_clear_on
|
||||
*flush_flags |= IOMMU_FLUSHF_modified;
|
||||
|
||||
spin_unlock(&hd->arch.mapping_lock);
|
||||
- iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
|
||||
+ iommu_sync_cache(pte, sizeof(struct dma_pte));
|
||||
|
||||
unmap_vtd_domain_page(page);
|
||||
|
||||
@@ -679,7 +670,7 @@ static void iommu_free_page_table(struct
|
||||
iommu_free_pagetable(dma_pte_addr(*pte), next_level);
|
||||
|
||||
dma_clear_pte(*pte);
|
||||
- iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
|
||||
+ iommu_sync_cache(pte, sizeof(struct dma_pte));
|
||||
}
|
||||
|
||||
unmap_vtd_domain_page(pt_vaddr);
|
||||
@@ -1400,7 +1391,7 @@ int domain_context_mapping_one(
|
||||
context_set_address_width(*context, agaw);
|
||||
context_set_fault_enable(*context);
|
||||
context_set_present(*context);
|
||||
- iommu_flush_cache_entry(context, sizeof(struct context_entry));
|
||||
+ iommu_sync_cache(context, sizeof(struct context_entry));
|
||||
spin_unlock(&iommu->lock);
|
||||
|
||||
/* Context entry was previously non-present (with domid 0). */
|
||||
@@ -1564,7 +1555,7 @@ int domain_context_unmap_one(
|
||||
|
||||
context_clear_present(*context);
|
||||
context_clear_entry(*context);
|
||||
- iommu_flush_cache_entry(context, sizeof(struct context_entry));
|
||||
+ iommu_sync_cache(context, sizeof(struct context_entry));
|
||||
|
||||
iommu_domid= domain_iommu_domid(domain, iommu);
|
||||
if ( iommu_domid == -1 )
|
||||
@@ -1791,7 +1782,7 @@ static int __must_check intel_iommu_map_
|
||||
|
||||
*pte = new;
|
||||
|
||||
- iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
|
||||
+ iommu_sync_cache(pte, sizeof(struct dma_pte));
|
||||
spin_unlock(&hd->arch.mapping_lock);
|
||||
unmap_vtd_domain_page(page);
|
||||
|
||||
@@ -1866,7 +1857,7 @@ int iommu_pte_flush(struct domain *d, ui
|
||||
int iommu_domid;
|
||||
int rc = 0;
|
||||
|
||||
- iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
|
||||
+ iommu_sync_cache(pte, sizeof(struct dma_pte));
|
||||
|
||||
for_each_drhd_unit ( drhd )
|
||||
{
|
||||
@@ -2724,7 +2715,7 @@ static int __init intel_iommu_quarantine
|
||||
dma_set_pte_addr(*pte, maddr);
|
||||
dma_set_pte_readable(*pte);
|
||||
}
|
||||
- iommu_flush_cache_page(parent, 1);
|
||||
+ iommu_sync_cache(parent, PAGE_SIZE);
|
||||
|
||||
unmap_vtd_domain_page(parent);
|
||||
parent = map_vtd_domain_page(maddr);
|
@@ -1,82 +0,0 @@
From: <security@xenproject.org>
|
||||
Subject: x86/iommu: introduce a cache sync hook
|
||||
|
||||
The hook is only implemented for VT-d and it uses the already existing
|
||||
iommu_sync_cache function present in VT-d code. The new hook is
|
||||
added so that the cache can be flushed by code outside of VT-d when
|
||||
using shared page tables.
|
||||
|
||||
Note that alloc_pgtable_maddr must use the now locally defined
|
||||
sync_cache function, because IOMMU ops are not yet setup the first
|
||||
time the function gets called during IOMMU initialization.
|
||||
|
||||
No functional change intended.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/extern.h
|
||||
+++ b/xen/drivers/passthrough/vtd/extern.h
|
||||
@@ -43,7 +43,6 @@ void disable_qinval(struct vtd_iommu *io
|
||||
int enable_intremap(struct vtd_iommu *iommu, int eim);
|
||||
void disable_intremap(struct vtd_iommu *iommu);
|
||||
|
||||
-void iommu_sync_cache(const void *addr, unsigned int size);
|
||||
int iommu_alloc(struct acpi_drhd_unit *drhd);
|
||||
void iommu_free(struct acpi_drhd_unit *drhd);
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -141,7 +141,7 @@ static int context_get_domain_id(struct
|
||||
|
||||
static int iommus_incoherent;
|
||||
|
||||
-void iommu_sync_cache(const void *addr, unsigned int size)
|
||||
+static void sync_cache(const void *addr, unsigned int size)
|
||||
{
|
||||
int i;
|
||||
static unsigned int clflush_size = 0;
|
||||
@@ -174,7 +174,7 @@ uint64_t alloc_pgtable_maddr(unsigned lo
|
||||
vaddr = __map_domain_page(cur_pg);
|
||||
memset(vaddr, 0, PAGE_SIZE);
|
||||
|
||||
- iommu_sync_cache(vaddr, PAGE_SIZE);
|
||||
+ sync_cache(vaddr, PAGE_SIZE);
|
||||
unmap_domain_page(vaddr);
|
||||
cur_pg++;
|
||||
}
|
||||
@@ -2763,6 +2763,7 @@ const struct iommu_ops __initconstrel in
|
||||
.iotlb_flush_all = iommu_flush_iotlb_all,
|
||||
.get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
|
||||
.dump_p2m_table = vtd_dump_p2m_table,
|
||||
+ .sync_cache = sync_cache,
|
||||
};
|
||||
|
||||
const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
|
||||
--- a/xen/include/asm-x86/iommu.h
|
||||
+++ b/xen/include/asm-x86/iommu.h
|
||||
@@ -121,6 +121,13 @@ extern bool untrusted_msi;
|
||||
int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
|
||||
const uint8_t gvec);
|
||||
|
||||
+#define iommu_sync_cache(addr, size) ({ \
|
||||
+ const struct iommu_ops *ops = iommu_get_ops(); \
|
||||
+ \
|
||||
+ if ( ops->sync_cache ) \
|
||||
+ iommu_vcall(ops, sync_cache, addr, size); \
|
||||
+})
|
||||
+
|
||||
#endif /* !__ARCH_X86_IOMMU_H__ */
|
||||
/*
|
||||
* Local variables:
|
||||
--- a/xen/include/xen/iommu.h
|
||||
+++ b/xen/include/xen/iommu.h
|
||||
@@ -250,6 +250,7 @@ struct iommu_ops {
|
||||
int (*setup_hpet_msi)(struct msi_desc *);
|
||||
|
||||
int (*adjust_irq_affinities)(void);
|
||||
+ void (*sync_cache)(const void *addr, unsigned int size);
|
||||
#endif /* CONFIG_X86 */
|
||||
|
||||
int __must_check (*suspend)(void);
|
@@ -1,36 +0,0 @@
From: <security@xenproject.org>
|
||||
Subject: vtd: don't assume addresses are aligned in sync_cache
|
||||
|
||||
Current code in sync_cache assume that the address passed in is
|
||||
aligned to a cache line size. Fix the code to support passing in
|
||||
arbitrary addresses not necessarily aligned to a cache line size.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -143,8 +143,8 @@ static int iommus_incoherent;
|
||||
|
||||
static void sync_cache(const void *addr, unsigned int size)
|
||||
{
|
||||
- int i;
|
||||
- static unsigned int clflush_size = 0;
|
||||
+ static unsigned long clflush_size = 0;
|
||||
+ const void *end = addr + size;
|
||||
|
||||
if ( !iommus_incoherent )
|
||||
return;
|
||||
@@ -152,8 +152,9 @@ static void sync_cache(const void *addr,
|
||||
if ( clflush_size == 0 )
|
||||
clflush_size = get_cache_line_size();
|
||||
|
||||
- for ( i = 0; i < size; i += clflush_size )
|
||||
- cacheline_flush((char *)addr + i);
|
||||
+ addr -= (unsigned long)addr & (clflush_size - 1);
|
||||
+ for ( ; addr < end; addr += clflush_size )
|
||||
+ cacheline_flush((char *)addr);
|
||||
}
|
||||
|
||||
/* Allocate page table, return its machine address */
|
@@ -1,24 +0,0 @@
From: <security@xenproject.org>
|
||||
Subject: x86/alternative: introduce alternative_2
|
||||
|
||||
It's based on alternative_io_2 without inputs or outputs but with an
|
||||
added memory clobber.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Acked-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/include/asm-x86/alternative.h
|
||||
+++ b/xen/include/asm-x86/alternative.h
|
||||
@@ -114,6 +114,11 @@ extern void alternative_branches(void);
|
||||
#define alternative(oldinstr, newinstr, feature) \
|
||||
asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
|
||||
|
||||
+#define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
|
||||
+ asm volatile (ALTERNATIVE_2(oldinstr, newinstr1, feature1, \
|
||||
+ newinstr2, feature2) \
|
||||
+ : : : "memory")
|
||||
+
|
||||
/*
|
||||
* Alternative inline assembly with input.
|
||||
*
|
@@ -1,91 +0,0 @@
From: <security@xenproject.org>
|
||||
Subject: vtd: optimize CPU cache sync
|
||||
|
||||
Some VT-d IOMMUs are non-coherent, which requires a cache write back
|
||||
in order for the changes made by the CPU to be visible to the IOMMU.
|
||||
This cache write back was unconditionally done using clflush, but there are
|
||||
other more efficient instructions to do so, hence implement support
|
||||
for them using the alternative framework.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/extern.h
|
||||
+++ b/xen/drivers/passthrough/vtd/extern.h
|
||||
@@ -68,7 +68,6 @@ int __must_check qinval_device_iotlb_syn
|
||||
u16 did, u16 size, u64 addr);
|
||||
|
||||
unsigned int get_cache_line_size(void);
|
||||
-void cacheline_flush(char *);
|
||||
void flush_all_cache(void);
|
||||
|
||||
uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node);
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -31,6 +31,7 @@
|
||||
#include <xen/pci_regs.h>
|
||||
#include <xen/keyhandler.h>
|
||||
#include <asm/msi.h>
|
||||
+#include <asm/nops.h>
|
||||
#include <asm/irq.h>
|
||||
#include <asm/hvm/vmx/vmx.h>
|
||||
#include <asm/p2m.h>
|
||||
@@ -154,7 +155,42 @@ static void sync_cache(const void *addr,
|
||||
|
||||
addr -= (unsigned long)addr & (clflush_size - 1);
|
||||
for ( ; addr < end; addr += clflush_size )
|
||||
- cacheline_flush((char *)addr);
|
||||
+/*
|
||||
+ * The arguments to a macro must not include preprocessor directives. Doing so
|
||||
+ * results in undefined behavior, so we have to create some defines here in
|
||||
+ * order to avoid it.
|
||||
+ */
|
||||
+#if defined(HAVE_AS_CLWB)
|
||||
+# define CLWB_ENCODING "clwb %[p]"
|
||||
+#elif defined(HAVE_AS_XSAVEOPT)
|
||||
+# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */
|
||||
+#else
|
||||
+# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */
|
||||
+#endif
|
||||
+
|
||||
+#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr))
|
||||
+#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT)
|
||||
+# define INPUT BASE_INPUT
|
||||
+#else
|
||||
+# define INPUT(addr) "a" (addr), BASE_INPUT(addr)
|
||||
+#endif
|
||||
+ /*
|
||||
+ * Note regarding the use of NOP_DS_PREFIX: it's faster to do a clflush
|
||||
+ * + prefix than a clflush + nop, and hence the prefix is added instead
|
||||
+ * of letting the alternative framework fill the gap by appending nops.
|
||||
+ */
|
||||
+ alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %[p]",
|
||||
+ "data16 clflush %[p]", /* clflushopt */
|
||||
+ X86_FEATURE_CLFLUSHOPT,
|
||||
+ CLWB_ENCODING,
|
||||
+ X86_FEATURE_CLWB, /* no outputs */,
|
||||
+ INPUT(addr));
|
||||
+#undef INPUT
|
||||
+#undef BASE_INPUT
|
||||
+#undef CLWB_ENCODING
|
||||
+
|
||||
+ alternative_2("", "sfence", X86_FEATURE_CLFLUSHOPT,
|
||||
+ "sfence", X86_FEATURE_CLWB);
|
||||
}
|
||||
|
||||
/* Allocate page table, return its machine address */
|
||||
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
|
||||
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
|
||||
@@ -51,11 +51,6 @@ unsigned int get_cache_line_size(void)
|
||||
return ((cpuid_ebx(1) >> 8) & 0xff) * 8;
|
||||
}
|
||||
|
||||
-void cacheline_flush(char * addr)
|
||||
-{
|
||||
- clflush(addr);
|
||||
-}
|
||||
-
|
||||
void flush_all_cache()
|
||||
{
|
||||
wbinvd();
|
@@ -1,153 +0,0 @@
From: <security@xenproject.org>
|
||||
Subject: x86/ept: flush cache when modifying PTEs and sharing page tables
|
||||
|
||||
Modifications made to the page tables by EPT code need to be written
|
||||
to memory when the page tables are shared with the IOMMU, as Intel
|
||||
IOMMUs can be non-coherent and thus require changes to be written to
|
||||
memory in order to be visible to the IOMMU.
|
||||
|
||||
In order to achieve this make sure data is written back to memory
|
||||
after writing an EPT entry when the recalc bit is not set in
|
||||
atomic_write_ept_entry. If such bit is set, the entry will be
|
||||
adjusted and atomic_write_ept_entry will be called a second time
|
||||
without the recalc bit set. Note that when splitting a super page the
|
||||
new tables resulting of the split should also be written back.
|
||||
|
||||
Failure to do so can allow devices behind the IOMMU access to the
|
||||
stale super page, or cause coherency issues as changes made by the
|
||||
processor to the page tables are not visible to the IOMMU.
|
||||
|
||||
This allows to remove the VT-d specific iommu_pte_flush helper, since
|
||||
the cache write back is now performed by atomic_write_ept_entry, and
|
||||
hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly
|
||||
used method (iommu_iotlb_flush) can result in less flushes, since it
|
||||
might sometimes be called rightly with 0 flags, in which case it
|
||||
becomes a no-op.
|
||||
|
||||
This is part of XSA-321.
|
||||
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/arch/x86/mm/p2m-ept.c
|
||||
+++ b/xen/arch/x86/mm/p2m-ept.c
|
||||
@@ -58,6 +58,19 @@ static int atomic_write_ept_entry(struct
|
||||
|
||||
write_atomic(&entryptr->epte, new.epte);
|
||||
|
||||
+ /*
|
||||
+ * The recalc field on the EPT is used to signal either that a
|
||||
+ * recalculation of the EMT field is required (which doesn't effect the
|
||||
+ * IOMMU), or a type change. Type changes can only be between ram_rw,
|
||||
+ * logdirty and ioreq_server: changes to/from logdirty won't work well with
|
||||
+ * an IOMMU anyway, as IOMMU #PFs are not synchronous and will lead to
|
||||
+ * aborts, and changes to/from ioreq_server are already fully flushed
|
||||
+ * before returning to guest context (see
|
||||
+ * XEN_DMOP_map_mem_type_to_ioreq_server).
|
||||
+ */
|
||||
+ if ( !new.recalc && iommu_use_hap_pt(p2m->domain) )
|
||||
+ iommu_sync_cache(entryptr, sizeof(*entryptr));
|
||||
+
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -278,6 +291,9 @@ static bool_t ept_split_super_page(struc
|
||||
break;
|
||||
}
|
||||
|
||||
+ if ( iommu_use_hap_pt(p2m->domain) )
|
||||
+ iommu_sync_cache(table, EPT_PAGETABLE_ENTRIES * sizeof(ept_entry_t));
|
||||
+
|
||||
unmap_domain_page(table);
|
||||
|
||||
/* Even failed we should install the newly allocated ept page. */
|
||||
@@ -337,6 +353,9 @@ static int ept_next_level(struct p2m_dom
|
||||
if ( !next )
|
||||
return GUEST_TABLE_MAP_FAILED;
|
||||
|
||||
+ if ( iommu_use_hap_pt(p2m->domain) )
|
||||
+ iommu_sync_cache(next, EPT_PAGETABLE_ENTRIES * sizeof(ept_entry_t));
|
||||
+
|
||||
rc = atomic_write_ept_entry(p2m, ept_entry, e, next_level);
|
||||
ASSERT(rc == 0);
|
||||
}
|
||||
@@ -821,7 +840,10 @@ out:
|
||||
need_modify_vtd_table )
|
||||
{
|
||||
if ( iommu_use_hap_pt(d) )
|
||||
- rc = iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present);
|
||||
+ rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order),
|
||||
+ (iommu_flags ? IOMMU_FLUSHF_added : 0) |
|
||||
+ (vtd_pte_present ? IOMMU_FLUSHF_modified
|
||||
+ : 0));
|
||||
else if ( need_iommu_pt_sync(d) )
|
||||
rc = iommu_flags ?
|
||||
iommu_legacy_map(d, _dfn(gfn), mfn, order, iommu_flags) :
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -1884,53 +1884,6 @@ static int intel_iommu_lookup_page(struc
|
||||
return 0;
|
||||
}
|
||||
|
||||
-int iommu_pte_flush(struct domain *d, uint64_t dfn, uint64_t *pte,
|
||||
- int order, int present)
|
||||
-{
|
||||
- struct acpi_drhd_unit *drhd;
|
||||
- struct vtd_iommu *iommu = NULL;
|
||||
- struct domain_iommu *hd = dom_iommu(d);
|
||||
- bool_t flush_dev_iotlb;
|
||||
- int iommu_domid;
|
||||
- int rc = 0;
|
||||
-
|
||||
- iommu_sync_cache(pte, sizeof(struct dma_pte));
|
||||
-
|
||||
- for_each_drhd_unit ( drhd )
|
||||
- {
|
||||
- iommu = drhd->iommu;
|
||||
- if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
|
||||
- continue;
|
||||
-
|
||||
- flush_dev_iotlb = !!find_ats_dev_drhd(iommu);
|
||||
- iommu_domid= domain_iommu_domid(d, iommu);
|
||||
- if ( iommu_domid == -1 )
|
||||
- continue;
|
||||
-
|
||||
- rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
|
||||
- __dfn_to_daddr(dfn),
|
||||
- order, !present, flush_dev_iotlb);
|
||||
- if ( rc > 0 )
|
||||
- {
|
||||
- iommu_flush_write_buffer(iommu);
|
||||
- rc = 0;
|
||||
- }
|
||||
- }
|
||||
-
|
||||
- if ( unlikely(rc) )
|
||||
- {
|
||||
- if ( !d->is_shutting_down && printk_ratelimit() )
|
||||
- printk(XENLOG_ERR VTDPREFIX
|
||||
- " d%d: IOMMU pages flush failed: %d\n",
|
||||
- d->domain_id, rc);
|
||||
-
|
||||
- if ( !is_hardware_domain(d) )
|
||||
- domain_crash(d);
|
||||
- }
|
||||
-
|
||||
- return rc;
|
||||
-}
|
||||
-
|
||||
static int __init vtd_ept_page_compatible(struct vtd_iommu *iommu)
|
||||
{
|
||||
u64 ept_cap, vtd_cap = iommu->cap;
|
||||
--- a/xen/include/asm-x86/iommu.h
|
||||
+++ b/xen/include/asm-x86/iommu.h
|
||||
@@ -97,10 +97,6 @@ static inline int iommu_adjust_irq_affin
|
||||
: 0;
|
||||
}
|
||||
|
||||
-/* While VT-d specific, this must get declared in a generic header. */
|
||||
-int __must_check iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
|
||||
- int order, int present);
|
||||
-
|
||||
static inline bool iommu_supports_x2apic(void)
|
||||
{
|
||||
return iommu_init_ops && iommu_init_ops->supports_x2apic
|
@@ -1,39 +0,0 @@
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/pv: Handle the Intel-specific MSR_MISC_ENABLE correctly
|
||||
|
||||
This MSR doesn't exist on AMD hardware, and switching away from the safe
|
||||
functions in the common MSR path was an erroneous change.
|
||||
|
||||
Partially revert the change.
|
||||
|
||||
This is XSA-333.
|
||||
|
||||
Fixes: 4fdc932b3cc ("x86/Intel: drop another 32-bit leftover")
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Wei Liu <wl@xen.org>
|
||||
|
||||
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
|
||||
index efeb2a727e..6332c74b80 100644
|
||||
--- a/xen/arch/x86/pv/emul-priv-op.c
|
||||
+++ b/xen/arch/x86/pv/emul-priv-op.c
|
||||
@@ -924,7 +924,8 @@ static int read_msr(unsigned int reg, uint64_t *val,
|
||||
return X86EMUL_OKAY;
|
||||
|
||||
case MSR_IA32_MISC_ENABLE:
|
||||
- rdmsrl(reg, *val);
|
||||
+ if ( rdmsr_safe(reg, *val) )
|
||||
+ break;
|
||||
*val = guest_misc_enable(*val);
|
||||
return X86EMUL_OKAY;
|
||||
|
||||
@@ -1059,7 +1060,8 @@ static int write_msr(unsigned int reg, uint64_t val,
|
||||
break;
|
||||
|
||||
case MSR_IA32_MISC_ENABLE:
|
||||
- rdmsrl(reg, temp);
|
||||
+ if ( rdmsr_safe(reg, temp) )
|
||||
+ break;
|
||||
if ( val != guest_misc_enable(temp) )
|
||||
goto invalid;
|
||||
return X86EMUL_OKAY;
|
@@ -1,51 +0,0 @@
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: xen/memory: Don't skip the RCU unlock path in acquire_resource()
|
||||
|
||||
In the case that an HVM Stubdomain makes an XENMEM_acquire_resource hypercall,
|
||||
the FIXME path will bypass rcu_unlock_domain() on the way out of the function.
|
||||
|
||||
Move the check to the start of the function. This does change the behaviour
|
||||
of the get-size path for HVM Stubdomains, but that functionality is currently
|
||||
broken and unused anyway, as well as being quite useless to entities which
|
||||
can't actually map the resource anyway.
|
||||
|
||||
This is XSA-334.
|
||||
|
||||
Fixes: 83fa6552ce ("common: add a new mappable resource type: XENMEM_resource_grant_table")
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
diff --git a/xen/common/memory.c b/xen/common/memory.c
|
||||
index 1a3c9ffb30..29741d8904 100644
|
||||
--- a/xen/common/memory.c
|
||||
+++ b/xen/common/memory.c
|
||||
@@ -1058,6 +1058,14 @@ static int acquire_resource(
|
||||
xen_pfn_t mfn_list[32];
|
||||
int rc;
|
||||
|
||||
+ /*
|
||||
+ * FIXME: Until foreign pages inserted into the P2M are properly
|
||||
+ * reference counted, it is unsafe to allow mapping of
|
||||
+ * resource pages unless the caller is the hardware domain.
|
||||
+ */
|
||||
+ if ( paging_mode_translate(currd) && !is_hardware_domain(currd) )
|
||||
+ return -EACCES;
|
||||
+
|
||||
if ( copy_from_guest(&xmar, arg, 1) )
|
||||
return -EFAULT;
|
||||
|
||||
@@ -1114,14 +1122,6 @@ static int acquire_resource(
|
||||
xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
|
||||
unsigned int i;
|
||||
|
||||
- /*
|
||||
- * FIXME: Until foreign pages inserted into the P2M are properly
|
||||
- * reference counted, it is unsafe to allow mapping of
|
||||
- * resource pages unless the caller is the hardware domain.
|
||||
- */
|
||||
- if ( !is_hardware_domain(currd) )
|
||||
- return -EACCES;
|
||||
-
|
||||
if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
|
||||
rc = -EFAULT;
|
||||
|
@@ -1,84 +0,0 @@
From c5bd2924c6d6a5bcbffb8b5e7798a88970131c07 Mon Sep 17 00:00:00 2001
|
||||
From: Gerd Hoffmann <kraxel@redhat.com>
|
||||
Date: Mon, 17 Aug 2020 08:34:22 +0200
|
||||
Subject: [PATCH] usb: fix setup_len init (CVE-2020-14364)
|
||||
|
||||
Store calculated setup_len in a local variable, verify it, and only
|
||||
write it to the struct (USBDevice->setup_len) in case it passed the
|
||||
sanity checks.
|
||||
|
||||
This prevents other code (do_token_{in,out} functions specifically)
|
||||
from working with invalid USBDevice->setup_len values and overrunning
|
||||
the USBDevice->setup_buf[] buffer.
|
||||
|
||||
Fixes: CVE-2020-14364
|
||||
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
|
||||
---
|
||||
hw/usb/core.c | 16 ++++++++++------
|
||||
1 file changed, 10 insertions(+), 6 deletions(-)
|
||||
|
||||
diff --git a/hw/usb/core.c b/hw/usb/core.c
|
||||
index 5abd128b6bc5..5234dcc73fea 100644
|
||||
--- a/hw/usb/core.c
|
||||
+++ b/hw/usb/core.c
|
||||
@@ -129,6 +129,7 @@ void usb_wakeup(USBEndpoint *ep, unsigned int stream)
|
||||
static void do_token_setup(USBDevice *s, USBPacket *p)
|
||||
{
|
||||
int request, value, index;
|
||||
+ unsigned int setup_len;
|
||||
|
||||
if (p->iov.size != 8) {
|
||||
p->status = USB_RET_STALL;
|
||||
@@ -138,14 +139,15 @@ static void do_token_setup(USBDevice *s, USBPacket *p)
|
||||
usb_packet_copy(p, s->setup_buf, p->iov.size);
|
||||
s->setup_index = 0;
|
||||
p->actual_length = 0;
|
||||
- s->setup_len = (s->setup_buf[7] << 8) | s->setup_buf[6];
|
||||
- if (s->setup_len > sizeof(s->data_buf)) {
|
||||
+ setup_len = (s->setup_buf[7] << 8) | s->setup_buf[6];
|
||||
+ if (setup_len > sizeof(s->data_buf)) {
|
||||
fprintf(stderr,
|
||||
"usb_generic_handle_packet: ctrl buffer too small (%d > %zu)\n",
|
||||
- s->setup_len, sizeof(s->data_buf));
|
||||
+ setup_len, sizeof(s->data_buf));
|
||||
p->status = USB_RET_STALL;
|
||||
return;
|
||||
}
|
||||
+ s->setup_len = setup_len;
|
||||
|
||||
request = (s->setup_buf[0] << 8) | s->setup_buf[1];
|
||||
value = (s->setup_buf[3] << 8) | s->setup_buf[2];
|
||||
@@ -259,26 +261,28 @@ static void do_token_out(USBDevice *s, USBPacket *p)
|
||||
static void do_parameter(USBDevice *s, USBPacket *p)
|
||||
{
|
||||
int i, request, value, index;
|
||||
+ unsigned int setup_len;
|
||||
|
||||
for (i = 0; i < 8; i++) {
|
||||
s->setup_buf[i] = p->parameter >> (i*8);
|
||||
}
|
||||
|
||||
s->setup_state = SETUP_STATE_PARAM;
|
||||
- s->setup_len = (s->setup_buf[7] << 8) | s->setup_buf[6];
|
||||
s->setup_index = 0;
|
||||
|
||||
request = (s->setup_buf[0] << 8) | s->setup_buf[1];
|
||||
value = (s->setup_buf[3] << 8) | s->setup_buf[2];
|
||||
index = (s->setup_buf[5] << 8) | s->setup_buf[4];
|
||||
|
||||
- if (s->setup_len > sizeof(s->data_buf)) {
|
||||
+ setup_len = (s->setup_buf[7] << 8) | s->setup_buf[6];
|
||||
+ if (setup_len > sizeof(s->data_buf)) {
|
||||
fprintf(stderr,
|
||||
"usb_generic_handle_packet: ctrl buffer too small (%d > %zu)\n",
|
||||
- s->setup_len, sizeof(s->data_buf));
|
||||
+ setup_len, sizeof(s->data_buf));
|
||||
p->status = USB_RET_STALL;
|
||||
return;
|
||||
}
|
||||
+ s->setup_len = setup_len;
|
||||
|
||||
if (p->pid == USB_TOKEN_OUT) {
|
||||
usb_packet_copy(p, s->data_buf, s->setup_len);
|
||||
--
|
||||
2.18.4
|
@@ -1,283 +0,0 @@
From: Roger Pau Monné <roger.pau@citrix.com>
|
||||
Subject: x86/vpt: fix race when migrating timers between vCPUs
|
||||
|
||||
The current vPT code will migrate the emulated timers between vCPUs
|
||||
(change the pt->vcpu field) while just holding the destination lock,
|
||||
either from create_periodic_time or pt_adjust_global_vcpu_target if
|
||||
the global target is adjusted. Changing the periodic_timer vCPU field
|
||||
in this way creates a race where a third party could grab the lock in
|
||||
the unlocked region of pt_adjust_global_vcpu_target (or before
|
||||
create_periodic_time performs the vcpu change) and then release the
|
||||
lock from a different vCPU, creating a locking imbalance.
|
||||
|
||||
Introduce a per-domain rwlock in order to protect periodic_time
|
||||
migration between vCPU lists. Taking the lock in read mode prevents
|
||||
any timer from being migrated to a different vCPU, while taking it in
|
||||
write mode allows performing migration of timers across vCPUs. The
|
||||
per-vcpu locks are still used to protect all the other fields from the
|
||||
periodic_timer struct.
|
||||
|
||||
Note that such migration shouldn't happen frequently, and hence
|
||||
there's no performance drop as a result of such locking.
|
||||
|
||||
This is XSA-336.
|
||||
|
||||
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
|
||||
Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com>
|
||||
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
---
|
||||
Changes since v2:
|
||||
- Re-order pt_adjust_vcpu to remove one if.
|
||||
- Fix pt_lock to not call pt_vcpu_lock, as we might end up using a
|
||||
stale value of pt->vcpu when taking the per-vcpu lock.
|
||||
|
||||
Changes since v1:
|
||||
- Use a per-domain rwlock to protect timer vCPU migration.
|
||||
|
||||
--- a/xen/arch/x86/hvm/hvm.c
|
||||
+++ b/xen/arch/x86/hvm/hvm.c
|
||||
@@ -658,6 +658,8 @@ int hvm_domain_initialise(struct domain
|
||||
/* need link to containing domain */
|
||||
d->arch.hvm.pl_time->domain = d;
|
||||
|
||||
+ rwlock_init(&d->arch.hvm.pl_time->pt_migrate);
|
||||
+
|
||||
/* Set the default IO Bitmap. */
|
||||
if ( is_hardware_domain(d) )
|
||||
{
|
||||
--- a/xen/arch/x86/hvm/vpt.c
|
||||
+++ b/xen/arch/x86/hvm/vpt.c
|
||||
@@ -153,23 +153,32 @@ static int pt_irq_masked(struct periodic
|
||||
return 1;
|
||||
}
|
||||
|
||||
-static void pt_lock(struct periodic_time *pt)
|
||||
+static void pt_vcpu_lock(struct vcpu *v)
|
||||
{
|
||||
- struct vcpu *v;
|
||||
+ read_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
|
||||
+ spin_lock(&v->arch.hvm.tm_lock);
|
||||
+}
|
||||
|
||||
- for ( ; ; )
|
||||
- {
|
||||
- v = pt->vcpu;
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
- if ( likely(pt->vcpu == v) )
|
||||
- break;
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
- }
|
||||
+static void pt_vcpu_unlock(struct vcpu *v)
|
||||
+{
|
||||
+ spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ read_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
|
||||
+}
|
||||
+
|
||||
+static void pt_lock(struct periodic_time *pt)
|
||||
+{
|
||||
+ /*
|
||||
+ * We cannot use pt_vcpu_lock here, because we need to acquire the
|
||||
+ * per-domain lock first and then (re-)fetch the value of pt->vcpu, or
|
||||
+ * else we might be using a stale value of pt->vcpu.
|
||||
+ */
|
||||
+ read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
|
||||
+ spin_lock(&pt->vcpu->arch.hvm.tm_lock);
|
||||
}
|
||||
|
||||
static void pt_unlock(struct periodic_time *pt)
|
||||
{
|
||||
- spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(pt->vcpu);
|
||||
}
|
||||
|
||||
static void pt_process_missed_ticks(struct periodic_time *pt)
|
||||
@@ -219,7 +228,7 @@ void pt_save_timer(struct vcpu *v)
|
||||
if ( v->pause_flags & VPF_blocked )
|
||||
return;
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_lock(v);
|
||||
|
||||
list_for_each_entry ( pt, head, list )
|
||||
if ( !pt->do_not_freeze )
|
||||
@@ -227,7 +236,7 @@ void pt_save_timer(struct vcpu *v)
|
||||
|
||||
pt_freeze_time(v);
|
||||
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
}
|
||||
|
||||
void pt_restore_timer(struct vcpu *v)
|
||||
@@ -235,7 +244,7 @@ void pt_restore_timer(struct vcpu *v)
|
||||
struct list_head *head = &v->arch.hvm.tm_list;
|
||||
struct periodic_time *pt;
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_lock(v);
|
||||
|
||||
list_for_each_entry ( pt, head, list )
|
||||
{
|
||||
@@ -248,7 +257,7 @@ void pt_restore_timer(struct vcpu *v)
|
||||
|
||||
pt_thaw_time(v);
|
||||
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
}
|
||||
|
||||
static void pt_timer_fn(void *data)
|
||||
@@ -309,7 +318,7 @@ int pt_update_irq(struct vcpu *v)
|
||||
int irq, pt_vector = -1;
|
||||
bool level;
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_lock(v);
|
||||
|
||||
earliest_pt = NULL;
|
||||
max_lag = -1ULL;
|
||||
@@ -339,7 +348,7 @@ int pt_update_irq(struct vcpu *v)
|
||||
|
||||
if ( earliest_pt == NULL )
|
||||
{
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
return -1;
|
||||
}
|
||||
|
||||
@@ -347,7 +356,7 @@ int pt_update_irq(struct vcpu *v)
|
||||
irq = earliest_pt->irq;
|
||||
level = earliest_pt->level;
|
||||
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
|
||||
switch ( earliest_pt->source )
|
||||
{
|
||||
@@ -394,7 +403,7 @@ int pt_update_irq(struct vcpu *v)
|
||||
time_cb *cb = NULL;
|
||||
void *cb_priv;
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_lock(v);
|
||||
/* Make sure the timer is still on the list. */
|
||||
list_for_each_entry ( pt, &v->arch.hvm.tm_list, list )
|
||||
if ( pt == earliest_pt )
|
||||
@@ -404,7 +413,7 @@ int pt_update_irq(struct vcpu *v)
|
||||
cb_priv = pt->priv;
|
||||
break;
|
||||
}
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
|
||||
if ( cb != NULL )
|
||||
cb(v, cb_priv);
|
||||
@@ -441,12 +450,12 @@ void pt_intr_post(struct vcpu *v, struct
|
||||
if ( intack.source == hvm_intsrc_vector )
|
||||
return;
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_lock(v);
|
||||
|
||||
pt = is_pt_irq(v, intack);
|
||||
if ( pt == NULL )
|
||||
{
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -455,7 +464,7 @@ void pt_intr_post(struct vcpu *v, struct
|
||||
cb = pt->cb;
|
||||
cb_priv = pt->priv;
|
||||
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
|
||||
if ( cb != NULL )
|
||||
cb(v, cb_priv);
|
||||
@@ -466,12 +475,12 @@ void pt_migrate(struct vcpu *v)
|
||||
struct list_head *head = &v->arch.hvm.tm_list;
|
||||
struct periodic_time *pt;
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_lock(v);
|
||||
|
||||
list_for_each_entry ( pt, head, list )
|
||||
migrate_timer(&pt->timer, v->processor);
|
||||
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ pt_vcpu_unlock(v);
|
||||
}
|
||||
|
||||
void create_periodic_time(
|
||||
@@ -490,7 +499,7 @@ void create_periodic_time(
|
||||
|
||||
destroy_periodic_time(pt);
|
||||
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
|
||||
|
||||
pt->pending_intr_nr = 0;
|
||||
pt->do_not_freeze = 0;
|
||||
@@ -540,7 +549,7 @@ void create_periodic_time(
|
||||
init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
|
||||
set_timer(&pt->timer, pt->scheduled);
|
||||
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ write_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
|
||||
}
|
||||
|
||||
void destroy_periodic_time(struct periodic_time *pt)
|
||||
@@ -565,30 +574,20 @@ void destroy_periodic_time(struct period
|
||||
|
||||
static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
|
||||
{
|
||||
- int on_list;
|
||||
-
|
||||
ASSERT(pt->source == PTSRC_isa || pt->source == PTSRC_ioapic);
|
||||
|
||||
if ( pt->vcpu == NULL )
|
||||
return;
|
||||
|
||||
- pt_lock(pt);
|
||||
- on_list = pt->on_list;
|
||||
- if ( pt->on_list )
|
||||
- list_del(&pt->list);
|
||||
- pt->on_list = 0;
|
||||
- pt_unlock(pt);
|
||||
-
|
||||
- spin_lock(&v->arch.hvm.tm_lock);
|
||||
+ write_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
|
||||
pt->vcpu = v;
|
||||
- if ( on_list )
|
||||
+ if ( pt->on_list )
|
||||
{
|
||||
- pt->on_list = 1;
|
||||
+ list_del(&pt->list);
|
||||
list_add(&pt->list, &v->arch.hvm.tm_list);
|
||||
-
|
||||
migrate_timer(&pt->timer, v->processor);
|
||||
}
|
||||
- spin_unlock(&v->arch.hvm.tm_lock);
|
||||
+ write_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
|
||||
}
|
||||
|
||||
void pt_adjust_global_vcpu_target(struct vcpu *v)
|
||||
--- a/xen/include/asm-x86/hvm/vpt.h
|
||||
+++ b/xen/include/asm-x86/hvm/vpt.h
|
||||
@@ -128,6 +128,13 @@ struct pl_time { /* platform time */
|
||||
struct RTCState vrtc;
|
||||
struct HPETState vhpet;
|
||||
struct PMTState vpmt;
|
||||
+ /*
|
||||
+ * rwlock to prevent periodic_time vCPU migration. Take the lock in read
|
||||
+ * mode in order to prevent the vcpu field of periodic_time from changing.
|
||||
+ * Lock must be taken in write mode when changes to the vcpu field are
|
||||
+ * performed, as it allows exclusive access to all the timers of a domain.
|
||||
+ */
|
||||
+ rwlock_t pt_migrate;
|
||||
/* guest_time = Xen sys time + stime_offset */
|
||||
int64_t stime_offset;
|
||||
/* Ensures monotonicity in appropriate timer modes. */
|
@@ -1,87 +0,0 @@
From: Roger Pau Monné <roger.pau@citrix.com>
|
||||
Subject: x86/msi: get rid of read_msi_msg
|
||||
|
||||
It's safer and faster to just use the cached last written
|
||||
(untranslated) MSI message stored in msi_desc for the single user that
|
||||
calls read_msi_msg.
|
||||
|
||||
This also prevents relying on the data read from the device MSI
|
||||
registers in order to figure out the index into the IOMMU interrupt
|
||||
remapping table, which is not safe.
|
||||
|
||||
This is part of XSA-337.
|
||||
|
||||
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
--- a/xen/arch/x86/msi.c
|
||||
+++ b/xen/arch/x86/msi.c
|
||||
@@ -183,54 +183,6 @@ void msi_compose_msg(unsigned vector, co
|
||||
MSI_DATA_VECTOR(vector);
|
||||
}
|
||||
|
||||
-static bool read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
|
||||
-{
|
||||
- switch ( entry->msi_attrib.type )
|
||||
- {
|
||||
- case PCI_CAP_ID_MSI:
|
||||
- {
|
||||
- struct pci_dev *dev = entry->dev;
|
||||
- int pos = entry->msi_attrib.pos;
|
||||
- uint16_t data;
|
||||
-
|
||||
- msg->address_lo = pci_conf_read32(dev->sbdf,
|
||||
- msi_lower_address_reg(pos));
|
||||
- if ( entry->msi_attrib.is_64 )
|
||||
- {
|
||||
- msg->address_hi = pci_conf_read32(dev->sbdf,
|
||||
- msi_upper_address_reg(pos));
|
||||
- data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 1));
|
||||
- }
|
||||
- else
|
||||
- {
|
||||
- msg->address_hi = 0;
|
||||
- data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 0));
|
||||
- }
|
||||
- msg->data = data;
|
||||
- break;
|
||||
- }
|
||||
- case PCI_CAP_ID_MSIX:
|
||||
- {
|
||||
- void __iomem *base = entry->mask_base;
|
||||
-
|
||||
- if ( unlikely(!msix_memory_decoded(entry->dev,
|
||||
- entry->msi_attrib.pos)) )
|
||||
- return false;
|
||||
- msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
|
||||
- msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
|
||||
- msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
|
||||
- break;
|
||||
- }
|
||||
- default:
|
||||
- BUG();
|
||||
- }
|
||||
-
|
||||
- if ( iommu_intremap )
|
||||
- iommu_read_msi_from_ire(entry, msg);
|
||||
-
|
||||
- return true;
|
||||
-}
|
||||
-
|
||||
static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
|
||||
{
|
||||
entry->msg = *msg;
|
||||
@@ -302,10 +254,7 @@ void set_msi_affinity(struct irq_desc *d
|
||||
|
||||
ASSERT(spin_is_locked(&desc->lock));
|
||||
|
||||
- memset(&msg, 0, sizeof(msg));
|
||||
- if ( !read_msi_msg(msi_desc, &msg) )
|
||||
- return;
|
||||
-
|
||||
+ msg = msi_desc->msg;
|
||||
msg.data &= ~MSI_DATA_VECTOR_MASK;
|
||||
msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
|
||||
msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
|
@@ -1,181 +0,0 @@
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: x86/MSI-X: restrict reading of table/PBA bases from BARs
|
||||
|
||||
When assigned to less trusted or un-trusted guests, devices may change
|
||||
state behind our backs (they may e.g. get reset by means we may not know
|
||||
about). Therefore we should avoid reading BARs from hardware once a
|
||||
device is no longer owned by Dom0. Furthermore when we can't read a BAR,
|
||||
or when we read zero, we shouldn't instead use the caller provided
|
||||
address unless that caller can be trusted.
|
||||
|
||||
Re-arrange the logic in msix_capability_init() such that only Dom0 (and
|
||||
only if the device isn't DomU-owned yet) or calls through
|
||||
PHYSDEVOP_prepare_msix will actually result in the reading of the
|
||||
respective BAR register(s). Additionally do so only as long as in-use
|
||||
table entries are known (note that invocation of PHYSDEVOP_prepare_msix
|
||||
counts as a "pseudo" entry). In all other uses the value already
|
||||
recorded will get used instead.
|
||||
|
||||
Clear the recorded values in _pci_cleanup_msix() as well as on the one
|
||||
affected error path. (Adjust this error path to also avoid blindly
|
||||
disabling MSI-X when it was enabled on entry to the function.)
|
||||
|
||||
While moving around variable declarations (in many cases to reduce their
|
||||
scopes), also adjust some of their types.
|
||||
|
||||
This is part of XSA-337.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
|
||||
|
||||
--- a/xen/arch/x86/msi.c
|
||||
+++ b/xen/arch/x86/msi.c
|
||||
@@ -769,16 +769,14 @@ static int msix_capability_init(struct p
|
||||
{
|
||||
struct arch_msix *msix = dev->msix;
|
||||
struct msi_desc *entry = NULL;
|
||||
- int vf;
|
||||
u16 control;
|
||||
u64 table_paddr;
|
||||
u32 table_offset;
|
||||
- u8 bir, pbus, pslot, pfunc;
|
||||
u16 seg = dev->seg;
|
||||
u8 bus = dev->bus;
|
||||
u8 slot = PCI_SLOT(dev->devfn);
|
||||
u8 func = PCI_FUNC(dev->devfn);
|
||||
- bool maskall = msix->host_maskall;
|
||||
+ bool maskall = msix->host_maskall, zap_on_error = false;
|
||||
unsigned int pos = pci_find_cap_offset(seg, bus, slot, func,
|
||||
PCI_CAP_ID_MSIX);
|
||||
|
||||
@@ -820,43 +818,45 @@ static int msix_capability_init(struct p
|
||||
|
||||
/* Locate MSI-X table region */
|
||||
table_offset = pci_conf_read32(dev->sbdf, msix_table_offset_reg(pos));
|
||||
- bir = (u8)(table_offset & PCI_MSIX_BIRMASK);
|
||||
- table_offset &= ~PCI_MSIX_BIRMASK;
|
||||
+ if ( !msix->used_entries &&
|
||||
+ (!msi ||
|
||||
+ (is_hardware_domain(current->domain) &&
|
||||
+ (dev->domain == current->domain || dev->domain == dom_io))) )
|
||||
+ {
|
||||
+ unsigned int bir = table_offset & PCI_MSIX_BIRMASK, pbus, pslot, pfunc;
|
||||
+ int vf;
|
||||
+ paddr_t pba_paddr;
|
||||
+ unsigned int pba_offset;
|
||||
|
||||
- if ( !dev->info.is_virtfn )
|
||||
- {
|
||||
- pbus = bus;
|
||||
- pslot = slot;
|
||||
- pfunc = func;
|
||||
- vf = -1;
|
||||
- }
|
||||
- else
|
||||
- {
|
||||
- pbus = dev->info.physfn.bus;
|
||||
- pslot = PCI_SLOT(dev->info.physfn.devfn);
|
||||
- pfunc = PCI_FUNC(dev->info.physfn.devfn);
|
||||
- vf = PCI_BDF2(dev->bus, dev->devfn);
|
||||
- }
|
||||
-
|
||||
- table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
|
||||
- WARN_ON(msi && msi->table_base != table_paddr);
|
||||
- if ( !table_paddr )
|
||||
- {
|
||||
- if ( !msi || !msi->table_base )
|
||||
+ if ( !dev->info.is_virtfn )
|
||||
{
|
||||
- pci_conf_write16(dev->sbdf, msix_control_reg(pos),
|
||||
- control & ~PCI_MSIX_FLAGS_ENABLE);
|
||||
- xfree(entry);
|
||||
- return -ENXIO;
|
||||
+ pbus = bus;
|
||||
+ pslot = slot;
|
||||
+ pfunc = func;
|
||||
+ vf = -1;
|
||||
+ }
|
||||
+ else
|
||||
+ {
|
||||
+ pbus = dev->info.physfn.bus;
|
||||
+ pslot = PCI_SLOT(dev->info.physfn.devfn);
|
||||
+ pfunc = PCI_FUNC(dev->info.physfn.devfn);
|
||||
+ vf = PCI_BDF2(dev->bus, dev->devfn);
|
||||
}
|
||||
- table_paddr = msi->table_base;
|
||||
- }
|
||||
- table_paddr += table_offset;
|
||||
|
||||
- if ( !msix->used_entries )
|
||||
- {
|
||||
- u64 pba_paddr;
|
||||
- u32 pba_offset;
|
||||
+ table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
|
||||
+ WARN_ON(msi && msi->table_base != table_paddr);
|
||||
+ if ( !table_paddr )
|
||||
+ {
|
||||
+ if ( !msi || !msi->table_base )
|
||||
+ {
|
||||
+ pci_conf_write16(dev->sbdf, msix_control_reg(pos),
|
||||
+ control & ~PCI_MSIX_FLAGS_ENABLE);
|
||||
+ xfree(entry);
|
||||
+ return -ENXIO;
|
||||
+ }
|
||||
+ table_paddr = msi->table_base;
|
||||
+ }
|
||||
+ table_paddr += table_offset & ~PCI_MSIX_BIRMASK;
|
||||
|
||||
msix->table.first = PFN_DOWN(table_paddr);
|
||||
msix->table.last = PFN_DOWN(table_paddr +
|
||||
@@ -875,7 +875,18 @@ static int msix_capability_init(struct p
|
||||
BITS_TO_LONGS(msix->nr_entries) - 1);
|
||||
WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, msix->pba.first,
|
||||
msix->pba.last));
|
||||
+
|
||||
+ zap_on_error = true;
|
||||
+ }
|
||||
+ else if ( !msix->table.first )
|
||||
+ {
|
||||
+ pci_conf_write16(dev->sbdf, msix_control_reg(pos), control);
|
||||
+ xfree(entry);
|
||||
+ return -ENODATA;
|
||||
}
|
||||
+ else
|
||||
+ table_paddr = (msix->table.first << PAGE_SHIFT) +
|
||||
+ (table_offset & ~PCI_MSIX_BIRMASK & ~PAGE_MASK);
|
||||
|
||||
if ( entry )
|
||||
{
|
||||
@@ -886,8 +897,15 @@ static int msix_capability_init(struct p
|
||||
|
||||
if ( idx < 0 )
|
||||
{
|
||||
- pci_conf_write16(dev->sbdf, msix_control_reg(pos),
|
||||
- control & ~PCI_MSIX_FLAGS_ENABLE);
|
||||
+ if ( zap_on_error )
|
||||
+ {
|
||||
+ msix->table.first = 0;
|
||||
+ msix->pba.first = 0;
|
||||
+
|
||||
+ control &= ~PCI_MSIX_FLAGS_ENABLE;
|
||||
+ }
|
||||
+
|
||||
+ pci_conf_write16(dev->sbdf, msix_control_reg(pos), control);
|
||||
xfree(entry);
|
||||
return idx;
|
||||
}
|
||||
@@ -1076,9 +1094,14 @@ static void _pci_cleanup_msix(struct arc
|
||||
if ( rangeset_remove_range(mmio_ro_ranges, msix->table.first,
|
||||
msix->table.last) )
|
||||
WARN();
|
||||
+ msix->table.first = 0;
|
||||
+ msix->table.last = 0;
|
||||
+
|
||||
if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
|
||||
msix->pba.last) )
|
||||
WARN();
|
||||
+ msix->pba.first = 0;
|
||||
+ msix->pba.last = 0;
|
||||
}
|
||||
}
|
||||
|
@@ -1,42 +0,0 @@
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn: relax port_is_valid()
|
||||
|
||||
To avoid ports potentially becoming invalid behind the back of certain
|
||||
other functions (due to ->max_evtchn shrinking) because of
|
||||
- a guest invoking evtchn_reset() and from a 2nd vCPU opening new
|
||||
channels in parallel (see also XSA-343),
|
||||
- alloc_unbound_xen_event_channel() produced channels living above the
|
||||
2-level range (see also XSA-342),
|
||||
drop the max_evtchns check from port_is_valid(). For a port for which
|
||||
the function once returned "true", the returned value may not turn into
|
||||
"false" later on. The function's result may only depend on bounds which
|
||||
can only ever grow (which is the case for d->valid_evtchns).
|
||||
|
||||
This also eliminates a false sense of safety, utilized by some of the
|
||||
users (see again XSA-343): Without a suitable lock held, d->max_evtchns
|
||||
may change at any time, and hence deducing that certain other operations
|
||||
are safe when port_is_valid() returned true is not legitimate. The
|
||||
opportunities to abuse this may get widened by the change here
|
||||
(depending on guest and host configuration), but will be taken care of
|
||||
by the other XSA.
|
||||
|
||||
This is XSA-338.
|
||||
|
||||
Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels")
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
Reviewed-by: Julien Grall <jgrall@amazon.com>
|
||||
---
|
||||
v5: New, split from larger patch.
|
||||
|
||||
--- a/xen/include/xen/event.h
|
||||
+++ b/xen/include/xen/event.h
|
||||
@@ -107,8 +107,6 @@ void notify_via_xen_event_channel(struct
|
||||
|
||||
static inline bool_t port_is_valid(struct domain *d, unsigned int p)
|
||||
{
|
||||
- if ( p >= d->max_evtchns )
|
||||
- return 0;
|
||||
return p < read_atomic(&d->valid_evtchns);
|
||||
}
|
||||
|
@@ -1,76 +0,0 @@
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/pv: Avoid double exception injection
|
||||
|
||||
There is at least one path (SYSENTER with NT set, Xen converts to #GP) which
|
||||
ends up injecting the #GP fault twice, first in compat_sysenter(), and then a
|
||||
second time in compat_test_all_events(), due to the stale TBF_EXCEPTION left
|
||||
in TRAPBOUNCE_flags.
|
||||
|
||||
The guest kernel sees the second fault first, which is a kernel level #GP
|
||||
pointing at the head of the #GP handler, and is therefore a userspace
|
||||
trigger-able DoS.
|
||||
|
||||
This particular bug has bitten us several times before, so rearrange
|
||||
{compat_,}create_bounce_frame() to clobber TRAPBOUNCE on success, rather than
|
||||
leaving this task to one area of code which isn't used uniformly.
|
||||
|
||||
Other scenarios which might result in a double injection (e.g. two calls
|
||||
directly to compat_create_bounce_frame) will now crash the guest, which is far
|
||||
more obvious than letting the kernel run with corrupt state.
|
||||
|
||||
This is XSA-339
|
||||
|
||||
Fixes: fdac9515607b ("x86: clear EFLAGS.NT in SYSENTER entry path")
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
|
||||
index c3e62f8734..73619f57ca 100644
|
||||
--- a/xen/arch/x86/x86_64/compat/entry.S
|
||||
+++ b/xen/arch/x86/x86_64/compat/entry.S
|
||||
@@ -78,7 +78,6 @@ compat_process_softirqs:
|
||||
sti
|
||||
.Lcompat_bounce_exception:
|
||||
call compat_create_bounce_frame
|
||||
- movb $0, TRAPBOUNCE_flags(%rdx)
|
||||
jmp compat_test_all_events
|
||||
|
||||
ALIGN
|
||||
@@ -352,7 +351,13 @@ __UNLIKELY_END(compat_bounce_null_selector)
|
||||
movl %eax,UREGS_cs+8(%rsp)
|
||||
movl TRAPBOUNCE_eip(%rdx),%eax
|
||||
movl %eax,UREGS_rip+8(%rsp)
|
||||
+
|
||||
+ /* Trapbounce complete. Clobber state to avoid an erroneous second injection. */
|
||||
+ xor %eax, %eax
|
||||
+ mov %ax, TRAPBOUNCE_cs(%rdx)
|
||||
+ mov %al, TRAPBOUNCE_flags(%rdx)
|
||||
ret
|
||||
+
|
||||
.section .fixup,"ax"
|
||||
.Lfx13:
|
||||
xorl %edi,%edi
|
||||
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
|
||||
index 1e880eb9f6..71a00e846b 100644
|
||||
--- a/xen/arch/x86/x86_64/entry.S
|
||||
+++ b/xen/arch/x86/x86_64/entry.S
|
||||
@@ -90,7 +90,6 @@ process_softirqs:
|
||||
sti
|
||||
.Lbounce_exception:
|
||||
call create_bounce_frame
|
||||
- movb $0, TRAPBOUNCE_flags(%rdx)
|
||||
jmp test_all_events
|
||||
|
||||
ALIGN
|
||||
@@ -512,6 +511,11 @@ UNLIKELY_START(z, create_bounce_frame_bad_bounce_ip)
|
||||
jmp asm_domain_crash_synchronous /* Does not return */
|
||||
__UNLIKELY_END(create_bounce_frame_bad_bounce_ip)
|
||||
movq %rax,UREGS_rip+8(%rsp)
|
||||
+
|
||||
+ /* Trapbounce complete. Clobber state to avoid an erroneous second injection. */
|
||||
+ xor %eax, %eax
|
||||
+ mov %rax, TRAPBOUNCE_eip(%rdx)
|
||||
+ mov %al, TRAPBOUNCE_flags(%rdx)
|
||||
ret
|
||||
|
||||
.pushsection .fixup, "ax", @progbits
|
@@ -1,65 +0,0 @@
From: Julien Grall <jgrall@amazon.com>
|
||||
Subject: xen/evtchn: Add missing barriers when accessing/allocating an event channel
|
||||
|
||||
While the allocation of a bucket is always performed with the per-domain
|
||||
lock, the bucket may be accessed without the lock taken (for instance, see
|
||||
evtchn_send()).
|
||||
|
||||
Instead such sites relies on port_is_valid() to return a non-zero value
|
||||
when the port has a struct evtchn associated to it. The function will
|
||||
mostly check whether the port is less than d->valid_evtchns as all the
|
||||
buckets/event channels should be allocated up to that point.
|
||||
|
||||
Unfortunately a compiler is free to re-order the assignment in
|
||||
evtchn_allocate_port() so it would be possible to have d->valid_evtchns
|
||||
updated before the new bucket has finish to allocate.
|
||||
|
||||
Additionally on Arm, even if this was compiled "correctly", the
|
||||
processor can still re-order the memory access.
|
||||
|
||||
Add a write memory barrier in the allocation side and a read memory
|
||||
barrier when the port is valid to prevent any re-ordering issue.
|
||||
|
||||
This is XSA-340.
|
||||
|
||||
Reported-by: Julien Grall <jgrall@amazon.com>
|
||||
Signed-off-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -178,6 +178,13 @@ int evtchn_allocate_port(struct domain *
|
||||
return -ENOMEM;
|
||||
bucket_from_port(d, port) = chn;
|
||||
|
||||
+ /*
|
||||
+ * d->valid_evtchns is used to check whether the bucket can be
|
||||
+ * accessed without the per-domain lock. Therefore,
|
||||
+ * d->valid_evtchns should be seen *after* the new bucket has
|
||||
+ * been setup.
|
||||
+ */
|
||||
+ smp_wmb();
|
||||
write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
|
||||
}
|
||||
|
||||
--- a/xen/include/xen/event.h
|
||||
+++ b/xen/include/xen/event.h
|
||||
@@ -107,7 +107,17 @@ void notify_via_xen_event_channel(struct
|
||||
|
||||
static inline bool_t port_is_valid(struct domain *d, unsigned int p)
|
||||
{
|
||||
- return p < read_atomic(&d->valid_evtchns);
|
||||
+ if ( p >= read_atomic(&d->valid_evtchns) )
|
||||
+ return false;
|
||||
+
|
||||
+ /*
|
||||
+ * The caller will usually access the event channel afterwards and
|
||||
+ * may be done without taking the per-domain lock. The barrier is
|
||||
+ * going in pair the smp_wmb() barrier in evtchn_allocate_port().
|
||||
+ */
|
||||
+ smp_rmb();
|
||||
+
|
||||
+ return true;
|
||||
}
|
||||
|
||||
static inline struct evtchn *evtchn_from_port(struct domain *d, unsigned int p)
|
@@ -1,145 +0,0 @@
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn/x86: enforce correct upper limit for 32-bit guests
|
||||
|
||||
The recording of d->max_evtchns in evtchn_2l_init(), in particular with
|
||||
the limited set of callers of the function, is insufficient. Neither for
|
||||
PV nor for HVM guests the bitness is known at domain_create() time, yet
|
||||
the upper bound in 2-level mode depends upon guest bitness. Recording
|
||||
too high a limit "allows" x86 32-bit domains to open not properly usable
|
||||
event channels, management of which (inside Xen) would then result in
|
||||
corruption of the shared info and vCPU info structures.
|
||||
|
||||
Keep the upper limit dynamic for the 2-level case, introducing a helper
|
||||
function to retrieve the effective limit. This helper is now supposed to
|
||||
be private to the event channel code. The used in do_poll() and
|
||||
domain_dump_evtchn_info() weren't consistent with port uses elsewhere
|
||||
and hence get switched to port_is_valid().
|
||||
|
||||
Furthermore FIFO mode's setup_ports() gets adjusted to loop only up to
|
||||
the prior ABI limit, rather than all the way up to the new one.
|
||||
|
||||
Finally a word on the change to do_poll(): Accessing ->max_evtchns
|
||||
without holding a suitable lock was never safe, as it as well as
|
||||
->evtchn_port_ops may change behind do_poll()'s back. Using
|
||||
port_is_valid() instead widens some the window for potential abuse,
|
||||
until we've dealt with the race altogether (see XSA-343).
|
||||
|
||||
This is XSA-342.
|
||||
|
||||
Reported-by: Julien Grall <jgrall@amazon.com>
|
||||
Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels")
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
Reviewed-by: Julien Grall <jgrall@amazon.com>
|
||||
|
||||
--- a/xen/common/event_2l.c
|
||||
+++ b/xen/common/event_2l.c
|
||||
@@ -103,7 +103,6 @@ static const struct evtchn_port_ops evtc
|
||||
void evtchn_2l_init(struct domain *d)
|
||||
{
|
||||
d->evtchn_port_ops = &evtchn_port_ops_2l;
|
||||
- d->max_evtchns = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
|
||||
}
|
||||
|
||||
/*
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -151,7 +151,7 @@ static void free_evtchn_bucket(struct do
|
||||
|
||||
int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
|
||||
{
|
||||
- if ( port > d->max_evtchn_port || port >= d->max_evtchns )
|
||||
+ if ( port > d->max_evtchn_port || port >= max_evtchns(d) )
|
||||
return -ENOSPC;
|
||||
|
||||
if ( port_is_valid(d, port) )
|
||||
@@ -1396,13 +1396,11 @@ static void domain_dump_evtchn_info(stru
|
||||
|
||||
spin_lock(&d->event_lock);
|
||||
|
||||
- for ( port = 1; port < d->max_evtchns; ++port )
|
||||
+ for ( port = 1; port_is_valid(d, port); ++port )
|
||||
{
|
||||
const struct evtchn *chn;
|
||||
char *ssid;
|
||||
|
||||
- if ( !port_is_valid(d, port) )
|
||||
- continue;
|
||||
chn = evtchn_from_port(d, port);
|
||||
if ( chn->state == ECS_FREE )
|
||||
continue;
|
||||
--- a/xen/common/event_fifo.c
|
||||
+++ b/xen/common/event_fifo.c
|
||||
@@ -478,7 +478,7 @@ static void cleanup_event_array(struct d
|
||||
d->evtchn_fifo = NULL;
|
||||
}
|
||||
|
||||
-static void setup_ports(struct domain *d)
|
||||
+static void setup_ports(struct domain *d, unsigned int prev_evtchns)
|
||||
{
|
||||
unsigned int port;
|
||||
|
||||
@@ -488,7 +488,7 @@ static void setup_ports(struct domain *d
|
||||
* - save its pending state.
|
||||
* - set default priority.
|
||||
*/
|
||||
- for ( port = 1; port < d->max_evtchns; port++ )
|
||||
+ for ( port = 1; port < prev_evtchns; port++ )
|
||||
{
|
||||
struct evtchn *evtchn;
|
||||
|
||||
@@ -546,6 +546,8 @@ int evtchn_fifo_init_control(struct evtc
|
||||
if ( !d->evtchn_fifo )
|
||||
{
|
||||
struct vcpu *vcb;
|
||||
+ /* Latch the value before it changes during setup_event_array(). */
|
||||
+ unsigned int prev_evtchns = max_evtchns(d);
|
||||
|
||||
for_each_vcpu ( d, vcb ) {
|
||||
rc = setup_control_block(vcb);
|
||||
@@ -562,8 +564,7 @@ int evtchn_fifo_init_control(struct evtc
|
||||
goto error;
|
||||
|
||||
d->evtchn_port_ops = &evtchn_port_ops_fifo;
|
||||
- d->max_evtchns = EVTCHN_FIFO_NR_CHANNELS;
|
||||
- setup_ports(d);
|
||||
+ setup_ports(d, prev_evtchns);
|
||||
}
|
||||
else
|
||||
rc = map_control_block(v, gfn, offset);
|
||||
--- a/xen/common/schedule.c
|
||||
+++ b/xen/common/schedule.c
|
||||
@@ -1434,7 +1434,7 @@ static long do_poll(struct sched_poll *s
|
||||
goto out;
|
||||
|
||||
rc = -EINVAL;
|
||||
- if ( port >= d->max_evtchns )
|
||||
+ if ( !port_is_valid(d, port) )
|
||||
goto out;
|
||||
|
||||
rc = 0;
|
||||
--- a/xen/include/xen/event.h
|
||||
+++ b/xen/include/xen/event.h
|
||||
@@ -105,6 +105,12 @@ void notify_via_xen_event_channel(struct
|
||||
#define bucket_from_port(d, p) \
|
||||
((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
|
||||
|
||||
+static inline unsigned int max_evtchns(const struct domain *d)
|
||||
+{
|
||||
+ return d->evtchn_fifo ? EVTCHN_FIFO_NR_CHANNELS
|
||||
+ : BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
|
||||
+}
|
||||
+
|
||||
static inline bool_t port_is_valid(struct domain *d, unsigned int p)
|
||||
{
|
||||
if ( p >= read_atomic(&d->valid_evtchns) )
|
||||
--- a/xen/include/xen/sched.h
|
||||
+++ b/xen/include/xen/sched.h
|
||||
@@ -382,7 +382,6 @@ struct domain
|
||||
/* Event channel information. */
|
||||
struct evtchn *evtchn; /* first bucket only */
|
||||
struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */
|
||||
- unsigned int max_evtchns; /* number supported by ABI */
|
||||
unsigned int max_evtchn_port; /* max permitted port number */
|
||||
unsigned int valid_evtchns; /* number of allocated event channels */
|
||||
spinlock_t event_lock;
|
|
@ -1,199 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn: evtchn_reset() shouldn't succeed with still-open ports
|
||||
|
||||
While the function closes all ports, it does so without holding any
|
||||
lock, and hence racing requests may be issued causing new ports to get
|
||||
opened. This would have been problematic in particular if such a newly
|
||||
opened port had a port number above the new implementation limit (i.e.
|
||||
when switching from FIFO to 2-level) after the reset, as prior to
|
||||
"evtchn: relax port_is_valid()" this could have led to e.g.
|
||||
evtchn_close()'s "BUG_ON(!port_is_valid(d2, port2))" to trigger.
|
||||
|
||||
Introduce a counter of active ports and check that it's (still) no
|
||||
larger then the number of Xen internally used ones after obtaining the
|
||||
necessary lock in evtchn_reset().
|
||||
|
||||
As to the access model of the new {active,xen}_evtchns fields - while
|
||||
all writes get done using write_atomic(), reads ought to use
|
||||
read_atomic() only when outside of a suitably locked region.
|
||||
|
||||
Note that as of now evtchn_bind_virq() and evtchn_bind_ipi() don't have
|
||||
a need to call check_free_port().
|
||||
|
||||
This is part of XSA-343.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
Reviewed-by: Julien Grall <jgrall@amazon.com>
|
||||
---
|
||||
v7: Drop optimization from evtchn_reset().
|
||||
v6: Fix loop exit condition in evtchn_reset(). Use {read,write}_atomic()
|
||||
also for xen_evtchns.
|
||||
v5: Move increment in alloc_unbound_xen_event_channel() out of the inner
|
||||
locked region.
|
||||
v4: Account for Xen internal ports.
|
||||
v3: Document intended access next to new struct field.
|
||||
v2: Add comment to check_free_port(). Drop commented out calls.
|
||||
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -188,6 +188,8 @@ int evtchn_allocate_port(struct domain *
|
||||
write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
|
||||
}
|
||||
|
||||
+ write_atomic(&d->active_evtchns, d->active_evtchns + 1);
|
||||
+
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -211,11 +213,26 @@ static int get_free_port(struct domain *
|
||||
return -ENOSPC;
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Check whether a port is still marked free, and if so update the domain
|
||||
+ * counter accordingly. To be used on function exit paths.
|
||||
+ */
|
||||
+static void check_free_port(struct domain *d, evtchn_port_t port)
|
||||
+{
|
||||
+ if ( port_is_valid(d, port) &&
|
||||
+ evtchn_from_port(d, port)->state == ECS_FREE )
|
||||
+ write_atomic(&d->active_evtchns, d->active_evtchns - 1);
|
||||
+}
|
||||
+
|
||||
void evtchn_free(struct domain *d, struct evtchn *chn)
|
||||
{
|
||||
/* Clear pending event to avoid unexpected behavior on re-bind. */
|
||||
evtchn_port_clear_pending(d, chn);
|
||||
|
||||
+ if ( consumer_is_xen(chn) )
|
||||
+ write_atomic(&d->xen_evtchns, d->xen_evtchns - 1);
|
||||
+ write_atomic(&d->active_evtchns, d->active_evtchns - 1);
|
||||
+
|
||||
/* Reset binding to vcpu0 when the channel is freed. */
|
||||
chn->state = ECS_FREE;
|
||||
chn->notify_vcpu_id = 0;
|
||||
@@ -258,6 +275,7 @@ static long evtchn_alloc_unbound(evtchn_
|
||||
alloc->port = port;
|
||||
|
||||
out:
|
||||
+ check_free_port(d, port);
|
||||
spin_unlock(&d->event_lock);
|
||||
rcu_unlock_domain(d);
|
||||
|
||||
@@ -351,6 +369,7 @@ static long evtchn_bind_interdomain(evtc
|
||||
bind->local_port = lport;
|
||||
|
||||
out:
|
||||
+ check_free_port(ld, lport);
|
||||
spin_unlock(&ld->event_lock);
|
||||
if ( ld != rd )
|
||||
spin_unlock(&rd->event_lock);
|
||||
@@ -488,7 +507,7 @@ static long evtchn_bind_pirq(evtchn_bind
|
||||
struct domain *d = current->domain;
|
||||
struct vcpu *v = d->vcpu[0];
|
||||
struct pirq *info;
|
||||
- int port, pirq = bind->pirq;
|
||||
+ int port = 0, pirq = bind->pirq;
|
||||
long rc;
|
||||
|
||||
if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
|
||||
@@ -536,6 +555,7 @@ static long evtchn_bind_pirq(evtchn_bind
|
||||
arch_evtchn_bind_pirq(d, pirq);
|
||||
|
||||
out:
|
||||
+ check_free_port(d, port);
|
||||
spin_unlock(&d->event_lock);
|
||||
|
||||
return rc;
|
||||
@@ -1011,10 +1031,10 @@ int evtchn_unmask(unsigned int port)
|
||||
return 0;
|
||||
}
|
||||
|
||||
-
|
||||
int evtchn_reset(struct domain *d)
|
||||
{
|
||||
unsigned int i;
|
||||
+ int rc = 0;
|
||||
|
||||
if ( d != current->domain && !d->controller_pause_count )
|
||||
return -EINVAL;
|
||||
@@ -1024,7 +1044,9 @@ int evtchn_reset(struct domain *d)
|
||||
|
||||
spin_lock(&d->event_lock);
|
||||
|
||||
- if ( d->evtchn_fifo )
|
||||
+ if ( d->active_evtchns > d->xen_evtchns )
|
||||
+ rc = -EAGAIN;
|
||||
+ else if ( d->evtchn_fifo )
|
||||
{
|
||||
/* Switching back to 2-level ABI. */
|
||||
evtchn_fifo_destroy(d);
|
||||
@@ -1033,7 +1055,7 @@ int evtchn_reset(struct domain *d)
|
||||
|
||||
spin_unlock(&d->event_lock);
|
||||
|
||||
- return 0;
|
||||
+ return rc;
|
||||
}
|
||||
|
||||
static long evtchn_set_priority(const struct evtchn_set_priority *set_priority)
|
||||
@@ -1219,10 +1241,9 @@ int alloc_unbound_xen_event_channel(
|
||||
|
||||
spin_lock(&ld->event_lock);
|
||||
|
||||
- rc = get_free_port(ld);
|
||||
+ port = rc = get_free_port(ld);
|
||||
if ( rc < 0 )
|
||||
goto out;
|
||||
- port = rc;
|
||||
chn = evtchn_from_port(ld, port);
|
||||
|
||||
rc = xsm_evtchn_unbound(XSM_TARGET, ld, chn, remote_domid);
|
||||
@@ -1238,7 +1259,10 @@ int alloc_unbound_xen_event_channel(
|
||||
|
||||
spin_unlock(&chn->lock);
|
||||
|
||||
+ write_atomic(&ld->xen_evtchns, ld->xen_evtchns + 1);
|
||||
+
|
||||
out:
|
||||
+ check_free_port(ld, port);
|
||||
spin_unlock(&ld->event_lock);
|
||||
|
||||
return rc < 0 ? rc : port;
|
||||
@@ -1314,6 +1338,7 @@ int evtchn_init(struct domain *d, unsign
|
||||
return -EINVAL;
|
||||
}
|
||||
evtchn_from_port(d, 0)->state = ECS_RESERVED;
|
||||
+ write_atomic(&d->active_evtchns, 0);
|
||||
|
||||
#if MAX_VIRT_CPUS > BITS_PER_LONG
|
||||
d->poll_mask = xzalloc_array(unsigned long, BITS_TO_LONGS(d->max_vcpus));
|
||||
@@ -1340,6 +1365,8 @@ void evtchn_destroy(struct domain *d)
|
||||
for ( i = 0; port_is_valid(d, i); i++ )
|
||||
evtchn_close(d, i, 0);
|
||||
|
||||
+ ASSERT(!d->active_evtchns);
|
||||
+
|
||||
clear_global_virq_handlers(d);
|
||||
|
||||
evtchn_fifo_destroy(d);
|
||||
--- a/xen/include/xen/sched.h
|
||||
+++ b/xen/include/xen/sched.h
|
||||
@@ -361,6 +361,16 @@ struct domain
|
||||
struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */
|
||||
unsigned int max_evtchn_port; /* max permitted port number */
|
||||
unsigned int valid_evtchns; /* number of allocated event channels */
|
||||
+ /*
|
||||
+ * Number of in-use event channels. Writers should use write_atomic().
|
||||
+ * Readers need to use read_atomic() only when not holding event_lock.
|
||||
+ */
|
||||
+ unsigned int active_evtchns;
|
||||
+ /*
|
||||
+ * Number of event channels used internally by Xen (not subject to
|
||||
+ * EVTCHNOP_reset). Read/write access like for active_evtchns.
|
||||
+ */
|
||||
+ unsigned int xen_evtchns;
|
||||
spinlock_t event_lock;
|
||||
const struct evtchn_port_ops *evtchn_port_ops;
|
||||
struct evtchn_fifo_domain *evtchn_fifo;
|
|
@ -1,295 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn: convert per-channel lock to be IRQ-safe
|
||||
|
||||
... in order for send_guest_{global,vcpu}_virq() to be able to make use
|
||||
of it.
|
||||
|
||||
This is part of XSA-343.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
---
|
||||
v6: New.
|
||||
---
|
||||
TBD: This is the "dumb" conversion variant. In a couple of cases the
|
||||
slightly simpler spin_{,un}lock_irq() could apparently be used.
|
||||
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -248,6 +248,7 @@ static long evtchn_alloc_unbound(evtchn_
|
||||
int port;
|
||||
domid_t dom = alloc->dom;
|
||||
long rc;
|
||||
+ unsigned long flags;
|
||||
|
||||
d = rcu_lock_domain_by_any_id(dom);
|
||||
if ( d == NULL )
|
||||
@@ -263,14 +264,14 @@ static long evtchn_alloc_unbound(evtchn_
|
||||
if ( rc )
|
||||
goto out;
|
||||
|
||||
- spin_lock(&chn->lock);
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
|
||||
chn->state = ECS_UNBOUND;
|
||||
if ( (chn->u.unbound.remote_domid = alloc->remote_dom) == DOMID_SELF )
|
||||
chn->u.unbound.remote_domid = current->domain->domain_id;
|
||||
evtchn_port_init(d, chn);
|
||||
|
||||
- spin_unlock(&chn->lock);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
|
||||
alloc->port = port;
|
||||
|
||||
@@ -283,26 +284,32 @@ static long evtchn_alloc_unbound(evtchn_
|
||||
}
|
||||
|
||||
|
||||
-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
|
||||
+static unsigned long double_evtchn_lock(struct evtchn *lchn,
|
||||
+ struct evtchn *rchn)
|
||||
{
|
||||
- if ( lchn < rchn )
|
||||
+ unsigned long flags;
|
||||
+
|
||||
+ if ( lchn <= rchn )
|
||||
{
|
||||
- spin_lock(&lchn->lock);
|
||||
- spin_lock(&rchn->lock);
|
||||
+ spin_lock_irqsave(&lchn->lock, flags);
|
||||
+ if ( lchn != rchn )
|
||||
+ spin_lock(&rchn->lock);
|
||||
}
|
||||
else
|
||||
{
|
||||
- if ( lchn != rchn )
|
||||
- spin_lock(&rchn->lock);
|
||||
+ spin_lock_irqsave(&rchn->lock, flags);
|
||||
spin_lock(&lchn->lock);
|
||||
}
|
||||
+
|
||||
+ return flags;
|
||||
}
|
||||
|
||||
-static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn)
|
||||
+static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn,
|
||||
+ unsigned long flags)
|
||||
{
|
||||
- spin_unlock(&lchn->lock);
|
||||
if ( lchn != rchn )
|
||||
- spin_unlock(&rchn->lock);
|
||||
+ spin_unlock(&lchn->lock);
|
||||
+ spin_unlock_irqrestore(&rchn->lock, flags);
|
||||
}
|
||||
|
||||
static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
|
||||
@@ -312,6 +319,7 @@ static long evtchn_bind_interdomain(evtc
|
||||
int lport, rport = bind->remote_port;
|
||||
domid_t rdom = bind->remote_dom;
|
||||
long rc;
|
||||
+ unsigned long flags;
|
||||
|
||||
if ( rdom == DOMID_SELF )
|
||||
rdom = current->domain->domain_id;
|
||||
@@ -347,7 +355,7 @@ static long evtchn_bind_interdomain(evtc
|
||||
if ( rc )
|
||||
goto out;
|
||||
|
||||
- double_evtchn_lock(lchn, rchn);
|
||||
+ flags = double_evtchn_lock(lchn, rchn);
|
||||
|
||||
lchn->u.interdomain.remote_dom = rd;
|
||||
lchn->u.interdomain.remote_port = rport;
|
||||
@@ -364,7 +372,7 @@ static long evtchn_bind_interdomain(evtc
|
||||
*/
|
||||
evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn);
|
||||
|
||||
- double_evtchn_unlock(lchn, rchn);
|
||||
+ double_evtchn_unlock(lchn, rchn, flags);
|
||||
|
||||
bind->local_port = lport;
|
||||
|
||||
@@ -387,6 +395,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t
|
||||
struct domain *d = current->domain;
|
||||
int virq = bind->virq, vcpu = bind->vcpu;
|
||||
int rc = 0;
|
||||
+ unsigned long flags;
|
||||
|
||||
if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
|
||||
return -EINVAL;
|
||||
@@ -424,14 +433,14 @@ int evtchn_bind_virq(evtchn_bind_virq_t
|
||||
|
||||
chn = evtchn_from_port(d, port);
|
||||
|
||||
- spin_lock(&chn->lock);
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
|
||||
chn->state = ECS_VIRQ;
|
||||
chn->notify_vcpu_id = vcpu;
|
||||
chn->u.virq = virq;
|
||||
evtchn_port_init(d, chn);
|
||||
|
||||
- spin_unlock(&chn->lock);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
|
||||
v->virq_to_evtchn[virq] = bind->port = port;
|
||||
|
||||
@@ -448,6 +457,7 @@ static long evtchn_bind_ipi(evtchn_bind_
|
||||
struct domain *d = current->domain;
|
||||
int port, vcpu = bind->vcpu;
|
||||
long rc = 0;
|
||||
+ unsigned long flags;
|
||||
|
||||
if ( domain_vcpu(d, vcpu) == NULL )
|
||||
return -ENOENT;
|
||||
@@ -459,13 +469,13 @@ static long evtchn_bind_ipi(evtchn_bind_
|
||||
|
||||
chn = evtchn_from_port(d, port);
|
||||
|
||||
- spin_lock(&chn->lock);
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
|
||||
chn->state = ECS_IPI;
|
||||
chn->notify_vcpu_id = vcpu;
|
||||
evtchn_port_init(d, chn);
|
||||
|
||||
- spin_unlock(&chn->lock);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
|
||||
bind->port = port;
|
||||
|
||||
@@ -509,6 +519,7 @@ static long evtchn_bind_pirq(evtchn_bind
|
||||
struct pirq *info;
|
||||
int port = 0, pirq = bind->pirq;
|
||||
long rc;
|
||||
+ unsigned long flags;
|
||||
|
||||
if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
|
||||
return -EINVAL;
|
||||
@@ -541,14 +552,14 @@ static long evtchn_bind_pirq(evtchn_bind
|
||||
goto out;
|
||||
}
|
||||
|
||||
- spin_lock(&chn->lock);
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
|
||||
chn->state = ECS_PIRQ;
|
||||
chn->u.pirq.irq = pirq;
|
||||
link_pirq_port(port, chn, v);
|
||||
evtchn_port_init(d, chn);
|
||||
|
||||
- spin_unlock(&chn->lock);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
|
||||
bind->port = port;
|
||||
|
||||
@@ -569,6 +580,7 @@ int evtchn_close(struct domain *d1, int
|
||||
struct evtchn *chn1, *chn2;
|
||||
int port2;
|
||||
long rc = 0;
|
||||
+ unsigned long flags;
|
||||
|
||||
again:
|
||||
spin_lock(&d1->event_lock);
|
||||
@@ -668,14 +680,14 @@ int evtchn_close(struct domain *d1, int
|
||||
BUG_ON(chn2->state != ECS_INTERDOMAIN);
|
||||
BUG_ON(chn2->u.interdomain.remote_dom != d1);
|
||||
|
||||
- double_evtchn_lock(chn1, chn2);
|
||||
+ flags = double_evtchn_lock(chn1, chn2);
|
||||
|
||||
evtchn_free(d1, chn1);
|
||||
|
||||
chn2->state = ECS_UNBOUND;
|
||||
chn2->u.unbound.remote_domid = d1->domain_id;
|
||||
|
||||
- double_evtchn_unlock(chn1, chn2);
|
||||
+ double_evtchn_unlock(chn1, chn2, flags);
|
||||
|
||||
goto out;
|
||||
|
||||
@@ -683,9 +695,9 @@ int evtchn_close(struct domain *d1, int
|
||||
BUG();
|
||||
}
|
||||
|
||||
- spin_lock(&chn1->lock);
|
||||
+ spin_lock_irqsave(&chn1->lock, flags);
|
||||
evtchn_free(d1, chn1);
|
||||
- spin_unlock(&chn1->lock);
|
||||
+ spin_unlock_irqrestore(&chn1->lock, flags);
|
||||
|
||||
out:
|
||||
if ( d2 != NULL )
|
||||
@@ -705,13 +717,14 @@ int evtchn_send(struct domain *ld, unsig
|
||||
struct evtchn *lchn, *rchn;
|
||||
struct domain *rd;
|
||||
int rport, ret = 0;
|
||||
+ unsigned long flags;
|
||||
|
||||
if ( !port_is_valid(ld, lport) )
|
||||
return -EINVAL;
|
||||
|
||||
lchn = evtchn_from_port(ld, lport);
|
||||
|
||||
- spin_lock(&lchn->lock);
|
||||
+ spin_lock_irqsave(&lchn->lock, flags);
|
||||
|
||||
/* Guest cannot send via a Xen-attached event channel. */
|
||||
if ( unlikely(consumer_is_xen(lchn)) )
|
||||
@@ -746,7 +759,7 @@ int evtchn_send(struct domain *ld, unsig
|
||||
}
|
||||
|
||||
out:
|
||||
- spin_unlock(&lchn->lock);
|
||||
+ spin_unlock_irqrestore(&lchn->lock, flags);
|
||||
|
||||
return ret;
|
||||
}
|
||||
@@ -1238,6 +1251,7 @@ int alloc_unbound_xen_event_channel(
|
||||
{
|
||||
struct evtchn *chn;
|
||||
int port, rc;
|
||||
+ unsigned long flags;
|
||||
|
||||
spin_lock(&ld->event_lock);
|
||||
|
||||
@@ -1250,14 +1264,14 @@ int alloc_unbound_xen_event_channel(
|
||||
if ( rc )
|
||||
goto out;
|
||||
|
||||
- spin_lock(&chn->lock);
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
|
||||
chn->state = ECS_UNBOUND;
|
||||
chn->xen_consumer = get_xen_consumer(notification_fn);
|
||||
chn->notify_vcpu_id = lvcpu;
|
||||
chn->u.unbound.remote_domid = remote_domid;
|
||||
|
||||
- spin_unlock(&chn->lock);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
|
||||
write_atomic(&ld->xen_evtchns, ld->xen_evtchns + 1);
|
||||
|
||||
@@ -1280,11 +1294,12 @@ void notify_via_xen_event_channel(struct
|
||||
{
|
||||
struct evtchn *lchn, *rchn;
|
||||
struct domain *rd;
|
||||
+ unsigned long flags;
|
||||
|
||||
ASSERT(port_is_valid(ld, lport));
|
||||
lchn = evtchn_from_port(ld, lport);
|
||||
|
||||
- spin_lock(&lchn->lock);
|
||||
+ spin_lock_irqsave(&lchn->lock, flags);
|
||||
|
||||
if ( likely(lchn->state == ECS_INTERDOMAIN) )
|
||||
{
|
||||
@@ -1294,7 +1309,7 @@ void notify_via_xen_event_channel(struct
|
||||
evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn);
|
||||
}
|
||||
|
||||
- spin_unlock(&lchn->lock);
|
||||
+ spin_unlock_irqrestore(&lchn->lock, flags);
|
||||
}
|
||||
|
||||
void evtchn_check_pollers(struct domain *d, unsigned int port)
|
|
@ -1,392 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn: address races with evtchn_reset()
|
||||
|
||||
Neither d->evtchn_port_ops nor max_evtchns(d) may be used in an entirely
|
||||
lock-less manner, as both may change by a racing evtchn_reset(). In the
|
||||
common case, at least one of the domain's event lock or the per-channel
|
||||
lock needs to be held. In the specific case of the inter-domain sending
|
||||
by evtchn_send() and notify_via_xen_event_channel() holding the other
|
||||
side's per-channel lock is sufficient, as the channel can't change state
|
||||
without both per-channel locks held. Without such a channel changing
|
||||
state, evtchn_reset() can't complete successfully.
|
||||
|
||||
Lock-free accesses continue to be permitted for the shim (calling some
|
||||
otherwise internal event channel functions), as this happens while the
|
||||
domain is in effectively single-threaded mode. Special care also needs
|
||||
taking for the shim's marking of in-use ports as ECS_RESERVED (allowing
|
||||
use of such ports in the shim case is okay because switching into and
|
||||
hence also out of FIFO mode is impossihble there).
|
||||
|
||||
As a side effect, certain operations on Xen bound event channels which
|
||||
were mistakenly permitted so far (e.g. unmask or poll) will be refused
|
||||
now.
|
||||
|
||||
This is part of XSA-343.
|
||||
|
||||
Reported-by: Julien Grall <jgrall@amazon.com>
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
---
|
||||
v9: Add arch_evtchn_is_special() to fix PV shim.
|
||||
v8: Add BUILD_BUG_ON() in evtchn_usable().
|
||||
v7: Add locking related comment ahead of struct evtchn_port_ops.
|
||||
v6: New.
|
||||
---
|
||||
TBD: I've been considering to move some of the wrappers from xen/event.h
|
||||
into event_channel.c (or even drop them altogether), when they
|
||||
require external locking (e.g. evtchn_port_init() or
|
||||
evtchn_port_set_priority()). Does anyone have a strong opinion
|
||||
either way?
|
||||
|
||||
--- a/xen/arch/x86/irq.c
|
||||
+++ b/xen/arch/x86/irq.c
|
||||
@@ -2488,14 +2488,24 @@ static void dump_irqs(unsigned char key)
|
||||
|
||||
for ( i = 0; i < action->nr_guests; )
|
||||
{
|
||||
+ struct evtchn *evtchn;
|
||||
+ unsigned int pending = 2, masked = 2;
|
||||
+
|
||||
d = action->guest[i++];
|
||||
pirq = domain_irq_to_pirq(d, irq);
|
||||
info = pirq_info(d, pirq);
|
||||
+ evtchn = evtchn_from_port(d, info->evtchn);
|
||||
+ local_irq_disable();
|
||||
+ if ( spin_trylock(&evtchn->lock) )
|
||||
+ {
|
||||
+ pending = evtchn_is_pending(d, evtchn);
|
||||
+ masked = evtchn_is_masked(d, evtchn);
|
||||
+ spin_unlock(&evtchn->lock);
|
||||
+ }
|
||||
+ local_irq_enable();
|
||||
printk("d%d:%3d(%c%c%c)%c",
|
||||
- d->domain_id, pirq,
|
||||
- evtchn_port_is_pending(d, info->evtchn) ? 'P' : '-',
|
||||
- evtchn_port_is_masked(d, info->evtchn) ? 'M' : '-',
|
||||
- info->masked ? 'M' : '-',
|
||||
+ d->domain_id, pirq, "-P?"[pending],
|
||||
+ "-M?"[masked], info->masked ? 'M' : '-',
|
||||
i < action->nr_guests ? ',' : '\n');
|
||||
}
|
||||
}
|
||||
--- a/xen/arch/x86/pv/shim.c
|
||||
+++ b/xen/arch/x86/pv/shim.c
|
||||
@@ -660,8 +660,11 @@ void pv_shim_inject_evtchn(unsigned int
|
||||
if ( port_is_valid(guest, port) )
|
||||
{
|
||||
struct evtchn *chn = evtchn_from_port(guest, port);
|
||||
+ unsigned long flags;
|
||||
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
}
|
||||
}
|
||||
|
||||
--- a/xen/common/event_2l.c
|
||||
+++ b/xen/common/event_2l.c
|
||||
@@ -63,8 +63,10 @@ static void evtchn_2l_unmask(struct doma
|
||||
}
|
||||
}
|
||||
|
||||
-static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port)
|
||||
+static bool evtchn_2l_is_pending(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
+ evtchn_port_t port = evtchn->port;
|
||||
unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
|
||||
|
||||
ASSERT(port < max_ports);
|
||||
@@ -72,8 +74,10 @@ static bool evtchn_2l_is_pending(const s
|
||||
guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
|
||||
}
|
||||
|
||||
-static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port)
|
||||
+static bool evtchn_2l_is_masked(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
+ evtchn_port_t port = evtchn->port;
|
||||
unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
|
||||
|
||||
ASSERT(port < max_ports);
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -156,8 +156,9 @@ int evtchn_allocate_port(struct domain *
|
||||
|
||||
if ( port_is_valid(d, port) )
|
||||
{
|
||||
- if ( evtchn_from_port(d, port)->state != ECS_FREE ||
|
||||
- evtchn_port_is_busy(d, port) )
|
||||
+ const struct evtchn *chn = evtchn_from_port(d, port);
|
||||
+
|
||||
+ if ( chn->state != ECS_FREE || evtchn_is_busy(d, chn) )
|
||||
return -EBUSY;
|
||||
}
|
||||
else
|
||||
@@ -774,6 +775,7 @@ void send_guest_vcpu_virq(struct vcpu *v
|
||||
unsigned long flags;
|
||||
int port;
|
||||
struct domain *d;
|
||||
+ struct evtchn *chn;
|
||||
|
||||
ASSERT(!virq_is_global(virq));
|
||||
|
||||
@@ -784,7 +786,10 @@ void send_guest_vcpu_virq(struct vcpu *v
|
||||
goto out;
|
||||
|
||||
d = v->domain;
|
||||
- evtchn_port_set_pending(d, v->vcpu_id, evtchn_from_port(d, port));
|
||||
+ chn = evtchn_from_port(d, port);
|
||||
+ spin_lock(&chn->lock);
|
||||
+ evtchn_port_set_pending(d, v->vcpu_id, chn);
|
||||
+ spin_unlock(&chn->lock);
|
||||
|
||||
out:
|
||||
spin_unlock_irqrestore(&v->virq_lock, flags);
|
||||
@@ -813,7 +818,9 @@ void send_guest_global_virq(struct domai
|
||||
goto out;
|
||||
|
||||
chn = evtchn_from_port(d, port);
|
||||
+ spin_lock(&chn->lock);
|
||||
evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
|
||||
+ spin_unlock(&chn->lock);
|
||||
|
||||
out:
|
||||
spin_unlock_irqrestore(&v->virq_lock, flags);
|
||||
@@ -823,6 +830,7 @@ void send_guest_pirq(struct domain *d, c
|
||||
{
|
||||
int port;
|
||||
struct evtchn *chn;
|
||||
+ unsigned long flags;
|
||||
|
||||
/*
|
||||
* PV guests: It should not be possible to race with __evtchn_close(). The
|
||||
@@ -837,7 +845,9 @@ void send_guest_pirq(struct domain *d, c
|
||||
}
|
||||
|
||||
chn = evtchn_from_port(d, port);
|
||||
+ spin_lock_irqsave(&chn->lock, flags);
|
||||
evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
|
||||
+ spin_unlock_irqrestore(&chn->lock, flags);
|
||||
}
|
||||
|
||||
static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly;
|
||||
@@ -1034,12 +1044,15 @@ int evtchn_unmask(unsigned int port)
|
||||
{
|
||||
struct domain *d = current->domain;
|
||||
struct evtchn *evtchn;
|
||||
+ unsigned long flags;
|
||||
|
||||
if ( unlikely(!port_is_valid(d, port)) )
|
||||
return -EINVAL;
|
||||
|
||||
evtchn = evtchn_from_port(d, port);
|
||||
+ spin_lock_irqsave(&evtchn->lock, flags);
|
||||
evtchn_port_unmask(d, evtchn);
|
||||
+ spin_unlock_irqrestore(&evtchn->lock, flags);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@@ -1449,8 +1462,8 @@ static void domain_dump_evtchn_info(stru
|
||||
|
||||
printk(" %4u [%d/%d/",
|
||||
port,
|
||||
- evtchn_port_is_pending(d, port),
|
||||
- evtchn_port_is_masked(d, port));
|
||||
+ evtchn_is_pending(d, chn),
|
||||
+ evtchn_is_masked(d, chn));
|
||||
evtchn_port_print_state(d, chn);
|
||||
printk("]: s=%d n=%d x=%d",
|
||||
chn->state, chn->notify_vcpu_id, chn->xen_consumer);
|
||||
--- a/xen/common/event_fifo.c
|
||||
+++ b/xen/common/event_fifo.c
|
||||
@@ -296,23 +296,26 @@ static void evtchn_fifo_unmask(struct do
|
||||
evtchn_fifo_set_pending(v, evtchn);
|
||||
}
|
||||
|
||||
-static bool evtchn_fifo_is_pending(const struct domain *d, evtchn_port_t port)
|
||||
+static bool evtchn_fifo_is_pending(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
- const event_word_t *word = evtchn_fifo_word_from_port(d, port);
|
||||
+ const event_word_t *word = evtchn_fifo_word_from_port(d, evtchn->port);
|
||||
|
||||
return word && guest_test_bit(d, EVTCHN_FIFO_PENDING, word);
|
||||
}
|
||||
|
||||
-static bool_t evtchn_fifo_is_masked(const struct domain *d, evtchn_port_t port)
|
||||
+static bool_t evtchn_fifo_is_masked(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
- const event_word_t *word = evtchn_fifo_word_from_port(d, port);
|
||||
+ const event_word_t *word = evtchn_fifo_word_from_port(d, evtchn->port);
|
||||
|
||||
return !word || guest_test_bit(d, EVTCHN_FIFO_MASKED, word);
|
||||
}
|
||||
|
||||
-static bool_t evtchn_fifo_is_busy(const struct domain *d, evtchn_port_t port)
|
||||
+static bool_t evtchn_fifo_is_busy(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
- const event_word_t *word = evtchn_fifo_word_from_port(d, port);
|
||||
+ const event_word_t *word = evtchn_fifo_word_from_port(d, evtchn->port);
|
||||
|
||||
return word && guest_test_bit(d, EVTCHN_FIFO_LINKED, word);
|
||||
}
|
||||
--- a/xen/include/asm-x86/event.h
|
||||
+++ b/xen/include/asm-x86/event.h
|
||||
@@ -47,4 +47,10 @@ static inline bool arch_virq_is_global(u
|
||||
return true;
|
||||
}
|
||||
|
||||
+#ifdef CONFIG_PV_SHIM
|
||||
+# include <asm/pv/shim.h>
|
||||
+# define arch_evtchn_is_special(chn) \
|
||||
+ (pv_shim && (chn)->port && (chn)->state == ECS_RESERVED)
|
||||
+#endif
|
||||
+
|
||||
#endif
|
||||
--- a/xen/include/xen/event.h
|
||||
+++ b/xen/include/xen/event.h
|
||||
@@ -133,6 +133,24 @@ static inline struct evtchn *evtchn_from
|
||||
return bucket_from_port(d, p) + (p % EVTCHNS_PER_BUCKET);
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * "usable" as in "by a guest", i.e. Xen consumed channels are assumed to be
|
||||
+ * taken care of separately where used for Xen's internal purposes.
|
||||
+ */
|
||||
+static bool evtchn_usable(const struct evtchn *evtchn)
|
||||
+{
|
||||
+ if ( evtchn->xen_consumer )
|
||||
+ return false;
|
||||
+
|
||||
+#ifdef arch_evtchn_is_special
|
||||
+ if ( arch_evtchn_is_special(evtchn) )
|
||||
+ return true;
|
||||
+#endif
|
||||
+
|
||||
+ BUILD_BUG_ON(ECS_FREE > ECS_RESERVED);
|
||||
+ return evtchn->state > ECS_RESERVED;
|
||||
+}
|
||||
+
|
||||
/* Wait on a Xen-attached event channel. */
|
||||
#define wait_on_xen_event_channel(port, condition) \
|
||||
do { \
|
||||
@@ -165,19 +183,24 @@ int evtchn_reset(struct domain *d);
|
||||
|
||||
/*
|
||||
* Low-level event channel port ops.
|
||||
+ *
|
||||
+ * All hooks have to be called with a lock held which prevents the channel
|
||||
+ * from changing state. This may be the domain event lock, the per-channel
|
||||
+ * lock, or in the case of sending interdomain events also the other side's
|
||||
+ * per-channel lock. Exceptions apply in certain cases for the PV shim.
|
||||
*/
|
||||
struct evtchn_port_ops {
|
||||
void (*init)(struct domain *d, struct evtchn *evtchn);
|
||||
void (*set_pending)(struct vcpu *v, struct evtchn *evtchn);
|
||||
void (*clear_pending)(struct domain *d, struct evtchn *evtchn);
|
||||
void (*unmask)(struct domain *d, struct evtchn *evtchn);
|
||||
- bool (*is_pending)(const struct domain *d, evtchn_port_t port);
|
||||
- bool (*is_masked)(const struct domain *d, evtchn_port_t port);
|
||||
+ bool (*is_pending)(const struct domain *d, const struct evtchn *evtchn);
|
||||
+ bool (*is_masked)(const struct domain *d, const struct evtchn *evtchn);
|
||||
/*
|
||||
* Is the port unavailable because it's still being cleaned up
|
||||
* after being closed?
|
||||
*/
|
||||
- bool (*is_busy)(const struct domain *d, evtchn_port_t port);
|
||||
+ bool (*is_busy)(const struct domain *d, const struct evtchn *evtchn);
|
||||
int (*set_priority)(struct domain *d, struct evtchn *evtchn,
|
||||
unsigned int priority);
|
||||
void (*print_state)(struct domain *d, const struct evtchn *evtchn);
|
||||
@@ -193,38 +216,67 @@ static inline void evtchn_port_set_pendi
|
||||
unsigned int vcpu_id,
|
||||
struct evtchn *evtchn)
|
||||
{
|
||||
- d->evtchn_port_ops->set_pending(d->vcpu[vcpu_id], evtchn);
|
||||
+ if ( evtchn_usable(evtchn) )
|
||||
+ d->evtchn_port_ops->set_pending(d->vcpu[vcpu_id], evtchn);
|
||||
}
|
||||
|
||||
static inline void evtchn_port_clear_pending(struct domain *d,
|
||||
struct evtchn *evtchn)
|
||||
{
|
||||
- d->evtchn_port_ops->clear_pending(d, evtchn);
|
||||
+ if ( evtchn_usable(evtchn) )
|
||||
+ d->evtchn_port_ops->clear_pending(d, evtchn);
|
||||
}
|
||||
|
||||
static inline void evtchn_port_unmask(struct domain *d,
|
||||
struct evtchn *evtchn)
|
||||
{
|
||||
- d->evtchn_port_ops->unmask(d, evtchn);
|
||||
+ if ( evtchn_usable(evtchn) )
|
||||
+ d->evtchn_port_ops->unmask(d, evtchn);
|
||||
}
|
||||
|
||||
-static inline bool evtchn_port_is_pending(const struct domain *d,
|
||||
- evtchn_port_t port)
|
||||
+static inline bool evtchn_is_pending(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
- return d->evtchn_port_ops->is_pending(d, port);
|
||||
+ return evtchn_usable(evtchn) && d->evtchn_port_ops->is_pending(d, evtchn);
|
||||
}
|
||||
|
||||
-static inline bool evtchn_port_is_masked(const struct domain *d,
|
||||
- evtchn_port_t port)
|
||||
+static inline bool evtchn_port_is_pending(struct domain *d, evtchn_port_t port)
|
||||
{
|
||||
- return d->evtchn_port_ops->is_masked(d, port);
|
||||
+ struct evtchn *evtchn = evtchn_from_port(d, port);
|
||||
+ bool rc;
|
||||
+ unsigned long flags;
|
||||
+
|
||||
+ spin_lock_irqsave(&evtchn->lock, flags);
|
||||
+ rc = evtchn_is_pending(d, evtchn);
|
||||
+ spin_unlock_irqrestore(&evtchn->lock, flags);
|
||||
+
|
||||
+ return rc;
|
||||
+}
|
||||
+
|
||||
+static inline bool evtchn_is_masked(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
+{
|
||||
+ return !evtchn_usable(evtchn) || d->evtchn_port_ops->is_masked(d, evtchn);
|
||||
+}
|
||||
+
|
||||
+static inline bool evtchn_port_is_masked(struct domain *d, evtchn_port_t port)
|
||||
+{
|
||||
+ struct evtchn *evtchn = evtchn_from_port(d, port);
|
||||
+ bool rc;
|
||||
+ unsigned long flags;
|
||||
+
|
||||
+ spin_lock_irqsave(&evtchn->lock, flags);
|
||||
+ rc = evtchn_is_masked(d, evtchn);
|
||||
+ spin_unlock_irqrestore(&evtchn->lock, flags);
|
||||
+
|
||||
+ return rc;
|
||||
}
|
||||
|
||||
-static inline bool evtchn_port_is_busy(const struct domain *d,
|
||||
- evtchn_port_t port)
|
||||
+static inline bool evtchn_is_busy(const struct domain *d,
|
||||
+ const struct evtchn *evtchn)
|
||||
{
|
||||
return d->evtchn_port_ops->is_busy &&
|
||||
- d->evtchn_port_ops->is_busy(d, port);
|
||||
+ d->evtchn_port_ops->is_busy(d, evtchn);
|
||||
}
|
||||
|
||||
static inline int evtchn_port_set_priority(struct domain *d,
|
||||
@@ -233,6 +285,8 @@ static inline int evtchn_port_set_priori
|
||||
{
|
||||
if ( !d->evtchn_port_ops->set_priority )
|
||||
return -ENOSYS;
|
||||
+ if ( !evtchn_usable(evtchn) )
|
||||
+ return -EACCES;
|
||||
return d->evtchn_port_ops->set_priority(d, evtchn, priority);
|
||||
}
|
||||
|
|
@ -1,130 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn: arrange for preemption in evtchn_destroy()
|
||||
|
||||
Especially closing of fully established interdomain channels can take
|
||||
quite some time, due to the locking involved. Therefore we shouldn't
|
||||
assume we can clean up still active ports all in one go. Besides adding
|
||||
the necessary preemption check, also avoid pointlessly starting from
|
||||
(or now really ending at) 0; 1 is the lowest numbered port which may
|
||||
need closing.
|
||||
|
||||
Since we're now reducing ->valid_evtchns, free_xen_event_channel(),
|
||||
and (at least to be on the safe side) notify_via_xen_event_channel()
|
||||
need to cope with attempts to close / unbind from / send through already
|
||||
closed (and no longer valid, as per port_is_valid()) ports.
|
||||
|
||||
This is part of XSA-344.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
|
||||
--- a/xen/common/domain.c
|
||||
+++ b/xen/common/domain.c
|
||||
@@ -770,12 +770,14 @@ int domain_kill(struct domain *d)
|
||||
return domain_kill(d);
|
||||
d->is_dying = DOMDYING_dying;
|
||||
argo_destroy(d);
|
||||
- evtchn_destroy(d);
|
||||
gnttab_release_mappings(d);
|
||||
vnuma_destroy(d->vnuma);
|
||||
domain_set_outstanding_pages(d, 0);
|
||||
/* fallthrough */
|
||||
case DOMDYING_dying:
|
||||
+ rc = evtchn_destroy(d);
|
||||
+ if ( rc )
|
||||
+ break;
|
||||
rc = domain_relinquish_resources(d);
|
||||
if ( rc != 0 )
|
||||
break;
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -1297,7 +1297,16 @@ int alloc_unbound_xen_event_channel(
|
||||
|
||||
void free_xen_event_channel(struct domain *d, int port)
|
||||
{
|
||||
- BUG_ON(!port_is_valid(d, port));
|
||||
+ if ( !port_is_valid(d, port) )
|
||||
+ {
|
||||
+ /*
|
||||
+ * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
|
||||
+ * with the spin_barrier() and BUG_ON() in evtchn_destroy().
|
||||
+ */
|
||||
+ smp_rmb();
|
||||
+ BUG_ON(!d->is_dying);
|
||||
+ return;
|
||||
+ }
|
||||
|
||||
evtchn_close(d, port, 0);
|
||||
}
|
||||
@@ -1309,7 +1318,17 @@ void notify_via_xen_event_channel(struct
|
||||
struct domain *rd;
|
||||
unsigned long flags;
|
||||
|
||||
- ASSERT(port_is_valid(ld, lport));
|
||||
+ if ( !port_is_valid(ld, lport) )
|
||||
+ {
|
||||
+ /*
|
||||
+ * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
|
||||
+ * with the spin_barrier() and BUG_ON() in evtchn_destroy().
|
||||
+ */
|
||||
+ smp_rmb();
|
||||
+ ASSERT(ld->is_dying);
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
lchn = evtchn_from_port(ld, lport);
|
||||
|
||||
spin_lock_irqsave(&lchn->lock, flags);
|
||||
@@ -1380,8 +1399,7 @@ int evtchn_init(struct domain *d, unsign
|
||||
return 0;
|
||||
}
|
||||
|
||||
-
|
||||
-void evtchn_destroy(struct domain *d)
|
||||
+int evtchn_destroy(struct domain *d)
|
||||
{
|
||||
unsigned int i;
|
||||
|
||||
@@ -1390,14 +1408,29 @@ void evtchn_destroy(struct domain *d)
|
||||
spin_barrier(&d->event_lock);
|
||||
|
||||
/* Close all existing event channels. */
|
||||
- for ( i = 0; port_is_valid(d, i); i++ )
|
||||
+ for ( i = d->valid_evtchns; --i; )
|
||||
+ {
|
||||
evtchn_close(d, i, 0);
|
||||
|
||||
+ /*
|
||||
+ * Avoid preempting when called from domain_create()'s error path,
|
||||
+ * and don't check too often (choice of frequency is arbitrary).
|
||||
+ */
|
||||
+ if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead &&
|
||||
+ hypercall_preempt_check() )
|
||||
+ {
|
||||
+ write_atomic(&d->valid_evtchns, i);
|
||||
+ return -ERESTART;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
ASSERT(!d->active_evtchns);
|
||||
|
||||
clear_global_virq_handlers(d);
|
||||
|
||||
evtchn_fifo_destroy(d);
|
||||
+
|
||||
+ return 0;
|
||||
}
|
||||
|
||||
|
||||
--- a/xen/include/xen/sched.h
|
||||
+++ b/xen/include/xen/sched.h
|
||||
@@ -136,7 +136,7 @@ struct evtchn
|
||||
} __attribute__((aligned(64)));
|
||||
|
||||
int evtchn_init(struct domain *d, unsigned int max_port);
|
||||
-void evtchn_destroy(struct domain *d); /* from domain_kill */
|
||||
+int evtchn_destroy(struct domain *d); /* from domain_kill */
|
||||
void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
|
||||
|
||||
struct waitqueue_vcpu;
|
|
@ -1,203 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: evtchn: arrange for preemption in evtchn_reset()
|
||||
|
||||
Like for evtchn_destroy() looping over all possible event channels to
|
||||
close them can take a significant amount of time. Unlike done there, we
|
||||
can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a
|
||||
lightweight form, the paging domctl continuation concept, redirecting
|
||||
the continuations to different sub-ops. Just like there this is to be
|
||||
able to allow for predictable overall results of the involved sub-ops:
|
||||
Racing requests should either complete or be refused.
|
||||
|
||||
Note that a domain can't interfere with an already started (by a remote
|
||||
domain) reset, due to being paused. It can prevent a remote reset from
|
||||
happening by leaving a reset unfinished, but that's only going to affect
|
||||
itself.
|
||||
|
||||
This is part of XSA-344.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
|
||||
--- a/xen/common/domain.c
|
||||
+++ b/xen/common/domain.c
|
||||
@@ -1214,7 +1214,7 @@ void domain_unpause_except_self(struct d
|
||||
domain_unpause(d);
|
||||
}
|
||||
|
||||
-int domain_soft_reset(struct domain *d)
|
||||
+int domain_soft_reset(struct domain *d, bool resuming)
|
||||
{
|
||||
struct vcpu *v;
|
||||
int rc;
|
||||
@@ -1228,7 +1228,7 @@ int domain_soft_reset(struct domain *d)
|
||||
}
|
||||
spin_unlock(&d->shutdown_lock);
|
||||
|
||||
- rc = evtchn_reset(d);
|
||||
+ rc = evtchn_reset(d, resuming);
|
||||
if ( rc )
|
||||
return rc;
|
||||
|
||||
--- a/xen/common/domctl.c
|
||||
+++ b/xen/common/domctl.c
|
||||
@@ -572,12 +572,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
|
||||
}
|
||||
|
||||
case XEN_DOMCTL_soft_reset:
|
||||
+ case XEN_DOMCTL_soft_reset_cont:
|
||||
if ( d == current->domain ) /* no domain_pause() */
|
||||
{
|
||||
ret = -EINVAL;
|
||||
break;
|
||||
}
|
||||
- ret = domain_soft_reset(d);
|
||||
+ ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont);
|
||||
+ if ( ret == -ERESTART )
|
||||
+ {
|
||||
+ op->cmd = XEN_DOMCTL_soft_reset_cont;
|
||||
+ if ( !__copy_field_to_guest(u_domctl, op, cmd) )
|
||||
+ ret = hypercall_create_continuation(__HYPERVISOR_domctl,
|
||||
+ "h", u_domctl);
|
||||
+ else
|
||||
+ ret = -EFAULT;
|
||||
+ }
|
||||
break;
|
||||
|
||||
case XEN_DOMCTL_destroydomain:
|
||||
--- a/xen/common/event_channel.c
|
||||
+++ b/xen/common/event_channel.c
|
||||
@@ -1057,7 +1057,7 @@ int evtchn_unmask(unsigned int port)
|
||||
return 0;
|
||||
}
|
||||
|
||||
-int evtchn_reset(struct domain *d)
|
||||
+int evtchn_reset(struct domain *d, bool resuming)
|
||||
{
|
||||
unsigned int i;
|
||||
int rc = 0;
|
||||
@@ -1065,11 +1065,40 @@ int evtchn_reset(struct domain *d)
|
||||
if ( d != current->domain && !d->controller_pause_count )
|
||||
return -EINVAL;
|
||||
|
||||
- for ( i = 0; port_is_valid(d, i); i++ )
|
||||
+ spin_lock(&d->event_lock);
|
||||
+
|
||||
+ /*
|
||||
+ * If we are resuming, then start where we stopped. Otherwise, check
|
||||
+ * that a reset operation is not already in progress, and if none is,
|
||||
+ * record that this is now the case.
|
||||
+ */
|
||||
+ i = resuming ? d->next_evtchn : !d->next_evtchn;
|
||||
+ if ( i > d->next_evtchn )
|
||||
+ d->next_evtchn = i;
|
||||
+
|
||||
+ spin_unlock(&d->event_lock);
|
||||
+
|
||||
+ if ( !i )
|
||||
+ return -EBUSY;
|
||||
+
|
||||
+ for ( ; port_is_valid(d, i); i++ )
|
||||
+ {
|
||||
evtchn_close(d, i, 1);
|
||||
|
||||
+ /* NB: Choice of frequency is arbitrary. */
|
||||
+ if ( !(i & 0x3f) && hypercall_preempt_check() )
|
||||
+ {
|
||||
+ spin_lock(&d->event_lock);
|
||||
+ d->next_evtchn = i;
|
||||
+ spin_unlock(&d->event_lock);
|
||||
+ return -ERESTART;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
spin_lock(&d->event_lock);
|
||||
|
||||
+ d->next_evtchn = 0;
|
||||
+
|
||||
if ( d->active_evtchns > d->xen_evtchns )
|
||||
rc = -EAGAIN;
|
||||
else if ( d->evtchn_fifo )
|
||||
@@ -1204,7 +1233,8 @@ long do_event_channel_op(int cmd, XEN_GU
|
||||
break;
|
||||
}
|
||||
|
||||
- case EVTCHNOP_reset: {
|
||||
+ case EVTCHNOP_reset:
|
||||
+ case EVTCHNOP_reset_cont: {
|
||||
struct evtchn_reset reset;
|
||||
struct domain *d;
|
||||
|
||||
@@ -1217,9 +1247,13 @@ long do_event_channel_op(int cmd, XEN_GU
|
||||
|
||||
rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d);
|
||||
if ( !rc )
|
||||
- rc = evtchn_reset(d);
|
||||
+ rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont);
|
||||
|
||||
rcu_unlock_domain(d);
|
||||
+
|
||||
+ if ( rc == -ERESTART )
|
||||
+ rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op,
|
||||
+ "ih", EVTCHNOP_reset_cont, arg);
|
||||
break;
|
||||
}
|
||||
|
||||
--- a/xen/include/public/domctl.h
|
||||
+++ b/xen/include/public/domctl.h
|
||||
@@ -1152,7 +1152,10 @@ struct xen_domctl {
|
||||
#define XEN_DOMCTL_iomem_permission 20
|
||||
#define XEN_DOMCTL_ioport_permission 21
|
||||
#define XEN_DOMCTL_hypercall_init 22
|
||||
-#define XEN_DOMCTL_arch_setup 23 /* Obsolete IA64 only */
|
||||
+#ifdef __XEN__
|
||||
+/* #define XEN_DOMCTL_arch_setup 23 Obsolete IA64 only */
|
||||
+#define XEN_DOMCTL_soft_reset_cont 23
|
||||
+#endif
|
||||
#define XEN_DOMCTL_settimeoffset 24
|
||||
#define XEN_DOMCTL_getvcpuaffinity 25
|
||||
#define XEN_DOMCTL_real_mode_area 26 /* Obsolete PPC only */
|
||||
--- a/xen/include/public/event_channel.h
|
||||
+++ b/xen/include/public/event_channel.h
|
||||
@@ -74,6 +74,9 @@
|
||||
#define EVTCHNOP_init_control 11
|
||||
#define EVTCHNOP_expand_array 12
|
||||
#define EVTCHNOP_set_priority 13
|
||||
+#ifdef __XEN__
|
||||
+#define EVTCHNOP_reset_cont 14
|
||||
+#endif
|
||||
/* ` } */
|
||||
|
||||
typedef uint32_t evtchn_port_t;
|
||||
--- a/xen/include/xen/event.h
|
||||
+++ b/xen/include/xen/event.h
|
||||
@@ -171,7 +171,7 @@ void evtchn_check_pollers(struct domain
|
||||
void evtchn_2l_init(struct domain *d);
|
||||
|
||||
/* Close all event channels and reset to 2-level ABI. */
|
||||
-int evtchn_reset(struct domain *d);
|
||||
+int evtchn_reset(struct domain *d, bool resuming);
|
||||
|
||||
/*
|
||||
* Low-level event channel port ops.
|
||||
--- a/xen/include/xen/sched.h
|
||||
+++ b/xen/include/xen/sched.h
|
||||
@@ -394,6 +394,8 @@ struct domain
|
||||
* EVTCHNOP_reset). Read/write access like for active_evtchns.
|
||||
*/
|
||||
unsigned int xen_evtchns;
|
||||
+ /* Port to resume from in evtchn_reset(), when in a continuation. */
|
||||
+ unsigned int next_evtchn;
|
||||
spinlock_t event_lock;
|
||||
const struct evtchn_port_ops *evtchn_port_ops;
|
||||
struct evtchn_fifo_domain *evtchn_fifo;
|
||||
@@ -663,7 +665,7 @@ int domain_shutdown(struct domain *d, u8
|
||||
void domain_resume(struct domain *d);
|
||||
void domain_pause_for_debugger(void);
|
||||
|
||||
-int domain_soft_reset(struct domain *d);
|
||||
+int domain_soft_reset(struct domain *d, bool resuming);
|
||||
|
||||
int vcpu_start_shutdown_deferral(struct vcpu *v);
|
||||
void vcpu_end_shutdown_deferral(struct vcpu *v);
|
|
@ -1,94 +0,0 @@
|
|||
From b3e0d4e37b7902533a463812374947d4d6d2e463 Mon Sep 17 00:00:00 2001
|
||||
From: Wei Liu <wei.liu2@citrix.com>
|
||||
Date: Sat, 11 Jan 2020 21:57:41 +0000
|
||||
Subject: [PATCH 1/3] x86/mm: Refactor map_pages_to_xen to have only a single
|
||||
exit path
|
||||
|
||||
We will soon need to perform clean-ups before returning.
|
||||
|
||||
No functional change.
|
||||
|
||||
This is part of XSA-345.
|
||||
|
||||
Reported-by: Hongyan Xia <hongyxia@amazon.com>
|
||||
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
|
||||
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
|
||||
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
|
||||
Acked-by: Jan Beulich <jbeulich@suse.com>
|
||||
---
|
||||
xen/arch/x86/mm.c | 17 +++++++++++------
|
||||
1 file changed, 11 insertions(+), 6 deletions(-)
|
||||
|
||||
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
|
||||
index 30dffb68e8..133a393875 100644
|
||||
--- a/xen/arch/x86/mm.c
|
||||
+++ b/xen/arch/x86/mm.c
|
||||
@@ -5187,6 +5187,7 @@ int map_pages_to_xen(
|
||||
l2_pgentry_t *pl2e, ol2e;
|
||||
l1_pgentry_t *pl1e, ol1e;
|
||||
unsigned int i;
|
||||
+ int rc = -ENOMEM;
|
||||
|
||||
#define flush_flags(oldf) do { \
|
||||
unsigned int o_ = (oldf); \
|
||||
@@ -5207,7 +5208,8 @@ int map_pages_to_xen(
|
||||
l3_pgentry_t ol3e, *pl3e = virt_to_xen_l3e(virt);
|
||||
|
||||
if ( !pl3e )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
+
|
||||
ol3e = *pl3e;
|
||||
|
||||
if ( cpu_has_page1gb &&
|
||||
@@ -5295,7 +5297,7 @@ int map_pages_to_xen(
|
||||
|
||||
pl2e = alloc_xen_pagetable();
|
||||
if ( pl2e == NULL )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
|
||||
for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
|
||||
l2e_write(pl2e + i,
|
||||
@@ -5324,7 +5326,7 @@ int map_pages_to_xen(
|
||||
|
||||
pl2e = virt_to_xen_l2e(virt);
|
||||
if ( !pl2e )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
|
||||
if ( ((((virt >> PAGE_SHIFT) | mfn_x(mfn)) &
|
||||
((1u << PAGETABLE_ORDER) - 1)) == 0) &&
|
||||
@@ -5367,7 +5369,7 @@ int map_pages_to_xen(
|
||||
{
|
||||
pl1e = virt_to_xen_l1e(virt);
|
||||
if ( pl1e == NULL )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
}
|
||||
else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
|
||||
{
|
||||
@@ -5394,7 +5396,7 @@ int map_pages_to_xen(
|
||||
|
||||
pl1e = alloc_xen_pagetable();
|
||||
if ( pl1e == NULL )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
|
||||
for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
|
||||
l1e_write(&pl1e[i],
|
||||
@@ -5538,7 +5540,10 @@ int map_pages_to_xen(
|
||||
|
||||
#undef flush_flags
|
||||
|
||||
- return 0;
|
||||
+ rc = 0;
|
||||
+
|
||||
+ out:
|
||||
+ return rc;
|
||||
}
|
||||
|
||||
int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
|
||||
--
|
||||
2.25.1
|
||||
|
|
@ -1,68 +0,0 @@
|
|||
From 9f6f35b833d295acaaa2d8ff8cf309bf688cfd50 Mon Sep 17 00:00:00 2001
|
||||
From: Wei Liu <wei.liu2@citrix.com>
|
||||
Date: Sat, 11 Jan 2020 21:57:42 +0000
|
||||
Subject: [PATCH 2/3] x86/mm: Refactor modify_xen_mappings to have one exit
|
||||
path
|
||||
|
||||
We will soon need to perform clean-ups before returning.
|
||||
|
||||
No functional change.
|
||||
|
||||
This is part of XSA-345.
|
||||
|
||||
Reported-by: Hongyan Xia <hongyxia@amazon.com>
|
||||
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
|
||||
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
|
||||
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
|
||||
Acked-by: Jan Beulich <jbeulich@suse.com>
|
||||
---
|
||||
xen/arch/x86/mm.c | 12 +++++++++---
|
||||
1 file changed, 9 insertions(+), 3 deletions(-)
|
||||
|
||||
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
|
||||
index 133a393875..af726d3274 100644
|
||||
--- a/xen/arch/x86/mm.c
|
||||
+++ b/xen/arch/x86/mm.c
|
||||
@@ -5570,6 +5570,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
l1_pgentry_t *pl1e;
|
||||
unsigned int i;
|
||||
unsigned long v = s;
|
||||
+ int rc = -ENOMEM;
|
||||
|
||||
/* Set of valid PTE bits which may be altered. */
|
||||
#define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT)
|
||||
@@ -5611,7 +5612,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
/* PAGE1GB: shatter the superpage and fall through. */
|
||||
pl2e = alloc_xen_pagetable();
|
||||
if ( !pl2e )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
+
|
||||
for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
|
||||
l2e_write(pl2e + i,
|
||||
l2e_from_pfn(l3e_get_pfn(*pl3e) +
|
||||
@@ -5666,7 +5668,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
/* PSE: shatter the superpage and try again. */
|
||||
pl1e = alloc_xen_pagetable();
|
||||
if ( !pl1e )
|
||||
- return -ENOMEM;
|
||||
+ goto out;
|
||||
+
|
||||
for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
|
||||
l1e_write(&pl1e[i],
|
||||
l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
|
||||
@@ -5795,7 +5798,10 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
flush_area(NULL, FLUSH_TLB_GLOBAL);
|
||||
|
||||
#undef FLAGS_MASK
|
||||
- return 0;
|
||||
+ rc = 0;
|
||||
+
|
||||
+ out:
|
||||
+ return rc;
|
||||
}
|
||||
|
||||
#undef flush_area
|
||||
--
|
||||
2.25.1
|
||||
|
|
@ -1,249 +0,0 @@
|
|||
From 0ff9a8453dc47cd47eee9659d5916afb5094e871 Mon Sep 17 00:00:00 2001
|
||||
From: Hongyan Xia <hongyxia@amazon.com>
|
||||
Date: Sat, 11 Jan 2020 21:57:43 +0000
|
||||
Subject: [PATCH 3/3] x86/mm: Prevent some races in hypervisor mapping updates
|
||||
|
||||
map_pages_to_xen will attempt to coalesce mappings into 2MiB and 1GiB
|
||||
superpages if possible, to maximize TLB efficiency. This means both
|
||||
replacing superpage entries with smaller entries, and replacing
|
||||
smaller entries with superpages.
|
||||
|
||||
Unfortunately, while some potential races are handled correctly,
|
||||
others are not. These include:
|
||||
|
||||
1. When one processor modifies a sub-superpage mapping while another
|
||||
processor replaces the entire range with a superpage.
|
||||
|
||||
Take the following example:
|
||||
|
||||
Suppose L3[N] points to L2. And suppose we have two processors, A and
|
||||
B.
|
||||
|
||||
* A walks the pagetables, get a pointer to L2.
|
||||
* B replaces L3[N] with a 1GiB mapping.
|
||||
* B Frees L2
|
||||
* A writes L2[M] #
|
||||
|
||||
This is race exacerbated by the fact that virt_to_xen_l[21]e doesn't
|
||||
handle higher-level superpages properly: If you call virt_xen_to_l2e
|
||||
on a virtual address within an L3 superpage, you'll either hit a BUG()
|
||||
(most likely), or get a pointer into the middle of a data page; same
|
||||
with virt_xen_to_l1 on a virtual address within either an L3 or L2
|
||||
superpage.
|
||||
|
||||
So take the following example:
|
||||
|
||||
* A reads pl3e and discovers it to point to an L2.
|
||||
* B replaces L3[N] with a 1GiB mapping
|
||||
* A calls virt_to_xen_l2e() and hits the BUG_ON() #
|
||||
|
||||
2. When two processors simultaneously try to replace a sub-superpage
|
||||
mapping with a superpage mapping.
|
||||
|
||||
Take the following example:
|
||||
|
||||
Suppose L3[N] points to L2. And suppose we have two processors, A and B,
|
||||
both trying to replace L3[N] with a superpage.
|
||||
|
||||
* A walks the pagetables, get a pointer to pl3e, and takes a copy ol3e pointing to L2.
|
||||
* B walks the pagetables, gets a pointre to pl3e, and takes a copy ol3e pointing to L2.
|
||||
* A writes the new value into L3[N]
|
||||
* B writes the new value into L3[N]
|
||||
* A recursively frees all the L1's under L2, then frees L2
|
||||
* B recursively double-frees all the L1's under L2, then double-frees L2 #
|
||||
|
||||
Fix this by grabbing a lock for the entirety of the mapping update
|
||||
operation.
|
||||
|
||||
Rather than grabbing map_pgdir_lock for the entire operation, however,
|
||||
repurpose the PGT_locked bit from L3's page->type_info as a lock.
|
||||
This means that rather than locking the entire address space, we
|
||||
"only" lock a single 512GiB chunk of hypervisor address space at a
|
||||
time.
|
||||
|
||||
There was a proposal for a lock-and-reverify approach, where we walk
|
||||
the pagetables to the point where we decide what to do; then grab the
|
||||
map_pgdir_lock, re-verify the information we collected without the
|
||||
lock, and finally make the change (starting over again if anything had
|
||||
changed). Without being able to guarantee that the L2 table wasn't
|
||||
freed, however, that means every read would need to be considered
|
||||
potentially unsafe. Thinking carefully about that is probably
|
||||
something that wants to be done on public, not under time pressure.
|
||||
|
||||
This is part of XSA-345.
|
||||
|
||||
Reported-by: Hongyan Xia <hongyxia@amazon.com>
|
||||
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
|
||||
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
---
|
||||
xen/arch/x86/mm.c | 92 +++++++++++++++++++++++++++++++++++++++++++++--
|
||||
1 file changed, 89 insertions(+), 3 deletions(-)
|
||||
|
||||
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
|
||||
index af726d3274..d6a0761f43 100644
|
||||
--- a/xen/arch/x86/mm.c
|
||||
+++ b/xen/arch/x86/mm.c
|
||||
@@ -2167,6 +2167,50 @@ void page_unlock(struct page_info *page)
|
||||
current_locked_page_set(NULL);
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * L3 table locks:
|
||||
+ *
|
||||
+ * Used for serialization in map_pages_to_xen() and modify_xen_mappings().
|
||||
+ *
|
||||
+ * For Xen PT pages, the page->u.inuse.type_info is unused and it is safe to
|
||||
+ * reuse the PGT_locked flag. This lock is taken only when we move down to L3
|
||||
+ * tables and below, since L4 (and above, for 5-level paging) is still globally
|
||||
+ * protected by map_pgdir_lock.
|
||||
+ *
|
||||
+ * PV MMU update hypercalls call map_pages_to_xen while holding a page's page_lock().
|
||||
+ * This has two implications:
|
||||
+ * - We cannot reuse current_locked_page_* for debugging
|
||||
+ * - To avoid the chance of deadlock, even for different pages, we
|
||||
+ * must never grab page_lock() after grabbing l3t_lock(). This
|
||||
+ * includes any page_lock()-based locks, such as
|
||||
+ * mem_sharing_page_lock().
|
||||
+ *
|
||||
+ * Also note that we grab the map_pgdir_lock while holding the
|
||||
+ * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
|
||||
+ * reverse order.
|
||||
+ */
|
||||
+static void l3t_lock(struct page_info *page)
|
||||
+{
|
||||
+ unsigned long x, nx;
|
||||
+
|
||||
+ do {
|
||||
+ while ( (x = page->u.inuse.type_info) & PGT_locked )
|
||||
+ cpu_relax();
|
||||
+ nx = x | PGT_locked;
|
||||
+ } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
|
||||
+}
|
||||
+
|
||||
+static void l3t_unlock(struct page_info *page)
|
||||
+{
|
||||
+ unsigned long x, nx, y = page->u.inuse.type_info;
|
||||
+
|
||||
+ do {
|
||||
+ x = y;
|
||||
+ BUG_ON(!(x & PGT_locked));
|
||||
+ nx = x & ~PGT_locked;
|
||||
+ } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
|
||||
+}
|
||||
+
|
||||
#ifdef CONFIG_PV
|
||||
/*
|
||||
* PTE flags that a guest may change without re-validating the PTE.
|
||||
@@ -5177,6 +5221,23 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
|
||||
flush_area_local((const void *)v, f) : \
|
||||
flush_area_all((const void *)v, f))
|
||||
|
||||
+#define L3T_INIT(page) (page) = ZERO_BLOCK_PTR
|
||||
+
|
||||
+#define L3T_LOCK(page) \
|
||||
+ do { \
|
||||
+ if ( locking ) \
|
||||
+ l3t_lock(page); \
|
||||
+ } while ( false )
|
||||
+
|
||||
+#define L3T_UNLOCK(page) \
|
||||
+ do { \
|
||||
+ if ( locking && (page) != ZERO_BLOCK_PTR ) \
|
||||
+ { \
|
||||
+ l3t_unlock(page); \
|
||||
+ (page) = ZERO_BLOCK_PTR; \
|
||||
+ } \
|
||||
+ } while ( false )
|
||||
+
|
||||
int map_pages_to_xen(
|
||||
unsigned long virt,
|
||||
mfn_t mfn,
|
||||
@@ -5188,6 +5249,7 @@ int map_pages_to_xen(
|
||||
l1_pgentry_t *pl1e, ol1e;
|
||||
unsigned int i;
|
||||
int rc = -ENOMEM;
|
||||
+ struct page_info *current_l3page;
|
||||
|
||||
#define flush_flags(oldf) do { \
|
||||
unsigned int o_ = (oldf); \
|
||||
@@ -5203,13 +5265,20 @@ int map_pages_to_xen(
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
+ L3T_INIT(current_l3page);
|
||||
+
|
||||
while ( nr_mfns != 0 )
|
||||
{
|
||||
- l3_pgentry_t ol3e, *pl3e = virt_to_xen_l3e(virt);
|
||||
+ l3_pgentry_t *pl3e, ol3e;
|
||||
|
||||
+ L3T_UNLOCK(current_l3page);
|
||||
+
|
||||
+ pl3e = virt_to_xen_l3e(virt);
|
||||
if ( !pl3e )
|
||||
goto out;
|
||||
|
||||
+ current_l3page = virt_to_page(pl3e);
|
||||
+ L3T_LOCK(current_l3page);
|
||||
ol3e = *pl3e;
|
||||
|
||||
if ( cpu_has_page1gb &&
|
||||
@@ -5543,6 +5612,7 @@ int map_pages_to_xen(
|
||||
rc = 0;
|
||||
|
||||
out:
|
||||
+ L3T_UNLOCK(current_l3page);
|
||||
return rc;
|
||||
}
|
||||
|
||||
@@ -5571,6 +5641,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
unsigned int i;
|
||||
unsigned long v = s;
|
||||
int rc = -ENOMEM;
|
||||
+ struct page_info *current_l3page;
|
||||
|
||||
/* Set of valid PTE bits which may be altered. */
|
||||
#define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT)
|
||||
@@ -5579,11 +5650,22 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
ASSERT(IS_ALIGNED(s, PAGE_SIZE));
|
||||
ASSERT(IS_ALIGNED(e, PAGE_SIZE));
|
||||
|
||||
+ L3T_INIT(current_l3page);
|
||||
+
|
||||
while ( v < e )
|
||||
{
|
||||
- l3_pgentry_t *pl3e = virt_to_xen_l3e(v);
|
||||
+ l3_pgentry_t *pl3e;
|
||||
+
|
||||
+ L3T_UNLOCK(current_l3page);
|
||||
|
||||
- if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
|
||||
+ pl3e = virt_to_xen_l3e(v);
|
||||
+ if ( !pl3e )
|
||||
+ goto out;
|
||||
+
|
||||
+ current_l3page = virt_to_page(pl3e);
|
||||
+ L3T_LOCK(current_l3page);
|
||||
+
|
||||
+ if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
|
||||
{
|
||||
/* Confirm the caller isn't trying to create new mappings. */
|
||||
ASSERT(!(nf & _PAGE_PRESENT));
|
||||
@@ -5801,9 +5883,13 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
|
||||
rc = 0;
|
||||
|
||||
out:
|
||||
+ L3T_UNLOCK(current_l3page);
|
||||
return rc;
|
||||
}
|
||||
|
||||
+#undef L3T_LOCK
|
||||
+#undef L3T_UNLOCK
|
||||
+
|
||||
#undef flush_area
|
||||
|
||||
int destroy_xen_mappings(unsigned long s, unsigned long e)
|
||||
--
|
||||
2.25.1
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page
|
||||
|
||||
Deferring flushes to a single, wide range one - as is done when
|
||||
handling XENMAPSPACE_gmfn_range - is okay only as long as
|
||||
pages don't get freed ahead of the eventual flush. While the only
|
||||
function setting the flag (xenmem_add_to_physmap()) suggests by its name
|
||||
that it's only mapping new entries, in reality the way
|
||||
xenmem_add_to_physmap_one() works means an unmap would happen not only
|
||||
for the page being moved (but not freed) but, if the destination GFN is
|
||||
populated, also for the page being displaced from that GFN. Collapsing
|
||||
the two flushes for this GFN into just one (and even more so deferring
|
||||
it to a batched invocation) is not correct.
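The fix in guest_remove_page() below follows a plain save/clear/restore pattern around the removal. As a rough standalone illustration (hypothetical names, with a bare global standing in for Xen's per-CPU iommu_dont_flush_iotlb):

    #include <stdbool.h>

    /* Hypothetical stand-ins; the pattern, not the API, is the point. */
    extern bool iommu_dont_flush_iotlb;
    extern int remove_page(unsigned long gfn);

    static int guest_remove_page_sketch(unsigned long gfn)
    {
        /* The page may be freed below, so a deferred wide-range flush would
         * come too late: temporarily force immediate IOTLB flushing. */
        bool dont_flush = iommu_dont_flush_iotlb;
        int rc;

        iommu_dont_flush_iotlb = false;
        rc = remove_page(gfn);
        iommu_dont_flush_iotlb = dont_flush;     /* restore the caller's setting */

        return rc;
    }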
|
||||
|
||||
This is part of XSA-346.
|
||||
|
||||
Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
|
||||
--- a/xen/common/memory.c
|
||||
+++ b/xen/common/memory.c
|
||||
@@ -292,6 +292,7 @@ int guest_remove_page(struct domain *d,
|
||||
p2m_type_t p2mt;
|
||||
#endif
|
||||
mfn_t mfn;
|
||||
+ bool *dont_flush_p, dont_flush;
|
||||
int rc;
|
||||
|
||||
#ifdef CONFIG_X86
|
||||
@@ -378,8 +379,18 @@ int guest_remove_page(struct domain *d,
|
||||
return -ENXIO;
|
||||
}
|
||||
|
||||
+ /*
|
||||
+ * Since we're likely to free the page below, we need to suspend
|
||||
+ * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes.
|
||||
+ */
|
||||
+ dont_flush_p = &this_cpu(iommu_dont_flush_iotlb);
|
||||
+ dont_flush = *dont_flush_p;
|
||||
+ *dont_flush_p = false;
|
||||
+
|
||||
rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);
|
||||
|
||||
+ *dont_flush_p = dont_flush;
|
||||
+
|
||||
/*
|
||||
* With the lack of an IOMMU on some platforms, domains with DMA-capable
|
||||
* device must retrieve the same pfn when the hypercall populate_physmap
|
|
@ -1,204 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: IOMMU: hold page ref until after deferred TLB flush
|
||||
|
||||
When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB
|
||||
flush for the "from" GFN range requires that the page remains allocated
|
||||
to the guest until the TLB flush has actually occurred. Otherwise a
|
||||
parallel hypercall to remove the page would only flush the TLB for the
|
||||
GFN it has been moved to, but not the one it was mapped at originally.
|
||||
|
||||
This is part of XSA-346.
|
||||
|
||||
Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
|
||||
Reported-by: Julien Grall <jgrall@amazon.com>
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Acked-by: Julien Grall <jgrall@amazon.com>
|
||||
|
||||
--- a/xen/arch/arm/mm.c
|
||||
+++ b/xen/arch/arm/mm.c
|
||||
@@ -1407,7 +1407,7 @@ void share_xen_page_with_guest(struct pa
|
||||
int xenmem_add_to_physmap_one(
|
||||
struct domain *d,
|
||||
unsigned int space,
|
||||
- union xen_add_to_physmap_batch_extra extra,
|
||||
+ union add_to_physmap_extra extra,
|
||||
unsigned long idx,
|
||||
gfn_t gfn)
|
||||
{
|
||||
@@ -1480,10 +1480,6 @@ int xenmem_add_to_physmap_one(
|
||||
break;
|
||||
}
|
||||
case XENMAPSPACE_dev_mmio:
|
||||
- /* extra should be 0. Reserved for future use. */
|
||||
- if ( extra.res0 )
|
||||
- return -EOPNOTSUPP;
|
||||
-
|
||||
rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
|
||||
return rc;
|
||||
|
||||
--- a/xen/arch/x86/mm.c
|
||||
+++ b/xen/arch/x86/mm.c
|
||||
@@ -4617,7 +4617,7 @@ static int handle_iomem_range(unsigned l
|
||||
int xenmem_add_to_physmap_one(
|
||||
struct domain *d,
|
||||
unsigned int space,
|
||||
- union xen_add_to_physmap_batch_extra extra,
|
||||
+ union add_to_physmap_extra extra,
|
||||
unsigned long idx,
|
||||
gfn_t gpfn)
|
||||
{
|
||||
@@ -4701,9 +4701,20 @@ int xenmem_add_to_physmap_one(
|
||||
rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K);
|
||||
|
||||
put_both:
|
||||
- /* In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. */
|
||||
+ /*
|
||||
+ * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top.
|
||||
+ * We also may need to transfer ownership of the page reference to our
|
||||
+ * caller.
|
||||
+ */
|
||||
if ( space == XENMAPSPACE_gmfn )
|
||||
+ {
|
||||
put_gfn(d, gfn);
|
||||
+ if ( !rc && extra.ppage )
|
||||
+ {
|
||||
+ *extra.ppage = page;
|
||||
+ page = NULL;
|
||||
+ }
|
||||
+ }
|
||||
|
||||
if ( page )
|
||||
put_page(page);
|
||||
--- a/xen/common/memory.c
|
||||
+++ b/xen/common/memory.c
|
||||
@@ -814,13 +814,12 @@ int xenmem_add_to_physmap(struct domain
|
||||
{
|
||||
unsigned int done = 0;
|
||||
long rc = 0;
|
||||
- union xen_add_to_physmap_batch_extra extra;
|
||||
+ union add_to_physmap_extra extra = {};
|
||||
+ struct page_info *pages[16];
|
||||
|
||||
ASSERT(paging_mode_translate(d));
|
||||
|
||||
- if ( xatp->space != XENMAPSPACE_gmfn_foreign )
|
||||
- extra.res0 = 0;
|
||||
- else
|
||||
+ if ( xatp->space == XENMAPSPACE_gmfn_foreign )
|
||||
extra.foreign_domid = DOMID_INVALID;
|
||||
|
||||
if ( xatp->space != XENMAPSPACE_gmfn_range )
|
||||
@@ -835,7 +834,10 @@ int xenmem_add_to_physmap(struct domain
|
||||
xatp->size -= start;
|
||||
|
||||
if ( is_iommu_enabled(d) )
|
||||
+ {
|
||||
this_cpu(iommu_dont_flush_iotlb) = 1;
|
||||
+ extra.ppage = &pages[0];
|
||||
+ }
|
||||
|
||||
while ( xatp->size > done )
|
||||
{
|
||||
@@ -847,8 +849,12 @@ int xenmem_add_to_physmap(struct domain
|
||||
xatp->idx++;
|
||||
xatp->gpfn++;
|
||||
|
||||
+ if ( extra.ppage )
|
||||
+ ++extra.ppage;
|
||||
+
|
||||
/* Check for continuation if it's not the last iteration. */
|
||||
- if ( xatp->size > ++done && hypercall_preempt_check() )
|
||||
+ if ( (++done > ARRAY_SIZE(pages) && extra.ppage) ||
|
||||
+ (xatp->size > done && hypercall_preempt_check()) )
|
||||
{
|
||||
rc = start + done;
|
||||
break;
|
||||
@@ -858,6 +864,7 @@ int xenmem_add_to_physmap(struct domain
|
||||
if ( is_iommu_enabled(d) )
|
||||
{
|
||||
int ret;
|
||||
+ unsigned int i;
|
||||
|
||||
this_cpu(iommu_dont_flush_iotlb) = 0;
|
||||
|
||||
@@ -866,6 +873,15 @@ int xenmem_add_to_physmap(struct domain
|
||||
if ( unlikely(ret) && rc >= 0 )
|
||||
rc = ret;
|
||||
|
||||
+ /*
|
||||
+ * Now that the IOMMU TLB flush was done for the original GFN, drop
|
||||
+ * the page references. The 2nd flush below is fine to make later, as
|
||||
+ * whoever removes the page again from its new GFN will have to do
|
||||
+ * another flush anyway.
|
||||
+ */
|
||||
+ for ( i = 0; i < done; ++i )
|
||||
+ put_page(pages[i]);
|
||||
+
|
||||
ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
|
||||
IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
|
||||
if ( unlikely(ret) && rc >= 0 )
|
||||
@@ -879,6 +895,8 @@ static int xenmem_add_to_physmap_batch(s
|
||||
struct xen_add_to_physmap_batch *xatpb,
|
||||
unsigned int extent)
|
||||
{
|
||||
+ union add_to_physmap_extra extra = {};
|
||||
+
|
||||
if ( unlikely(xatpb->size < extent) )
|
||||
return -EILSEQ;
|
||||
|
||||
@@ -890,6 +908,19 @@ static int xenmem_add_to_physmap_batch(s
|
||||
!guest_handle_subrange_okay(xatpb->errs, extent, xatpb->size - 1) )
|
||||
return -EFAULT;
|
||||
|
||||
+ switch ( xatpb->space )
|
||||
+ {
|
||||
+ case XENMAPSPACE_dev_mmio:
|
||||
+ /* res0 is reserved for future use. */
|
||||
+ if ( xatpb->u.res0 )
|
||||
+ return -EOPNOTSUPP;
|
||||
+ break;
|
||||
+
|
||||
+ case XENMAPSPACE_gmfn_foreign:
|
||||
+ extra.foreign_domid = xatpb->u.foreign_domid;
|
||||
+ break;
|
||||
+ }
|
||||
+
|
||||
while ( xatpb->size > extent )
|
||||
{
|
||||
xen_ulong_t idx;
|
||||
@@ -902,8 +933,7 @@ static int xenmem_add_to_physmap_batch(s
|
||||
extent, 1)) )
|
||||
return -EFAULT;
|
||||
|
||||
- rc = xenmem_add_to_physmap_one(d, xatpb->space,
|
||||
- xatpb->u,
|
||||
+ rc = xenmem_add_to_physmap_one(d, xatpb->space, extra,
|
||||
idx, _gfn(gpfn));
|
||||
|
||||
if ( unlikely(__copy_to_guest_offset(xatpb->errs, extent, &rc, 1)) )
|
||||
--- a/xen/include/xen/mm.h
|
||||
+++ b/xen/include/xen/mm.h
|
||||
@@ -588,8 +588,22 @@ void scrub_one_page(struct page_info *);
|
||||
&(d)->xenpage_list : &(d)->page_list)
|
||||
#endif
|
||||
|
||||
+union add_to_physmap_extra {
|
||||
+ /*
|
||||
+ * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs
|
||||
+ * to be kept until after the flush, so the page can't get removed from
|
||||
+ * the domain (and re-used for another purpose) beforehand. By passing
|
||||
+ * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants
|
||||
+ * to have ownership of such a reference transferred in the success case.
|
||||
+ */
|
||||
+ struct page_info **ppage;
|
||||
+
|
||||
+ /* XENMAPSPACE_gmfn_foreign */
|
||||
+ domid_t foreign_domid;
|
||||
+};
|
||||
+
|
||||
int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
|
||||
- union xen_add_to_physmap_batch_extra extra,
|
||||
+ union add_to_physmap_extra extra,
|
||||
unsigned long idx, gfn_t gfn);
|
||||
|
||||
int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
|
|
@ -1,149 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: AMD/IOMMU: convert amd_iommu_pte from struct to union
|
||||
|
||||
This is to add a "raw" counterpart to the bitfield equivalent. Take the
|
||||
opportunity and
|
||||
- convert fields to bool / unsigned int,
|
||||
- drop the naming of the reserved field,
|
||||
- shorten the names of the ignored ones.
|
||||
|
||||
This is part of XSA-347.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/iommu_map.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_map.c
|
||||
@@ -38,7 +38,7 @@ static unsigned int pfn_to_pde_idx(unsig
|
||||
static unsigned int clear_iommu_pte_present(unsigned long l1_mfn,
|
||||
unsigned long dfn)
|
||||
{
|
||||
- struct amd_iommu_pte *table, *pte;
|
||||
+ union amd_iommu_pte *table, *pte;
|
||||
unsigned int flush_flags;
|
||||
|
||||
table = map_domain_page(_mfn(l1_mfn));
|
||||
@@ -52,7 +52,7 @@ static unsigned int clear_iommu_pte_pres
|
||||
return flush_flags;
|
||||
}
|
||||
|
||||
-static unsigned int set_iommu_pde_present(struct amd_iommu_pte *pte,
|
||||
+static unsigned int set_iommu_pde_present(union amd_iommu_pte *pte,
|
||||
unsigned long next_mfn,
|
||||
unsigned int next_level, bool iw,
|
||||
bool ir)
|
||||
@@ -87,7 +87,7 @@ static unsigned int set_iommu_pte_presen
|
||||
int pde_level,
|
||||
bool iw, bool ir)
|
||||
{
|
||||
- struct amd_iommu_pte *table, *pde;
|
||||
+ union amd_iommu_pte *table, *pde;
|
||||
unsigned int flush_flags;
|
||||
|
||||
table = map_domain_page(_mfn(pt_mfn));
|
||||
@@ -178,7 +178,7 @@ void iommu_dte_set_guest_cr3(struct amd_
|
||||
static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
|
||||
unsigned long pt_mfn[], bool map)
|
||||
{
|
||||
- struct amd_iommu_pte *pde, *next_table_vaddr;
|
||||
+ union amd_iommu_pte *pde, *next_table_vaddr;
|
||||
unsigned long next_table_mfn;
|
||||
unsigned int level;
|
||||
struct page_info *table;
|
||||
@@ -458,7 +458,7 @@ int __init amd_iommu_quarantine_init(str
|
||||
unsigned long end_gfn =
|
||||
1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT);
|
||||
unsigned int level = amd_iommu_get_paging_mode(end_gfn);
|
||||
- struct amd_iommu_pte *table;
|
||||
+ union amd_iommu_pte *table;
|
||||
|
||||
if ( hd->arch.root_table )
|
||||
{
|
||||
@@ -489,7 +489,7 @@ int __init amd_iommu_quarantine_init(str
|
||||
|
||||
for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
|
||||
{
|
||||
- struct amd_iommu_pte *pde = &table[i];
|
||||
+ union amd_iommu_pte *pde = &table[i];
|
||||
|
||||
/*
|
||||
* PDEs are essentially a subset of PTEs, so this function
|
||||
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
|
||||
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
|
||||
@@ -390,7 +390,7 @@ static void deallocate_next_page_table(s
|
||||
|
||||
static void deallocate_page_table(struct page_info *pg)
|
||||
{
|
||||
- struct amd_iommu_pte *table_vaddr;
|
||||
+ union amd_iommu_pte *table_vaddr;
|
||||
unsigned int index, level = PFN_ORDER(pg);
|
||||
|
||||
PFN_ORDER(pg) = 0;
|
||||
@@ -405,7 +405,7 @@ static void deallocate_page_table(struct
|
||||
|
||||
for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
|
||||
{
|
||||
- struct amd_iommu_pte *pde = &table_vaddr[index];
|
||||
+ union amd_iommu_pte *pde = &table_vaddr[index];
|
||||
|
||||
if ( pde->mfn && pde->next_level && pde->pr )
|
||||
{
|
||||
@@ -557,7 +557,7 @@ static void amd_dump_p2m_table_level(str
|
||||
paddr_t gpa, int indent)
|
||||
{
|
||||
paddr_t address;
|
||||
- struct amd_iommu_pte *table_vaddr;
|
||||
+ const union amd_iommu_pte *table_vaddr;
|
||||
int index;
|
||||
|
||||
if ( level < 1 )
|
||||
@@ -573,7 +573,7 @@ static void amd_dump_p2m_table_level(str
|
||||
|
||||
for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
|
||||
{
|
||||
- struct amd_iommu_pte *pde = &table_vaddr[index];
|
||||
+ const union amd_iommu_pte *pde = &table_vaddr[index];
|
||||
|
||||
if ( !(index % 2) )
|
||||
process_pending_softirqs();
|
||||
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
|
||||
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
|
||||
@@ -465,20 +465,23 @@ union amd_iommu_x2apic_control {
|
||||
#define IOMMU_PAGE_TABLE_U32_PER_ENTRY (IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
|
||||
#define IOMMU_PAGE_TABLE_ALIGNMENT 4096
|
||||
|
||||
-struct amd_iommu_pte {
|
||||
- uint64_t pr:1;
|
||||
- uint64_t ignored0:4;
|
||||
- uint64_t a:1;
|
||||
- uint64_t d:1;
|
||||
- uint64_t ignored1:2;
|
||||
- uint64_t next_level:3;
|
||||
- uint64_t mfn:40;
|
||||
- uint64_t reserved:7;
|
||||
- uint64_t u:1;
|
||||
- uint64_t fc:1;
|
||||
- uint64_t ir:1;
|
||||
- uint64_t iw:1;
|
||||
- uint64_t ignored2:1;
|
||||
+union amd_iommu_pte {
|
||||
+ uint64_t raw;
|
||||
+ struct {
|
||||
+ bool pr:1;
|
||||
+ unsigned int ign0:4;
|
||||
+ bool a:1;
|
||||
+ bool d:1;
|
||||
+ unsigned int ign1:2;
|
||||
+ unsigned int next_level:3;
|
||||
+ uint64_t mfn:40;
|
||||
+ unsigned int :7;
|
||||
+ bool u:1;
|
||||
+ bool fc:1;
|
||||
+ bool ir:1;
|
||||
+ bool iw:1;
|
||||
+ unsigned int ign2:1;
|
||||
+ };
|
||||
};
|
||||
|
||||
/* Paging modes */
|
|
@ -1,72 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: AMD/IOMMU: update live PTEs atomically
|
||||
|
||||
Updating a live PTE bitfield by bitfield risks the compiler re-ordering
|
||||
the individual updates as well as splitting individual updates into
|
||||
multiple memory writes. Construct the new entry fully in a local
|
||||
variable, do the check to determine the flushing needs on the thus
|
||||
established new entry, and then write the new entry by a single insn.
|
||||
|
||||
Similarly using memset() to clear a PTE is unsafe, as the order of
|
||||
writes the function does is, at least in principle, undefined.
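A rough sketch of the "build locally, store once" pattern follows; the union layout here is invented and much simpler than the real amd_iommu_pte, and GCC builtins stand in for Xen's read_atomic()/write_atomic().

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative layout only -- not the real AMD IOMMU PTE bitfields. */
    union pte {
        uint64_t raw;
        struct {
            bool pr:1;
            uint64_t mfn:40;
            bool iw:1, ir:1;
            unsigned int next_level:3;
        };
    };

    /* Returns true if an already-present entry changed, i.e. a flush is due. */
    static bool set_pte_sketch(union pte *ptep, uint64_t mfn,
                               unsigned int level, bool iw, bool ir)
    {
        union pte old, new = { .raw = 0 };

        new.mfn = mfn;
        new.iw = iw;
        new.ir = ir;
        new.next_level = level;
        new.pr = true;

        /* Snapshot the live entry once... */
        old.raw = __atomic_load_n(&ptep->raw, __ATOMIC_RELAXED);

        /* ...and publish the fully formed replacement with a single store, so
         * the hardware can never observe a half-updated entry. */
        __atomic_store_n(&ptep->raw, new.raw, __ATOMIC_RELAXED);

        return old.pr && old.raw != new.raw;
    }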
|
||||
|
||||
This is part of XSA-347.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/iommu_map.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_map.c
|
||||
@@ -45,7 +45,7 @@ static unsigned int clear_iommu_pte_pres
|
||||
pte = &table[pfn_to_pde_idx(dfn, 1)];
|
||||
|
||||
flush_flags = pte->pr ? IOMMU_FLUSHF_modified : 0;
|
||||
- memset(pte, 0, sizeof(*pte));
|
||||
+ write_atomic(&pte->raw, 0);
|
||||
|
||||
unmap_domain_page(table);
|
||||
|
||||
@@ -57,26 +57,30 @@ static unsigned int set_iommu_pde_presen
|
||||
unsigned int next_level, bool iw,
|
||||
bool ir)
|
||||
{
|
||||
+ union amd_iommu_pte new = {}, old;
|
||||
unsigned int flush_flags = IOMMU_FLUSHF_added;
|
||||
|
||||
- if ( pte->pr &&
|
||||
- (pte->mfn != next_mfn ||
|
||||
- pte->iw != iw ||
|
||||
- pte->ir != ir ||
|
||||
- pte->next_level != next_level) )
|
||||
- flush_flags |= IOMMU_FLUSHF_modified;
|
||||
-
|
||||
/*
|
||||
* FC bit should be enabled in PTE, this helps to solve potential
|
||||
* issues with ATS devices
|
||||
*/
|
||||
- pte->fc = !next_level;
|
||||
+ new.fc = !next_level;
|
||||
+
|
||||
+ new.mfn = next_mfn;
|
||||
+ new.iw = iw;
|
||||
+ new.ir = ir;
|
||||
+ new.next_level = next_level;
|
||||
+ new.pr = true;
|
||||
+
|
||||
+ old.raw = read_atomic(&pte->raw);
|
||||
+ old.ign0 = 0;
|
||||
+ old.ign1 = 0;
|
||||
+ old.ign2 = 0;
|
||||
+
|
||||
+ if ( old.pr && old.raw != new.raw )
|
||||
+ flush_flags |= IOMMU_FLUSHF_modified;
|
||||
|
||||
- pte->mfn = next_mfn;
|
||||
- pte->iw = iw;
|
||||
- pte->ir = ir;
|
||||
- pte->next_level = next_level;
|
||||
- pte->pr = 1;
|
||||
+ write_atomic(&pte->raw, new.raw);
|
||||
|
||||
return flush_flags;
|
||||
}
|
|
@ -1,59 +0,0 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications
|
||||
|
||||
DMA and interrupt translation should be enabled only after other
|
||||
applicable DTE fields have been written. Similarly when disabling
|
||||
translation or when moving a device between domains, translation should
|
||||
first be disabled, before other entry fields get modified. Note however
|
||||
that the "moving" aspect doesn't apply to the interrupt remapping side,
|
||||
as domain specifics are maintained in the IRTEs here, not the DTE. We
|
||||
also never disable interrupt remapping once it got enabled for a device
|
||||
(the respective argument passed is always the immutable iommu_intremap).
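A rough sketch of the intended ordering, using an invented, much-reduced DTE and a compiler/CPU fence in place of Xen's smp_wmb():

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative subset of a device table entry. */
    struct dte_sketch {
        bool v, tv;
        uint16_t domain_id;
        uint64_t pt_root;
    };

    static void set_root_table_sketch(struct dte_sketch *dte, uint64_t root_pfn,
                                      uint16_t domid, bool valid)
    {
        if ( valid || dte->v )
        {
            /* Disable translation before touching the other fields. */
            dte->tv = false;
            dte->v = true;
            __atomic_thread_fence(__ATOMIC_RELEASE);    /* smp_wmb() */
        }

        dte->domain_id = domid;
        dte->pt_root = root_pfn;

        /* Make the fields visible before translation is (re)enabled. */
        __atomic_thread_fence(__ATOMIC_RELEASE);
        dte->tv = true;
        dte->v = valid;
    }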
|
||||
|
||||
This is part of XSA-347.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/iommu_map.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_map.c
|
||||
@@ -107,11 +107,18 @@ void amd_iommu_set_root_page_table(struc
|
||||
uint64_t root_ptr, uint16_t domain_id,
|
||||
uint8_t paging_mode, bool valid)
|
||||
{
|
||||
+ if ( valid || dte->v )
|
||||
+ {
|
||||
+ dte->tv = false;
|
||||
+ dte->v = true;
|
||||
+ smp_wmb();
|
||||
+ }
|
||||
dte->domain_id = domain_id;
|
||||
dte->pt_root = paddr_to_pfn(root_ptr);
|
||||
dte->iw = true;
|
||||
dte->ir = true;
|
||||
dte->paging_mode = paging_mode;
|
||||
+ smp_wmb();
|
||||
dte->tv = true;
|
||||
dte->v = valid;
|
||||
}
|
||||
@@ -134,6 +141,7 @@ void amd_iommu_set_intremap_table(
|
||||
}
|
||||
|
||||
dte->ig = false; /* unmapped interrupts result in i/o page faults */
|
||||
+ smp_wmb();
|
||||
dte->iv = valid;
|
||||
}
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
|
||||
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
|
||||
@@ -120,7 +120,10 @@ static void amd_iommu_setup_domain_devic
|
||||
/* Undo what amd_iommu_disable_domain_device() may have done. */
|
||||
ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
|
||||
if ( dte->it_root )
|
||||
+ {
|
||||
dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
|
||||
+ smp_wmb();
|
||||
+ }
|
||||
dte->iv = iommu_intremap;
|
||||
dte->ex = ivrs_dev->dte_allow_exclusion;
|
||||
dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
|
|
@ -0,0 +1,85 @@
|
|||
From b1e5a89f19d9919c3eae17ab9c6a663b0801ad9c Mon Sep 17 00:00:00 2001
|
||||
From: Julien Grall <jgrall@amazon.com>
|
||||
Date: Mon, 17 May 2021 17:47:13 +0100
|
||||
Subject: [PATCH 1/2] xen/arm: Create dom0less domUs earlier
|
||||
|
||||
In a follow-up patch we will need to unallocate the boot modules
|
||||
before heap_init_late() is called.
|
||||
|
||||
The modules will contain the domUs kernel and initramfs. Therefore Xen
|
||||
will need to create extra domUs (used by dom0less) before heap_init_late().
|
||||
|
||||
This has two consequences on dom0less:
|
||||
1) Domains will not be unpaused as soon as they are created but
|
||||
once all have been created. However, Xen doesn't guarantee an order
|
||||
to unpause, so this is not something one could rely on.
|
||||
|
||||
2) The memory allocated for a domU will not be scrubbed anymore when an
|
||||
admin selects bootscrub=on. This is not something we advertised, but if
|
||||
this is a concern we can introduce either force scrub for all domUs or
|
||||
a per-domain flag in the DT. The behavior for bootscrub=off and
|
||||
bootscrub=idle (default) has not changed.
|
||||
|
||||
This is part of XSA-372 / CVE-2021-28693.
|
||||
|
||||
Signed-off-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
---
|
||||
xen/arch/arm/domain_build.c | 2 --
|
||||
xen/arch/arm/setup.c | 11 ++++++-----
|
||||
2 files changed, 6 insertions(+), 7 deletions(-)
|
||||
|
||||
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
|
||||
index 374bf655ee34..4203ddcca0e3 100644
|
||||
--- a/xen/arch/arm/domain_build.c
|
||||
+++ b/xen/arch/arm/domain_build.c
|
||||
@@ -2515,8 +2515,6 @@ void __init create_domUs(void)
|
||||
|
||||
if ( construct_domU(d, node) != 0 )
|
||||
panic("Could not set up domain %s\n", dt_node_name(node));
|
||||
-
|
||||
- domain_unpause_by_systemcontroller(d);
|
||||
}
|
||||
}
|
||||
|
||||
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
|
||||
index 2532ec973913..441e0e16e9f0 100644
|
||||
--- a/xen/arch/arm/setup.c
|
||||
+++ b/xen/arch/arm/setup.c
|
||||
@@ -804,7 +804,7 @@ void __init start_xen(unsigned long boot_phys_offset,
|
||||
int cpus, i;
|
||||
const char *cmdline;
|
||||
struct bootmodule *xen_bootmodule;
|
||||
- struct domain *dom0;
|
||||
+ struct domain *dom0, *d;
|
||||
struct xen_domctl_createdomain dom0_cfg = {
|
||||
.flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
|
||||
.max_evtchn_port = -1,
|
||||
@@ -987,6 +987,9 @@ void __init start_xen(unsigned long boot_phys_offset,
|
||||
if ( construct_dom0(dom0) != 0)
|
||||
panic("Could not set up DOM0 guest OS\n");
|
||||
|
||||
+ if ( acpi_disabled )
|
||||
+ create_domUs();
|
||||
+
|
||||
heap_init_late();
|
||||
|
||||
init_trace_bufs();
|
||||
@@ -1000,10 +1003,8 @@ void __init start_xen(unsigned long boot_phys_offset,
|
||||
|
||||
system_state = SYS_STATE_active;
|
||||
|
||||
- if ( acpi_disabled )
|
||||
- create_domUs();
|
||||
-
|
||||
- domain_unpause_by_systemcontroller(dom0);
|
||||
+ for_each_domain( d )
|
||||
+ domain_unpause_by_systemcontroller(d);
|
||||
|
||||
/* Switch on to the dynamically allocated stack for the idle vcpu
|
||||
* since the static one we're running on is about to be freed. */
|
||||
--
|
||||
2.17.1
|
||||
|
|
@ -0,0 +1,59 @@
|
|||
From 09bb28bdef3fb5e7d08bdd641601ca0c0d4d82b4 Mon Sep 17 00:00:00 2001
|
||||
From: Julien Grall <jgrall@amazon.com>
|
||||
Date: Sat, 17 Apr 2021 17:38:28 +0100
|
||||
Subject: [PATCH 2/2] xen/arm: Boot modules should always be scrubbed if
|
||||
bootscrub={on, idle}
|
||||
|
||||
The function to initialize the pages (see init_heap_pages()) will request
|
||||
scrub when the admin requests idle bootscrub (default) and state ==
|
||||
SYS_STATE_active. When bootscrub=on, Xen will scrub any free pages in
|
||||
heap_init_late().
|
||||
|
||||
Currently, the boot modules (e.g. kernels, initramfs) will be discarded/
|
||||
freed after heap_init_late() is called and system_state switched to
|
||||
SYS_STATE_active. This means the pages associated with the boot modules
|
||||
will not get scrubbed before getting re-purposed.
|
||||
|
||||
If the memory is assigned to an untrusted domU, it may be able to
|
||||
retrieve secrets from the modules.
|
||||
|
||||
This is part of XSA-372 / CVE-2021-28693.
|
||||
|
||||
Fixes: 1774e9b1df27 ("xen/arm: introduce create_domUs")
|
||||
Signed-off-by: Julien Grall <jgrall@amazon.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
|
||||
---
|
||||
xen/arch/arm/setup.c | 8 ++++++--
|
||||
1 file changed, 6 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
|
||||
index 441e0e16e9f0..8afb78f2c985 100644
|
||||
--- a/xen/arch/arm/setup.c
|
||||
+++ b/xen/arch/arm/setup.c
|
||||
@@ -72,8 +72,6 @@ domid_t __read_mostly max_init_domid;
|
||||
|
||||
static __used void init_done(void)
|
||||
{
|
||||
- discard_initial_modules();
|
||||
-
|
||||
/* Must be done past setting system_state. */
|
||||
unregister_init_virtual_region();
|
||||
|
||||
@@ -990,6 +988,12 @@ void __init start_xen(unsigned long boot_phys_offset,
|
||||
if ( acpi_disabled )
|
||||
create_domUs();
|
||||
|
||||
+ /*
|
||||
+ * This needs to be called **before** heap_init_late() so modules
|
||||
+ * will be scrubbed (unless suppressed).
|
||||
+ */
|
||||
+ discard_initial_modules();
|
||||
+
|
||||
heap_init_late();
|
||||
|
||||
init_trace_bufs();
|
||||
--
|
||||
2.17.1
|
||||
|
|
@ -0,0 +1,120 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: VT-d: size qinval queue dynamically
|
||||
|
||||
With the present synchronous model, we need two slots for every
|
||||
operation (the operation itself and a wait descriptor). There can be
|
||||
one such pair of requests pending per CPU. To ensure that under all
|
||||
normal circumstances a slot is always available when one is requested,
|
||||
size the queue ring according to the number of present CPUs.
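The arithmetic behind that sizing, as a small standalone sketch (16-byte entries, 4 KiB pages; the loop is a stand-in for Xen's get_order_from_bytes()):

    #include <stdio.h>

    #define QI_ENTRY_ORDER 4                    /* 16-byte qinval entries */
    #define PAGE_ORDER     12                   /* 4 KiB pages */

    /* Smallest page order whose entry count covers two slots per present CPU
     * plus one spare (the ring counts as full with one free entry left). */
    static unsigned int qi_pg_order_sketch(unsigned int present_cpus)
    {
        unsigned int wanted = present_cpus * 2 + 1;
        unsigned int order = 0;

        while ( (1u << (order + PAGE_ORDER - QI_ENTRY_ORDER)) < wanted )
            ++order;

        return order;
    }

    int main(void)
    {
        /* e.g. 128 present CPUs -> 257 entries -> a 512-entry, 2-page ring */
        printf("order %u\n", qi_pg_order_sketch(128));
        return 0;
    }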
|
||||
|
||||
This is part of XSA-373 / CVE-2021-28692.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.h
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.h
|
||||
@@ -450,17 +450,9 @@ struct qinval_entry {
|
||||
}q;
|
||||
};
|
||||
|
||||
-/* Order of queue invalidation pages(max is 8) */
|
||||
-#define QINVAL_PAGE_ORDER 2
|
||||
-
|
||||
-#define QINVAL_ARCH_PAGE_ORDER (QINVAL_PAGE_ORDER + PAGE_SHIFT_4K - PAGE_SHIFT)
|
||||
-#define QINVAL_ARCH_PAGE_NR ( QINVAL_ARCH_PAGE_ORDER < 0 ? \
|
||||
- 1 : \
|
||||
- 1 << QINVAL_ARCH_PAGE_ORDER )
|
||||
-
|
||||
/* Each entry is 16 bytes, so 2^8 entries per page */
|
||||
#define QINVAL_ENTRY_ORDER ( PAGE_SHIFT - 4 )
|
||||
-#define QINVAL_ENTRY_NR (1 << (QINVAL_PAGE_ORDER + 8))
|
||||
+#define QINVAL_MAX_ENTRY_NR (1u << (7 + QINVAL_ENTRY_ORDER))
|
||||
|
||||
/* Status data flag */
|
||||
#define QINVAL_STAT_INIT 0
|
||||
--- a/xen/drivers/passthrough/vtd/qinval.c
|
||||
+++ b/xen/drivers/passthrough/vtd/qinval.c
|
||||
@@ -31,6 +31,9 @@
|
||||
|
||||
#define VTD_QI_TIMEOUT 1
|
||||
|
||||
+static unsigned int __read_mostly qi_pg_order;
|
||||
+static unsigned int __read_mostly qi_entry_nr;
|
||||
+
|
||||
static int __must_check invalidate_sync(struct vtd_iommu *iommu);
|
||||
|
||||
static void print_qi_regs(struct vtd_iommu *iommu)
|
||||
@@ -55,7 +58,7 @@ static unsigned int qinval_next_index(st
|
||||
tail >>= QINVAL_INDEX_SHIFT;
|
||||
|
||||
/* (tail+1 == head) indicates a full queue, wait for HW */
|
||||
- while ( ( tail + 1 ) % QINVAL_ENTRY_NR ==
|
||||
+ while ( ((tail + 1) & (qi_entry_nr - 1)) ==
|
||||
( dmar_readq(iommu->reg, DMAR_IQH_REG) >> QINVAL_INDEX_SHIFT ) )
|
||||
cpu_relax();
|
||||
|
||||
@@ -68,7 +71,7 @@ static void qinval_update_qtail(struct v
|
||||
|
||||
/* Need hold register lock when update tail */
|
||||
ASSERT( spin_is_locked(&iommu->register_lock) );
|
||||
- val = (index + 1) % QINVAL_ENTRY_NR;
|
||||
+ val = (index + 1) & (qi_entry_nr - 1);
|
||||
dmar_writeq(iommu->reg, DMAR_IQT_REG, (val << QINVAL_INDEX_SHIFT));
|
||||
}
|
||||
|
||||
@@ -403,8 +406,28 @@ int enable_qinval(struct vtd_iommu *iomm
|
||||
|
||||
if ( iommu->qinval_maddr == 0 )
|
||||
{
|
||||
- iommu->qinval_maddr = alloc_pgtable_maddr(QINVAL_ARCH_PAGE_NR,
|
||||
- iommu->node);
|
||||
+ if ( !qi_entry_nr )
|
||||
+ {
|
||||
+ /*
|
||||
+ * With the present synchronous model, we need two slots for every
|
||||
+ * operation (the operation itself and a wait descriptor). There
|
||||
+ * can be one such pair of requests pending per CPU. One extra
|
||||
+ * entry is needed as the ring is considered full when there's
|
||||
+ * only one entry left.
|
||||
+ */
|
||||
+ BUILD_BUG_ON(CONFIG_NR_CPUS * 2 >= QINVAL_MAX_ENTRY_NR);
|
||||
+ qi_pg_order = get_order_from_bytes((num_present_cpus() * 2 + 1) <<
|
||||
+ (PAGE_SHIFT -
|
||||
+ QINVAL_ENTRY_ORDER));
|
||||
+ qi_entry_nr = 1u << (qi_pg_order + QINVAL_ENTRY_ORDER);
|
||||
+
|
||||
+ dprintk(XENLOG_INFO VTDPREFIX,
|
||||
+ "QI: using %u-entry ring(s)\n", qi_entry_nr);
|
||||
+ }
|
||||
+
|
||||
+ iommu->qinval_maddr =
|
||||
+ alloc_pgtable_maddr(qi_entry_nr >> QINVAL_ENTRY_ORDER,
|
||||
+ iommu->node);
|
||||
if ( iommu->qinval_maddr == 0 )
|
||||
{
|
||||
dprintk(XENLOG_WARNING VTDPREFIX,
|
||||
@@ -418,15 +441,16 @@ int enable_qinval(struct vtd_iommu *iomm
|
||||
|
||||
spin_lock_irqsave(&iommu->register_lock, flags);
|
||||
|
||||
- /* Setup Invalidation Queue Address(IQA) register with the
|
||||
- * address of the page we just allocated. QS field at
|
||||
- * bits[2:0] to indicate size of queue is one 4KB page.
|
||||
- * That's 256 entries. Queued Head (IQH) and Queue Tail (IQT)
|
||||
- * registers are automatically reset to 0 with write
|
||||
- * to IQA register.
|
||||
+ /*
|
||||
+ * Setup Invalidation Queue Address (IQA) register with the address of the
|
||||
+ * pages we just allocated. The QS field at bits[2:0] indicates the size
|
||||
+ * (page order) of the queue.
|
||||
+ *
|
||||
+ * Queued Head (IQH) and Queue Tail (IQT) registers are automatically
|
||||
+ * reset to 0 with write to IQA register.
|
||||
*/
|
||||
dmar_writeq(iommu->reg, DMAR_IQA_REG,
|
||||
- iommu->qinval_maddr | QINVAL_PAGE_ORDER);
|
||||
+ iommu->qinval_maddr | qi_pg_order);
|
||||
|
||||
dmar_writeq(iommu->reg, DMAR_IQT_REG, 0);
|
||||
|
|
@ -0,0 +1,102 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: AMD/IOMMU: size command buffer dynamically
|
||||
|
||||
With the present synchronous model, we need two slots for every
|
||||
operation (the operation itself and a wait command). There can be one
|
||||
such pair of commands pending per CPU. To ensure that under all normal
|
||||
circumstances a slot is always available when one is requested, size the
|
||||
command ring according to the number of present CPUs.
|
||||
|
||||
This is part of XSA-373 / CVE-2021-28692.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/iommu-defs.h
|
||||
+++ b/xen/drivers/passthrough/amd/iommu-defs.h
|
||||
@@ -20,9 +20,6 @@
|
||||
#ifndef AMD_IOMMU_DEFS_H
|
||||
#define AMD_IOMMU_DEFS_H
|
||||
|
||||
-/* IOMMU Command Buffer entries: in power of 2 increments, minimum of 256 */
|
||||
-#define IOMMU_CMD_BUFFER_DEFAULT_ENTRIES 512
|
||||
-
|
||||
/* IOMMU Event Log entries: in power of 2 increments, minimum of 256 */
|
||||
#define IOMMU_EVENT_LOG_DEFAULT_ENTRIES 512
|
||||
|
||||
@@ -164,8 +161,8 @@ struct amd_iommu_dte {
|
||||
#define IOMMU_CMD_BUFFER_LENGTH_MASK 0x0F000000
|
||||
#define IOMMU_CMD_BUFFER_LENGTH_SHIFT 24
|
||||
|
||||
-#define IOMMU_CMD_BUFFER_ENTRY_SIZE 16
|
||||
-#define IOMMU_CMD_BUFFER_POWER_OF2_ENTRIES_PER_PAGE 8
|
||||
+#define IOMMU_CMD_BUFFER_ENTRY_ORDER 4
|
||||
+#define IOMMU_CMD_BUFFER_MAX_ENTRIES (1u << 15)
|
||||
|
||||
#define IOMMU_CMD_OPCODE_MASK 0xF0000000
|
||||
#define IOMMU_CMD_OPCODE_SHIFT 28
|
||||
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
|
||||
@@ -24,7 +24,7 @@ static int queue_iommu_command(struct am
|
||||
{
|
||||
uint32_t tail, head;
|
||||
|
||||
- tail = iommu->cmd_buffer.tail + IOMMU_CMD_BUFFER_ENTRY_SIZE;
|
||||
+ tail = iommu->cmd_buffer.tail + sizeof(cmd_entry_t);
|
||||
if ( tail == iommu->cmd_buffer.size )
|
||||
tail = 0;
|
||||
|
||||
@@ -33,7 +33,7 @@ static int queue_iommu_command(struct am
|
||||
if ( head != tail )
|
||||
{
|
||||
memcpy(iommu->cmd_buffer.buffer + iommu->cmd_buffer.tail,
|
||||
- cmd, IOMMU_CMD_BUFFER_ENTRY_SIZE);
|
||||
+ cmd, sizeof(cmd_entry_t));
|
||||
|
||||
iommu->cmd_buffer.tail = tail;
|
||||
return 1;
|
||||
--- a/xen/drivers/passthrough/amd/iommu_init.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_init.c
|
||||
@@ -118,7 +118,7 @@ static void register_iommu_cmd_buffer_in
|
||||
writel(entry, iommu->mmio_base + IOMMU_CMD_BUFFER_BASE_LOW_OFFSET);
|
||||
|
||||
power_of2_entries = get_order_from_bytes(iommu->cmd_buffer.size) +
|
||||
- IOMMU_CMD_BUFFER_POWER_OF2_ENTRIES_PER_PAGE;
|
||||
+ PAGE_SHIFT - IOMMU_CMD_BUFFER_ENTRY_ORDER;
|
||||
|
||||
entry = 0;
|
||||
iommu_set_addr_hi_to_reg(&entry, addr_hi);
|
||||
@@ -1018,9 +1018,31 @@ static void *__init allocate_ring_buffer
|
||||
static void * __init allocate_cmd_buffer(struct amd_iommu *iommu)
|
||||
{
|
||||
/* allocate 'command buffer' in power of 2 increments of 4K */
|
||||
+ static unsigned int __read_mostly nr_ents;
|
||||
+
|
||||
+ if ( !nr_ents )
|
||||
+ {
|
||||
+ unsigned int order;
|
||||
+
|
||||
+ /*
|
||||
+ * With the present synchronous model, we need two slots for every
|
||||
+ * operation (the operation itself and a wait command). There can be
|
||||
+ * one such pair of requests pending per CPU. One extra entry is
|
||||
+ * needed as the ring is considered full when there's only one entry
|
||||
+ * left.
|
||||
+ */
|
||||
+ BUILD_BUG_ON(CONFIG_NR_CPUS * 2 >= IOMMU_CMD_BUFFER_MAX_ENTRIES);
|
||||
+ order = get_order_from_bytes((num_present_cpus() * 2 + 1) <<
|
||||
+ IOMMU_CMD_BUFFER_ENTRY_ORDER);
|
||||
+ nr_ents = 1u << (order + PAGE_SHIFT - IOMMU_CMD_BUFFER_ENTRY_ORDER);
|
||||
+
|
||||
+ AMD_IOMMU_DEBUG("using %u-entry cmd ring(s)\n", nr_ents);
|
||||
+ }
|
||||
+
|
||||
+ BUILD_BUG_ON(sizeof(cmd_entry_t) != (1u << IOMMU_CMD_BUFFER_ENTRY_ORDER));
|
||||
+
|
||||
return allocate_ring_buffer(&iommu->cmd_buffer, sizeof(cmd_entry_t),
|
||||
- IOMMU_CMD_BUFFER_DEFAULT_ENTRIES,
|
||||
- "Command Buffer", false);
|
||||
+ nr_ents, "Command Buffer", false);
|
||||
}
|
||||
|
||||
static void * __init allocate_event_log(struct amd_iommu *iommu)
|
|
@ -0,0 +1,163 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: VT-d: eliminate flush related timeouts
|
||||
|
||||
Leaving an in-progress operation pending when it appears to take too
|
||||
long is problematic: If e.g. a QI command completed later, the write to
|
||||
the "poll slot" may instead be understood to signal a subsequently
|
||||
started command's completion. Also our accounting of the timeout period
|
||||
was actually wrong: We included the time it took for the command to
|
||||
actually make it to the front of the queue, which could be heavily
|
||||
affected by guests other than the one for which the flush is being
|
||||
performed.
|
||||
|
||||
Do away with all timeout detection on all flush related code paths.
|
||||
Log excessively long processing times (with a progressive threshold) to
|
||||
have some indication of problems in this area.
|
||||
|
||||
Additionally log (once) if qinval_next_index() didn't immediately find
|
||||
an available slot. Together with the earlier change sizing the queue(s)
|
||||
dynamically, we should now have a guarantee that with our fully
|
||||
synchronous model any demand for slots can actually be satisfied.
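The logging scheme amounts to a spin loop whose warning threshold grows each time it fires; roughly as below, with wall-clock time and a hypothetical poll function standing in for NOW() and the hardware status read:

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical completion check standing in for the status-register poll. */
    extern bool flush_completed(void);

    /* Wait indefinitely, but warn with a progressively larger threshold so a
     * slow IOMMU produces a bounded amount of log output. */
    static void wait_for_flush_sketch(void)
    {
        static unsigned int threshold = 1;      /* grows on every warning */
        time_t start = time(NULL);
        time_t timeout = start + 1 * threshold; /* 1s base, illustrative */

        while ( !flush_completed() )
        {
            if ( timeout && time(NULL) > timeout )
            {
                threshold |= threshold << 1;
                fprintf(stderr, "flush taking too long\n");
                timeout = 0;                    /* warn only once per call */
            }
        }

        if ( !timeout )
            fprintf(stderr, "flush took %lds\n", (long)(time(NULL) - start));
    }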
|
||||
|
||||
This is part of XSA-373 / CVE-2021-28692.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/vtd/dmar.h
|
||||
+++ b/xen/drivers/passthrough/vtd/dmar.h
|
||||
@@ -127,6 +127,34 @@ do {
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
+#define IOMMU_FLUSH_WAIT(what, iommu, offset, op, cond, sts) \
|
||||
+do { \
|
||||
+ static unsigned int __read_mostly threshold = 1; \
|
||||
+ s_time_t start = NOW(); \
|
||||
+ s_time_t timeout = start + DMAR_OPERATION_TIMEOUT * threshold; \
|
||||
+ \
|
||||
+ for ( ; ; ) \
|
||||
+ { \
|
||||
+ sts = op(iommu->reg, offset); \
|
||||
+ if ( cond ) \
|
||||
+ break; \
|
||||
+ if ( timeout && NOW() > timeout ) \
|
||||
+ { \
|
||||
+ threshold |= threshold << 1; \
|
||||
+ printk(XENLOG_WARNING VTDPREFIX \
|
||||
+ " IOMMU#%u: %s flush taking too long\n", \
|
||||
+ iommu->index, what); \
|
||||
+ timeout = 0; \
|
||||
+ } \
|
||||
+ cpu_relax(); \
|
||||
+ } \
|
||||
+ \
|
||||
+ if ( !timeout ) \
|
||||
+ printk(XENLOG_WARNING VTDPREFIX \
|
||||
+ " IOMMU#%u: %s flush took %lums\n", \
|
||||
+ iommu->index, what, (NOW() - start) / 10000000); \
|
||||
+} while ( false )
|
||||
+
|
||||
int vtd_hw_check(void);
|
||||
void disable_pmr(struct vtd_iommu *iommu);
|
||||
int is_igd_drhd(struct acpi_drhd_unit *drhd);
|
||||
--- a/xen/drivers/passthrough/vtd/iommu.c
|
||||
+++ b/xen/drivers/passthrough/vtd/iommu.c
|
||||
@@ -373,8 +373,8 @@ static void iommu_flush_write_buffer(str
|
||||
dmar_writel(iommu->reg, DMAR_GCMD_REG, val | DMA_GCMD_WBF);
|
||||
|
||||
/* Make sure hardware complete it */
|
||||
- IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, dmar_readl,
|
||||
- !(val & DMA_GSTS_WBFS), val);
|
||||
+ IOMMU_FLUSH_WAIT("write buffer", iommu, DMAR_GSTS_REG, dmar_readl,
|
||||
+ !(val & DMA_GSTS_WBFS), val);
|
||||
|
||||
spin_unlock_irqrestore(&iommu->register_lock, flags);
|
||||
}
|
||||
@@ -423,8 +423,8 @@ int vtd_flush_context_reg(struct vtd_iom
|
||||
dmar_writeq(iommu->reg, DMAR_CCMD_REG, val);
|
||||
|
||||
/* Make sure hardware complete it */
|
||||
- IOMMU_WAIT_OP(iommu, DMAR_CCMD_REG, dmar_readq,
|
||||
- !(val & DMA_CCMD_ICC), val);
|
||||
+ IOMMU_FLUSH_WAIT("context", iommu, DMAR_CCMD_REG, dmar_readq,
|
||||
+ !(val & DMA_CCMD_ICC), val);
|
||||
|
||||
spin_unlock_irqrestore(&iommu->register_lock, flags);
|
||||
/* flush context entry will implicitly flush write buffer */
|
||||
@@ -501,8 +501,8 @@ int vtd_flush_iotlb_reg(struct vtd_iommu
|
||||
dmar_writeq(iommu->reg, tlb_offset + 8, val);
|
||||
|
||||
/* Make sure hardware complete it */
|
||||
- IOMMU_WAIT_OP(iommu, (tlb_offset + 8), dmar_readq,
|
||||
- !(val & DMA_TLB_IVT), val);
|
||||
+ IOMMU_FLUSH_WAIT("iotlb", iommu, (tlb_offset + 8), dmar_readq,
|
||||
+ !(val & DMA_TLB_IVT), val);
|
||||
spin_unlock_irqrestore(&iommu->register_lock, flags);
|
||||
|
||||
/* check IOTLB invalidation granularity */
|
||||
--- a/xen/drivers/passthrough/vtd/qinval.c
|
||||
+++ b/xen/drivers/passthrough/vtd/qinval.c
|
||||
@@ -29,8 +29,6 @@
|
||||
#include "extern.h"
|
||||
#include "../ats.h"
|
||||
|
||||
-#define VTD_QI_TIMEOUT 1
|
||||
-
|
||||
static unsigned int __read_mostly qi_pg_order;
|
||||
static unsigned int __read_mostly qi_entry_nr;
|
||||
|
||||
@@ -60,7 +58,11 @@ static unsigned int qinval_next_index(st
|
||||
/* (tail+1 == head) indicates a full queue, wait for HW */
|
||||
while ( ((tail + 1) & (qi_entry_nr - 1)) ==
|
||||
( dmar_readq(iommu->reg, DMAR_IQH_REG) >> QINVAL_INDEX_SHIFT ) )
|
||||
+ {
|
||||
+ printk_once(XENLOG_ERR VTDPREFIX " IOMMU#%u: no QI slot available\n",
|
||||
+ iommu->index);
|
||||
cpu_relax();
|
||||
+ }
|
||||
|
||||
return tail;
|
||||
}
|
||||
@@ -180,23 +182,32 @@ static int __must_check queue_invalidate
|
||||
/* Now we don't support interrupt method */
|
||||
if ( sw )
|
||||
{
|
||||
- s_time_t timeout;
|
||||
-
|
||||
- /* In case all wait descriptor writes to same addr with same data */
|
||||
- timeout = NOW() + MILLISECS(flush_dev_iotlb ?
|
||||
- iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT);
|
||||
+ static unsigned int __read_mostly threshold = 1;
|
||||
+ s_time_t start = NOW();
|
||||
+ s_time_t timeout = start + (flush_dev_iotlb
|
||||
+ ? iommu_dev_iotlb_timeout
|
||||
+ : 100) * MILLISECS(threshold);
|
||||
|
||||
while ( ACCESS_ONCE(*this_poll_slot) != QINVAL_STAT_DONE )
|
||||
{
|
||||
- if ( NOW() > timeout )
|
||||
+ if ( timeout && NOW() > timeout )
|
||||
{
|
||||
- print_qi_regs(iommu);
|
||||
+ threshold |= threshold << 1;
|
||||
printk(XENLOG_WARNING VTDPREFIX
|
||||
- " Queue invalidate wait descriptor timed out\n");
|
||||
- return -ETIMEDOUT;
|
||||
+ " IOMMU#%u: QI%s wait descriptor taking too long\n",
|
||||
+ iommu->index, flush_dev_iotlb ? " dev" : "");
|
||||
+ print_qi_regs(iommu);
|
||||
+ timeout = 0;
|
||||
}
|
||||
cpu_relax();
|
||||
}
|
||||
+
|
||||
+ if ( !timeout )
|
||||
+ printk(XENLOG_WARNING VTDPREFIX
|
||||
+ " IOMMU#%u: QI%s wait descriptor took %lums\n",
|
||||
+ iommu->index, flush_dev_iotlb ? " dev" : "",
|
||||
+ (NOW() - start) / 10000000);
|
||||
+
|
||||
return 0;
|
||||
}
|
||||
|
|
@ -0,0 +1,79 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: AMD/IOMMU: wait for command slot to be available
|
||||
|
||||
No caller cared about send_iommu_command() indicating unavailability of
|
||||
a slot. Hence if a sufficient number of prior commands timed out, we did
|
||||
blindly assume that the requested command was submitted to the IOMMU
|
||||
when really it wasn't. This could mean both a hanging system (waiting
|
||||
for a command to complete that was never seen by the IOMMU) or blindly
|
||||
propagating success back to callers, making them believe they're fine
|
||||
to e.g. free previously unmapped pages.
|
||||
|
||||
Fold the three involved functions into one, add spin waiting for an
|
||||
available slot along the lines of VT-d's qinval_next_index(), and as a
|
||||
consequence drop all error indicator return types/values.
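The resulting send path is essentially a single-producer ring write that spins while the ring is full; a simplified standalone sketch (hypothetical read_head()/write_tail() in place of the MMIO head/tail register accesses):

    #include <stdint.h>
    #include <string.h>

    #define CMD_SIZE  16
    #define RING_SIZE (256 * CMD_SIZE)          /* illustrative */

    /* Hypothetical register accessors standing in for readl()/writel(). */
    extern uint32_t read_head(void);
    extern void write_tail(uint32_t tail);

    static uint8_t ring[RING_SIZE];
    static uint32_t ring_tail;                  /* byte offset of next free slot */

    static void send_command_sketch(const uint8_t cmd[CMD_SIZE])
    {
        uint32_t tail = ring_tail + CMD_SIZE;

        if ( tail == sizeof(ring) )
            tail = 0;

        /* head == next tail means the ring is full: wait for the hardware. */
        while ( tail == read_head() )
            ;                                   /* cpu_relax() in Xen */

        memcpy(&ring[ring_tail], cmd, CMD_SIZE);
        ring_tail = tail;
        write_tail(tail);                       /* publish the entry */
    }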
|
||||
|
||||
This is part of XSA-373 / CVE-2021-28692.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
|
||||
@@ -20,43 +20,30 @@
|
||||
#include "iommu.h"
|
||||
#include "../ats.h"
|
||||
|
||||
-static int queue_iommu_command(struct amd_iommu *iommu, u32 cmd[])
|
||||
+static void send_iommu_command(struct amd_iommu *iommu,
|
||||
+ const uint32_t cmd[4])
|
||||
{
|
||||
- uint32_t tail, head;
|
||||
+ uint32_t tail;
|
||||
|
||||
tail = iommu->cmd_buffer.tail + sizeof(cmd_entry_t);
|
||||
if ( tail == iommu->cmd_buffer.size )
|
||||
tail = 0;
|
||||
|
||||
- head = readl(iommu->mmio_base +
|
||||
- IOMMU_CMD_BUFFER_HEAD_OFFSET) & IOMMU_RING_BUFFER_PTR_MASK;
|
||||
- if ( head != tail )
|
||||
+ while ( tail == (readl(iommu->mmio_base +
|
||||
+ IOMMU_CMD_BUFFER_HEAD_OFFSET) &
|
||||
+ IOMMU_RING_BUFFER_PTR_MASK) )
|
||||
{
|
||||
- memcpy(iommu->cmd_buffer.buffer + iommu->cmd_buffer.tail,
|
||||
- cmd, sizeof(cmd_entry_t));
|
||||
-
|
||||
- iommu->cmd_buffer.tail = tail;
|
||||
- return 1;
|
||||
+ printk_once(XENLOG_ERR "AMD IOMMU %pp: no cmd slot available\n",
|
||||
+ &PCI_SBDF2(iommu->seg, iommu->bdf));
|
||||
+ cpu_relax();
|
||||
}
|
||||
|
||||
- return 0;
|
||||
-}
|
||||
-
|
||||
-static void commit_iommu_command_buffer(struct amd_iommu *iommu)
|
||||
-{
|
||||
- writel(iommu->cmd_buffer.tail,
|
||||
- iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
|
||||
-}
|
||||
+ memcpy(iommu->cmd_buffer.buffer + iommu->cmd_buffer.tail,
|
||||
+ cmd, sizeof(cmd_entry_t));
|
||||
|
||||
-static int send_iommu_command(struct amd_iommu *iommu, u32 cmd[])
|
||||
-{
|
||||
- if ( queue_iommu_command(iommu, cmd) )
|
||||
- {
|
||||
- commit_iommu_command_buffer(iommu);
|
||||
- return 1;
|
||||
- }
|
||||
+ iommu->cmd_buffer.tail = tail;
|
||||
|
||||
- return 0;
|
||||
+ writel(tail, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
|
||||
}
|
||||
|
||||
static void flush_command_buffer(struct amd_iommu *iommu)
|
|
@ -0,0 +1,141 @@
|
|||
From: Jan Beulich <jbeulich@suse.com>
|
||||
Subject: AMD/IOMMU: drop command completion timeout
|
||||
|
||||
First and foremost - such timeouts were not signaled to callers, making
|
||||
them believe they're fine to e.g. free previously unmapped pages.
|
||||
|
||||
Mirror VT-d's behavior: A fixed number of loop iterations is not a
|
||||
suitable way to detect timeouts in an environment (CPU and bus speeds)
|
||||
independent manner anyway. Furthermore, leaving an in-progress operation
|
||||
pending when it appears to take too long is problematic: If a command
|
||||
completed later, the signaling of its completion may instead be
|
||||
understood to signal a subsequently started command's completion.
|
||||
|
||||
Log excessively long processing times (with a progressive threshold) to
|
||||
have some indication of problems in this area. Allow callers to specify
|
||||
a non-default timeout bias for this logging, using the same values as
|
||||
VT-d does, which in particular means a (by default) much larger value
|
||||
for device IO TLB invalidation.
|
||||
|
||||
This is part of XSA-373 / CVE-2021-28692.
|
||||
|
||||
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
||||
Reviewed-by: Paul Durrant <paul@xen.org>
|
||||
|
||||
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
|
||||
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
|
||||
@@ -46,10 +46,12 @@ static void send_iommu_command(struct am
|
||||
writel(tail, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
|
||||
}
|
||||
|
||||
-static void flush_command_buffer(struct amd_iommu *iommu)
|
||||
+static void flush_command_buffer(struct amd_iommu *iommu,
|
||||
+ unsigned int timeout_base)
|
||||
{
|
||||
- unsigned int cmd[4], status, loop_count;
|
||||
- bool comp_wait;
|
||||
+ uint32_t cmd[4];
|
||||
+ s_time_t start, timeout;
|
||||
+ static unsigned int __read_mostly threshold = 1;
|
||||
|
||||
/* RW1C 'ComWaitInt' in status register */
|
||||
writel(IOMMU_STATUS_COMP_WAIT_INT,
|
||||
@@ -65,22 +67,29 @@ static void flush_command_buffer(struct
|
||||
IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);
|
||||
send_iommu_command(iommu, cmd);
|
||||
|
||||
- /* Make loop_count long enough for polling completion wait bit */
|
||||
- loop_count = 1000;
|
||||
- do {
|
||||
- status = readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
|
||||
- comp_wait = status & IOMMU_STATUS_COMP_WAIT_INT;
|
||||
- --loop_count;
|
||||
- } while ( !comp_wait && loop_count );
|
||||
-
|
||||
- if ( comp_wait )
|
||||
+ start = NOW();
|
||||
+ timeout = start + (timeout_base ?: 100) * MILLISECS(threshold);
|
||||
+ while ( !(readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET) &
|
||||
+ IOMMU_STATUS_COMP_WAIT_INT) )
|
||||
{
|
||||
- /* RW1C 'ComWaitInt' in status register */
|
||||
- writel(IOMMU_STATUS_COMP_WAIT_INT,
|
||||
- iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
|
||||
- return;
|
||||
+ if ( timeout && NOW() > timeout )
|
||||
+ {
|
||||
+ threshold |= threshold << 1;
|
||||
+ printk(XENLOG_WARNING
|
||||
+ "AMD IOMMU %pp: %scompletion wait taking too long\n",
|
||||
+ &PCI_SBDF2(iommu->seg, iommu->bdf),
|
||||
+ timeout_base ? "iotlb " : "");
|
||||
+ timeout = 0;
|
||||
+ }
|
||||
+ cpu_relax();
|
||||
}
|
||||
- AMD_IOMMU_DEBUG("Warning: ComWaitInt bit did not assert!\n");
|
||||
+
|
||||
+ if ( !timeout )
|
||||
+ printk(XENLOG_WARNING
|
||||
+ "AMD IOMMU %pp: %scompletion wait took %lums\n",
|
||||
+ &PCI_SBDF2(iommu->seg, iommu->bdf),
|
||||
+ timeout_base ? "iotlb " : "",
|
||||
+ (NOW() - start) / 10000000);
|
||||
}
|
||||
|
||||
/* Build low level iommu command messages */
|
||||
@@ -291,7 +300,7 @@ void amd_iommu_flush_iotlb(u8 devfn, con
|
||||
/* send INVALIDATE_IOTLB_PAGES command */
|
||||
spin_lock_irqsave(&iommu->lock, flags);
|
||||
invalidate_iotlb_pages(iommu, maxpend, 0, queueid, daddr, req_id, order);
|
||||
- flush_command_buffer(iommu);
|
||||
+ flush_command_buffer(iommu, iommu_dev_iotlb_timeout);
|
||||
spin_unlock_irqrestore(&iommu->lock, flags);
|
||||
}
|
||||
|
||||
@@ -328,7 +337,7 @@ static void _amd_iommu_flush_pages(struc
|
||||
{
|
||||
spin_lock_irqsave(&iommu->lock, flags);
|
||||
invalidate_iommu_pages(iommu, daddr, dom_id, order);
|
||||
- flush_command_buffer(iommu);
|
||||
+ flush_command_buffer(iommu, 0);
|
||||
spin_unlock_irqrestore(&iommu->lock, flags);
|
||||
}
|
||||
|
||||
@@ -352,7 +361,7 @@ void amd_iommu_flush_device(struct amd_i
|
||||
ASSERT( spin_is_locked(&iommu->lock) );
|
||||
|
||||
invalidate_dev_table_entry(iommu, bdf);
|
||||
- flush_command_buffer(iommu);
|
||||
+ flush_command_buffer(iommu, 0);
|
||||
}
|
||||
|
||||
void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf)
|
||||
@@ -360,7 +369,7 @@ void amd_iommu_flush_intremap(struct amd
|
||||
ASSERT( spin_is_locked(&iommu->lock) );
|
||||
|
||||
invalidate_interrupt_table(iommu, bdf);
|
||||
- flush_command_buffer(iommu);
|
||||
+ flush_command_buffer(iommu, 0);
|
||||
}
|
||||
|
||||
void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
|
||||
@@ -368,7 +377,7 @@ void amd_iommu_flush_all_caches(struct a
|
||||
ASSERT( spin_is_locked(&iommu->lock) );
|
||||
|
||||
invalidate_iommu_all(iommu);
|
||||
- flush_command_buffer(iommu);
|
||||
+ flush_command_buffer(iommu, 0);
|
||||
}
|
||||
|
||||
void amd_iommu_send_guest_cmd(struct amd_iommu *iommu, u32 cmd[])
|
||||
@@ -378,7 +387,8 @@ void amd_iommu_send_guest_cmd(struct amd
|
||||
spin_lock_irqsave(&iommu->lock, flags);
|
||||
|
||||
send_iommu_command(iommu, cmd);
|
||||
- flush_command_buffer(iommu);
|
||||
+ /* TBD: Timeout selection may require peeking into cmd[]. */
|
||||
+ flush_command_buffer(iommu, 0);
|
||||
|
||||
spin_unlock_irqrestore(&iommu->lock, flags);
|
||||
}
|
|
@ -0,0 +1,50 @@
|
|||
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/spec-ctrl: Protect against Speculative Code Store Bypass
|
||||
|
||||
Modern x86 processors have far-better-than-architecturally-guaranteed self
|
||||
modifying code detection. Typically, when a write hits an instruction in
|
||||
flight, a Machine Clear occurs to flush stale content in the frontend and
|
||||
backend.
|
||||
|
||||
For self modifying code, before a write which hits an instruction in flight
|
||||
retires, the frontend can speculatively decode and execute the old instruction
|
||||
stream. Speculation of this form can suffer from type confusion in registers,
|
||||
and potentially leak data.
|
||||
|
||||
Furthermore, updates are typically byte-wise, rather than atomic. Depending
|
||||
on timing, speculation can race ahead multiple times between individual
|
||||
writes, and execute the transiently-malformed instruction stream.
|
||||
|
||||
Xen has stubs which are used in certain cases for emulation purposes. Inhibit
|
||||
speculation between updating the stub and executing it.
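The guard boils down to a serializing barrier between the byte-wise stub update and the first (possibly speculative) fetch of it; roughly as below, with _mm_lfence() standing in for what block_speculation() effectively is on x86, and assuming stub_va already points at writable and executable memory:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <x86intrin.h>

    typedef unsigned int (stub_fn_t)(void);

    static stub_fn_t *setup_stub_sketch(uint8_t *stub_va,
                                        const uint8_t *code, size_t len)
    {
        memcpy(stub_va, code, len);             /* byte-wise stub update */

        /*
         * Keep the frontend from speculatively decoding the stale bytes once
         * we branch to the stub: LFENCE here is the SCSB serialization point.
         */
        _mm_lfence();

        return (stub_fn_t *)stub_va;
    }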
|
||||
|
||||
This is XSA-375 / CVE-2021-0089.
|
||||
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
|
||||
index 8889509d2a..11467a1e3a 100644
|
||||
--- a/xen/arch/x86/pv/emul-priv-op.c
|
||||
+++ b/xen/arch/x86/pv/emul-priv-op.c
|
||||
@@ -138,6 +138,8 @@ static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
|
||||
/* Runtime confirmation that we haven't clobbered an adjacent stub. */
|
||||
BUG_ON(STUB_BUF_SIZE / 2 < (p - ctxt->io_emul_stub));
|
||||
|
||||
+ block_speculation(); /* SCSB */
|
||||
+
|
||||
/* Handy function-typed pointer to the stub. */
|
||||
return (void *)stub_va;
|
||||
|
||||
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
|
||||
index c25d88d0d8..f42ff2a837 100644
|
||||
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
|
||||
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
|
||||
@@ -1257,6 +1257,7 @@ static inline int mkec(uint8_t e, int32_t ec, ...)
|
||||
# define invoke_stub(pre, post, constraints...) do { \
|
||||
stub_exn.info = (union stub_exception_token) { .raw = ~0 }; \
|
||||
stub_exn.line = __LINE__; /* Utility outweighs livepatching cost */ \
|
||||
+ block_speculation(); /* SCSB */ \
|
||||
asm volatile ( pre "\n\tINDIRECT_CALL %[stub]\n\t" post "\n" \
|
||||
".Lret%=:\n\t" \
|
||||
".pushsection .fixup,\"ax\"\n" \
|
|
@ -0,0 +1,27 @@
|
|||
From: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Subject: x86/spec-ctrl: Mitigate TAA after S3 resume
|
||||
|
||||
The user chosen setting for MSR_TSX_CTRL needs restoring after S3.
|
||||
|
||||
All APs get the correct setting via start_secondary(), but the BSP was missed
|
||||
out.
|
||||
|
||||
This is XSA-377 / CVE-2021-28690.
|
||||
|
||||
Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel")
|
||||
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
||||
Reviewed-by: Jan Beulich <jbeulich@suse.com>
|
||||
|
||||
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
|
||||
index 91a8c4d0bd..31a56f02d0 100644
|
||||
--- a/xen/arch/x86/acpi/power.c
|
||||
+++ b/xen/arch/x86/acpi/power.c
|
||||
@@ -288,6 +288,8 @@ static int enter_state(u32 state)
|
||||
|
||||
microcode_update_one();
|
||||
|
||||
+ tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */
|
||||
+
|
||||
if ( !recheck_cpu_features(0) )
|
||||
panic("Missing previously available feature(s)\n");
|
||||
|