Commit Graph

875845 Commits

Author SHA1 Message Date
Reinette Chatre a232b35483 selftests/sgx: Enable multiple thread support
commit 26e688f126 upstream.

Each thread executing in an enclave is associated with a Thread Control
Structure (TCS). The test enclave contains two hardcoded TCS. Each TCS
contains meta-data used by the hardware to save and restore thread specific
information when entering/exiting the enclave.

The two TCS structures within the test enclave share their SSA (State Save
Area) resulting in the threads clobbering each other's data. Fix this by
providing each TCS with its own SSA.

Additionally, there is an 8K stack space whose address is computed from
the enclave entry point. This is correct for TCS #1, which starts at the
first address inside the enclave, but yields an out-of-bounds address
when entering as TCS #2. Split the 8K stack space into two separate pages
with an offset symbol between them so that the current enclave entry
calculation can continue to be used for both threads.

While using the enclave with multiple threads requires these fixes, the
impact is not apparent because every test up to this point enters the
enclave from the first TCS.

More detail about the stack fix:
-------------------------------
Before this change the test enclave (test_encl) looks as follows:

.tcs (2 pages):
(page 1) TCS #1
(page 2) TCS #2

.text (1 page)
One page of code

.data (5 pages)
(page 1) encl_buffer
(page 2) encl_buffer
(page 3) SSA
(page 4 and 5) STACK
encl_stack:

As shown above there is a symbol, encl_stack, that points to the end of the
.data segment (pointing to the end of page 5 in .data) which is also the
end of the enclave.

The enclave entry code computes the stack address by adding encl_stack to
the pointer to the TCS that entered the enclave. When entering at TCS #1
the stack is computed correctly but when entering at TCS #2 the stack
pointer would point to one page beyond the end of the enclave and a #PF
would result when TCS #2 attempts to enter the enclave.

The fix involves moving the encl_stack symbol between the two stack pages.
Doing so enables the stack address computation in the entry code to compute
the correct stack address for each TCS.
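
The stack address arithmetic described above can be sketched in plain C.
This is a minimal illustration, not the enclave entry code: the offsets
follow the eight-page layout shown earlier, the helper names are
hypothetical, and the enclave size is assumed to stay at eight pages after
the fix (the real layout may differ, for example because of the second
SSA).

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE	0x1000ULL

/* Layout from the description: 2 TCS pages + 1 text page + 5 data pages. */
#define ENCL_SIZE	(8 * PAGE_SIZE)
#define TCS1_OFFSET	(0 * PAGE_SIZE)
#define TCS2_OFFSET	(1 * PAGE_SIZE)

/* encl_stack as an offset from the enclave base (assumed values). */
#define ENCL_STACK_OLD	(8 * PAGE_SIZE)	/* after the second stack page */
#define ENCL_STACK_NEW	(7 * PAGE_SIZE)	/* between the two stack pages */

/* The entry code adds encl_stack to the address of the entered TCS. */
static uint64_t stack_top(uint64_t tcs_offset, uint64_t encl_stack)
{
	return tcs_offset + encl_stack;
}

/* The stack grows down, so its top may equal the end of the enclave. */
static bool stack_in_enclave(uint64_t top)
{
	return top <= ENCL_SIZE;
}
```

With the old symbol placement TCS #2 gets a stack top one page past the
end of the enclave. With the symbol between the two stack pages, TCS #1
gets the top of the first stack page and TCS #2 the top of the second.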

Intel-SIG: commit 26e688f126 selftests/sgx: Enable multiple thread
support.
Backport for SGX EDMM support.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/a49dc0d85401db788a0a3f0d795e848abf3b1f44.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:06 +08:00
Reinette Chatre 589c092c85 selftests/sgx: Add page permission and exception test
commit abc5cec473 upstream.

The Enclave Page Cache Map (EPCM) is a secure structure used by the
processor to track the contents of the enclave page cache. The EPCM
contains permissions with which enclave pages can be accessed. SGX
support allows EPCM and PTE page permissions to differ - as long as
the PTE permissions do not exceed the EPCM permissions.

Add a test that:
(1) Creates an SGX enclave page with writable EPCM permission.
(2) Changes the PTE permission on the page to read-only. This should
    be permitted because the permission does not exceed the EPCM
    permission.
(3) Attempts a write to the page. This should generate a page fault
    (#PF) because of the read-only PTE even though the EPCM
    permissions allow the page to be written to.

This introduces the first test of SGX exception handling. In this test
the issue that caused the exception (PTE page permissions) can be fixed
from outside the enclave, after which it is possible to re-enter the
enclave at the original entry point with ERESUME.
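
The relationship between the two permission sets can be modelled with a
few lines of C. This is only a sketch of the rule stated above: the
permission bits and function names are made up for illustration and are
not the hardware or kernel encodings.

```c
#include <stdbool.h>

/* Illustrative permission bits, not the real EPCM or PTE encodings. */
enum perm { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* PTE permissions are acceptable only if they do not exceed the EPCM ones. */
static bool pte_within_epcm(unsigned int pte, unsigned int epcm)
{
	return (pte & ~epcm) == 0;
}

/* An access faults unless both the PTE and the EPCM allow all of it. */
static bool access_faults(unsigned int pte, unsigned int epcm,
			  unsigned int access)
{
	return (access & pte & epcm) != access;
}
```

In terms of the test steps: an EPCM of PERM_R | PERM_W with a PTE of
PERM_R is a permitted combination, a write then faults, and restoring the
PTE to PERM_R | PERM_W lets the write succeed.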

Intel-SIG: commit abc5cec473 selftests/sgx: Add page permission and
exception test.
Backport for SGX EDMM support.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/3bcc73a4b9fe8780bdb40571805e7ced59e01df7.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:05 +08:00
Reinette Chatre 1a8bdc5061 selftests/sgx: Rename test properties in preparation for more enclave tests
commit c085dfc768 upstream.

The SGX selftests prepare a data structure outside of the enclave with
the type of and data for the operation that needs to be run within
the enclave. At this time only two complementary operations are supported
by the enclave: copying a value from outside the enclave into a default
buffer within the enclave and reading a value from the enclave's default
buffer into a variable accessible outside the enclave.

In preparation for more operations supported by the enclave, the names of
the current enclave operations are changed to reflect the operations more
accurately and to distinguish them more easily from future operations:

* The enums ENCL_OP_PUT and ENCL_OP_GET are renamed to ENCL_OP_PUT_TO_BUFFER
  and ENCL_OP_GET_FROM_BUFFER respectively.
* The structs encl_op_put and encl_op_get are renamed to encl_op_put_to_buf
  and encl_op_get_from_buf respectively.
* The enclave functions do_encl_op_put and do_encl_op_get are renamed to
  do_encl_op_put_to_buf and do_encl_op_get_from_buf respectively.

No functional changes.

Intel-SIG: commit c085dfc768 selftests/sgx: Rename test properties in
preparation for more enclave tests.
Backport for SGX EDMM support.

Suggested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/023fda047c787cf330b88ed9337705edae6a0078.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:05 +08:00
Jarkko Sakkinen 27e18d52f6 selftests/sgx: Provide per-op parameter structs for the test enclave
commit 41493a095e upstream.

To add more operations to the test enclave, the protocol needs to allow
operations with varying parameters. Create a separate parameter
struct for each existing operation, with the shared parameters in struct
encl_op_header.
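
A per-operation parameter layout of this kind could look roughly like the
following. The struct and enum names are the ones that existed at the time
of this commit (before the later rename); the field layout, the
encl_buffer stand-in, and the encl_dispatch() helper are assumptions made
for illustration.

```c
#include <stdint.h>

enum encl_op_type { ENCL_OP_PUT, ENCL_OP_GET, ENCL_OP_MAX };

/* Shared parameters: every operation struct starts with this header. */
struct encl_op_header {
	uint64_t type;
};

/* Assumed field layout, for illustration only. */
struct encl_op_put {
	struct encl_op_header header;
	uint64_t value;
};

struct encl_op_get {
	struct encl_op_header header;
	uint64_t value;
};

/* Stand-in for the enclave's default buffer. */
static uint64_t encl_buffer;

static void do_encl_op_put(void *op)
{
	struct encl_op_put *put = op;

	encl_buffer = put->value;
}

static void do_encl_op_get(void *op)
{
	struct encl_op_get *get = op;

	get->value = encl_buffer;
}

/* Hypothetical dispatcher: picks the handler from the common header. */
static int encl_dispatch(void *op)
{
	static void (*const ops[ENCL_OP_MAX])(void *) = {
		do_encl_op_put,
		do_encl_op_get,
	};
	struct encl_op_header *hdr = op;

	if (hdr->type >= ENCL_OP_MAX)
		return -1;

	ops[hdr->type](op);
	return 0;
}
```

Because the header comes first in every struct, the dispatcher can read
the operation type without knowing which parameters follow it.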

Intel-SIG: commit 41493a095e selftests/sgx: Provide per-op parameter
structs for the test enclave.
Backport for SGX EDMM support.

[reinette: rebased to apply on top of oversubscription test series]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/f9a4a8c436b538003b8ebddaa66083992053cef1.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:05 +08:00
Jarkko Sakkinen 0d7d49d9bd selftests/sgx: Fix corrupted cpuid macro invocation
commit 572a0a647b upstream.

The SGX selftest fails to build on tip/x86/sgx:

	main.c: In function ‘get_total_epc_mem’:
	main.c:296:17: error: implicit declaration of function ‘__cpuid’ [-Werror=implicit-function-declaration]
	  296 |                 __cpuid(&eax, &ebx, &ecx, &edx);
	      |                 ^~~~~~~

Include cpuid.h and use the __cpuid_count() macro to fix the
compilation issue.
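
For reference, a sketch of how __cpuid_count() from the compiler's
cpuid.h can be used to sum up EPC section sizes. This is not the selftest
code: the constants reflect the documented layout of CPUID leaf 0x12 as
assumed here, and epc_section_size(), total_epc_mem(), and the section
limit are hypothetical names and values.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* provides __cpuid_count() and __get_cpuid_max() */
#endif

#define SGX_CPUID		0x12	/* SGX enumeration leaf */
#define SGX_CPUID_EPC		2	/* first EPC sub-leaf */
#define SGX_CPUID_EPC_MASK	0xfU	/* section type in EAX[3:0] */
#define SGX_CPUID_EPC_SECTION	0x1U
#define MAX_EPC_SECTIONS	8	/* loop bound chosen for this sketch */

/* Size bits 31:12 come from ECX[31:12], bits 51:32 from EDX[19:0]. */
uint64_t epc_section_size(uint32_t ecx, uint32_t edx)
{
	return ((uint64_t)ecx & 0xfffff000ULL) |
	       (((uint64_t)edx & 0xfffffULL) << 32);
}

#if defined(__x86_64__) || defined(__i386__)
uint64_t total_epc_mem(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int section;
	uint64_t total = 0;

	/* Leaf 0x12 is only meaningful if the CPU reports it. */
	if (__get_cpuid_max(0, 0) < SGX_CPUID)
		return 0;

	for (section = 0; section < MAX_EPC_SECTIONS; section++) {
		/* The macro takes the leaf, the sub-leaf, then four lvalues. */
		__cpuid_count(SGX_CPUID, SGX_CPUID_EPC + section,
			      eax, ebx, ecx, edx);
		(void)ebx;	/* base address bits, unused here */

		if ((eax & SGX_CPUID_EPC_MASK) != SGX_CPUID_EPC_SECTION)
			break;

		total += epc_section_size(ecx, edx);
	}

	return total;
}
#endif
```

Note that __cpuid_count() takes the register variables themselves rather
than pointers to them, unlike the invocation shown in the build error.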

[ dhansen: tweak commit message ]

Intel-SIG: commit 572a0a647b selftests/sgx: Fix corrupted cpuid macro
invocation.
Backport for SGX EDMM support.

Fixes: f0ff2447b8 ("selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211204202355.23005-1-jarkko@kernel.org
Cc: Shuah Khan <shuah@kernel.org>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:04 +08:00
Jarkko Sakkinen c59086a59a selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
commit f0ff2447b8 upstream.

Add a variation of the unclobbered_vdso test.

In the new test, create a heap for the test enclave, which has the same
size as all available Enclave Page Cache (EPC) pages in the system. This
will guarantee that all test_encl.elf pages *and* SGX Enclave Control
Structure (SECS) have been swapped out by the page reclaimer during the
load time.

This test will trigger both the page reclaimer and the page fault handler.
The page reclaimer is triggered while the heap is being created at load
time. The page fault handler is triggered for all the required pages
while the test case is executing.

Intel-SIG: commit f0ff2447b8 selftests/sgx: Add a new kselftest:
Unclobbered_vdso_oversubscribed.
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/41f7c508eea79a3198b5014d7691903be08f9ff1.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:04 +08:00
Jarkko Sakkinen 319148e6ed selftests/sgx: Move setup_test_encl() to each TEST_F()
commit 065825db1f upstream.

Create the test enclave inside each TEST_F(), instead of FIXTURE_SETUP(),
so that the heap size can be defined per test.

Intel-SIG: commit 065825db1f selftests/sgx: Move setup_test_encl() to
each TEST_F().
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/70ca264535d2ca0dc8dcaf2281e7d6965f8d4a24.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:03 +08:00
Jarkko Sakkinen 2d57863265 selftests/sgx: Encpsulate the test enclave creation
commit 1b35eb7195 upstream.

Introduce setup_test_encl() so that the enclave creation can be moved to
TEST_F()'s. This is required for a reclaimer test where the heap size needs
to be set large enough to trigger the page reclaimer.

Intel-SIG: commit 1b35eb7195 selftests/sgx: Encpsulate the test
enclave creation.
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/bee0ca867a95828a569c1ba2a8e443a44047dc71.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:03 +08:00
Jarkko Sakkinen 88c61332b5 selftests/sgx: Dump segments and /proc/self/maps only on failure
commit 1471721489 upstream.

Logging is always a compromise between clarity and detail. The main use
case for dumping VMAs is when FIXTURE_SETUP() fails, and is less important
for enclaves that do initialize correctly. Therefore, print the segments
and /proc/self/maps only in the error case.

Finally, if a single test ever creates multiple enclaves, the number of
log lines would become enormous.

Intel-SIG: commit 1471721489 selftests/sgx: Dump segments and
/proc/self/maps only on failure.
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/23cef0ae1de3a8a74cbfbbe74eca48ca3f300fde.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:03 +08:00
Jarkko Sakkinen f47c1b16e7 selftests/sgx: Create a heap for the test enclave
commit 3200505d4d upstream.

Create a heap for the test enclave, which is allocated from /dev/null,
and left unmeasured. This is beneficial on its own because it verifies
that an enclave built from multiple sources works properly. If LSM
hooks are added for SGX some day, a multi-source enclave has a higher
probability of triggering bugs in access control checks.

The immediate need comes from the need to implement page reclaim tests.
In order to trigger the page reclaimer, one can just set the size of
the heap high enough.

Intel-SIG: commit 3200505d4d selftests/sgx: Create a heap for the test
enclave.
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/e070c5f23578c29608051cab879b1d276963a27a.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:02 +08:00
Jarkko Sakkinen 6069d0ed3b selftests/sgx: Make data measurement for an enclave segment optional
commit 5f0ce664d8 upstream.

For a heap it makes sense to leave its contents "unmeasured" in the SGX
enclave build process, meaning that they won't contribute to the
cryptographic signature (an RSA-3072 signed SHA-256 hash) of the enclave.

Enclaves are signed blobs where the signature is calculated both from
page data and also from "structural properties" of the pages.  For
instance a page offset of *every* page added to the enclave is hashed.

For data, this is optional, not least because hashing a page contributes
significantly to the enclave load time. Thus, where there is
no reason to hash, do not. The SGX ioctl interface supports this with
the SGX_PAGE_MEASURE flag. Only when the flag is *set* is the data measured.

Add a seg->measure boolean flag to struct encl_segment. Only when the
flag is set, include the segment data in the signature (represented
by the SIGSTRUCT architectural structure).
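
The effect of the per-segment flag can be sketched as follows. The struct
below is a reduced, hypothetical version of struct encl_segment, and
PAGE_MEASURE_FLAG is a local stand-in for SGX_PAGE_MEASURE with an assumed
value; add_pages_flags() is an illustrative helper, not a function from
the selftest.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for SGX_PAGE_MEASURE; the value is an assumption. */
#define PAGE_MEASURE_FLAG	0x1ULL

/* Reduced, hypothetical version of struct encl_segment. */
struct encl_segment {
	void *src;	/* where the page data comes from */
	size_t size;	/* segment length in bytes */
	bool measure;	/* include the data in the enclave measurement? */
};

/* Flags to use when adding the pages of one segment to the enclave. */
static uint64_t add_pages_flags(const struct encl_segment *seg)
{
	return seg->measure ? PAGE_MEASURE_FLAG : 0;
}
```

A segment loaded from the enclave binary would set measure, while a heap
segment would leave it clear and so skip the per-page hashing cost.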

Intel-SIG: commit 5f0ce664d8 selftests/sgx: Make data measurement for
an enclave segment optional.
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/625b6fe28fed76275e9238ec4e15ec3c0d87de81.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:02 +08:00
Jarkko Sakkinen d97e62407b selftests/sgx: Assign source for each segment
commit 39f62536be upstream.

Define source per segment so that enclave pages can be added from different
sources, e.g. anonymous VMA for zero pages. In other words, add 'src' field
to struct encl_segment, and assign it to 'encl->src' for pages inherited
from the enclave binary.

Intel-SIG: commit 39f62536be selftests/sgx: Assign source for each
segment.
Backport for SGX EDMM support.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/7850709c3089fe20e4bcecb8295ba87c54cc2b4a.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:02 +08:00
Sean Christopherson bbc2567b6c selftests/sgx: Fix a benign linker warning
commit 5064343fb1 upstream.

The enclave binary (test_encl.elf) is built with only three sections (tcs,
text, and data) as controlled by its custom linker script.

If gcc is built with "--enable-linker-build-id" (this appears to be a
common configuration even if it is by default off) then gcc
will pass "--build-id" to the linker that will prompt it (the linker) to
write unique bits identifying the linked file to a ".note.gnu.build-id"
section.

The section ".note.gnu.build-id" does not exist in the test enclave
resulting in the following warning emitted by the linker:

/usr/bin/ld: warning: .note.gnu.build-id section discarded, --build-id
ignored

The test enclave does not use the build id within the binary so fix the
warning by passing a build id of "none" to the linker that will disable the
setting from any earlier "--build-id" options and thus disable the attempt
to write the build id to a ".note.gnu.build-id" section that does not
exist.

Intel-SIG: commit 5064343fb1 selftests/sgx: Fix a benign linker
warning.
Backport for SGX EDMM support.

Link: https://lore.kernel.org/linux-sgx/20191017030340.18301-2-sean.j.christopherson@intel.com/
Suggested-by: Cedric Xing <cedric.xing@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/ca0f8a81fc1e78af9bdbc6a88e0f9c37d82e53f2.1636997631.git.reinette.chatre@intel.com
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:01 +08:00
Tony Luck f5f9636ca8 x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
commit 3ad6fd77a2 upstream.

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Intel-SIG: commit 3ad6fd77a2 x86/sgx: Add check for SGX pages to
ghes_do_memory_failure().
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:01 +08:00
Tony Luck 75a042336e x86/sgx: Add hook to error injection address validation
commit c6acb1e7bf upstream.

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Intel-SIG: commit c6acb1e7bf x86/sgx: Add hook to error injection
address validation.
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-7-tony.luck@intel.com
[ Zhiquan: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:01 +08:00
Tony Luck 487d5438cd x86/sgx: Hook arch_memory_failure() into mainline code
commit 03b122da74 upstream.

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Intel-SIG: commit 03b122da74 x86/sgx: Hook arch_memory_failure()
into mainline code.
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-6-tony.luck@intel.com
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:00 +08:00
Tony Luck f8c8adb6ca x86/sgx: Add SGX infrastructure to recover from poison
commit a495cbdffa upstream.

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter as addresses
of code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.

Intel-SIG: commit a495cbdffa x86/sgx: Add SGX infrastructure to
recover from poison.
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-5-tony.luck@intel.com
[ Zhiquan: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:00 +08:00
Tony Luck 3792e69713 x86/sgx: Initial poison handling for dirty and free pages
commit 992801ae92 upstream.

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (Currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).

In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.
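
The routing of a freed page can be sketched like this. It is a simplified
model, not the kernel code: the per-node lists are replaced by counters,
locking is omitted, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

struct epc_page {
	unsigned int flags;
	bool poison;	/* kept out of flags so flag updates need not be atomic */
};

/* Per-node bookkeeping; counters stand in for the real page lists. */
struct epc_node {
	unsigned long nr_free;
	unsigned long nr_poisoned;
};

/* A poisoned page goes to the poison list and is never handed out again. */
static void free_epc_page(struct epc_node *node, struct epc_page *page)
{
	if (page->poison)
		node->nr_poisoned++;
	else
		node->nr_free++;
}
```

The same check applies when sanitizing dirty pages: a page marked as
poisoned is diverted before it can reach the free list.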

Intel-SIG: commit 992801ae92 x86/sgx: Initial poison handling for
dirty and free pages.
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-4-tony.luck@intel.com
[ Zhiquan: amend commit log and adapt the code ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:24:00 +08:00
Tony Luck 58f83be5d4 x86/sgx: Add infrastructure to identify SGX EPC pages
commit 40e0e7843e upstream.

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. this adds CONFIG_XARRAY_MULTI to
the SGX dependencies, so "select" it in arch/x86/Kconfig for X86/SGX.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.
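
The question such a function answers is a range lookup: does a physical
address fall inside any EPC section? The sketch below uses a linear scan
over a small array instead of the xarray mentioned above, purely to keep
the example self-contained. The struct and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct epc_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t size;		/* length in bytes */
};

/* Linear scan; the commit itself uses an xarray for this mapping. */
static bool is_epc_address(const struct epc_section *sections, size_t nr,
			   uint64_t paddr)
{
	for (size_t i = 0; i < nr; i++) {
		if (paddr >= sections[i].phys_addr &&
		    paddr - sections[i].phys_addr < sections[i].size)
			return true;
	}

	return false;
}
```

Either way, the lookup walks a separate data structure rather than
testing a bit in a "struct page", which is why it is slower than the
usual page type checks.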

Intel-SIG: commit 40e0e7843e x86/sgx: Add infrastructure to identify
SGX EPC pages.
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.com
[ Zhiquan: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:59 +08:00
Tony Luck 649ecbcdf0 x86/sgx: Add new sgx_epc_page flag bit to mark free pages
commit d6d261bded upstream.

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IS_FREE bit is cleared.
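
The flag transitions can be sketched in a few lines. The bit value and
helper names below are illustrative assumptions, and the node->lock that
protects these transitions in the kernel is left out.

```c
#include <stdbool.h>

/* Illustrative bit value, not necessarily the one used by the kernel. */
#define EPC_PAGE_IS_FREE	0x2U

struct epc_page {
	unsigned int flags;
};

/* Called when the page is added to a free list. */
static void mark_page_free(struct epc_page *page)
{
	page->flags |= EPC_PAGE_IS_FREE;
}

/* Called when the page is allocated from a free list. */
static void mark_page_allocated(struct epc_page *page)
{
	page->flags &= ~EPC_PAGE_IS_FREE;
}

static bool page_is_free(const struct epc_page *page)
{
	return page->flags & EPC_PAGE_IS_FREE;
}
```

A page that starts on the dirty list has the bit clear, gains it when it
reaches a free list, and loses it again on allocation, leaving any other
flag bits untouched.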

Intel-SIG: commit d6d261bded x86/sgx: Add new sgx_epc_page flag
bit to mark free pages.
Backport for SGX MCA recovery co-existence support.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-2-tony.luck@intel.com
[ Zhiquan: amend commit log and adapt the code ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:59 +08:00
Sean Christopherson 2c7e165f6c KVM: x86: Fix implicit enum conversion goof in scattered reverse CPUID code
commit 462f8ddebc upstream.

Take "enum kvm_only_cpuid_leafs" in scattered specific CPUID helpers
(which is obvious in hindsight), and use "unsigned int" for leafs that
can be the kernel's standard "enum cpuid_leaf" or the aforementioned
KVM-only variant.  Loss of the enum params is a bit disappointing, but
gcc obviously isn't providing any extra sanity checks, and the various
BUILD_BUG_ON() assertions ensure the input is in range.

This fixes implicit enum conversions that are detected by clang-11:

arch/x86/kvm/cpuid.c:499:29: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
        kvm_cpu_cap_init_scattered(CPUID_12_EAX,
        ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
arch/x86/kvm/cpuid.c:837:31: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
                cpuid_entry_override(entry, CPUID_12_EAX);
                ~~~~~~~~~~~~~~~~~~~~        ^~~~~~~~~~~~
2 warnings generated.

Intel-SIG: commit 462f8ddebc KVM: x86: Fix implicit enum conversion
goof in scattered reverse CPUID code.
Backport for SGX virtualization support.

Fixes: 4e66c0cb79 ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
Cc: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210421010850.3009718-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:58 +08:00
Mauro Carvalho Chehab e1677961e8 docs: virt: api.rst: fix a pointer to SGX documentation
commit 0a5fab9f08 upstream.

The document which describes the SGX kernel architecture was added at
commit 3fa97bf001 ("Documentation/x86: Document SGX kernel architecture")

but the reference at virt/kvm/api.rst is pointing to some
non-existing document.

Intel-SIG: commit 0a5fab9f08 docs: virt: api.rst: fix a pointer to SGX
documentation
Backport for SGX virtualization support on Intel Xeon platform.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/138c24633c6e4edf862a2b4d77033c603fc10406.1621413933.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:58 +08:00
Reinette Chatre 63114f3b50 x86/sgx: Silence softlockup detection when releasing large enclaves
commit 8795359e35 upstream.

Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
triggers the softlockup detector.

Actual SGX systems have 128GB of enclave memory or more.  The
"unclobbered_vdso_oversubscribed" selftest creates one enclave which
consumes all of the enclave memory on the system. Tearing down such a
large enclave takes around a minute, most of it in the loop where
the EREMOVE instruction is applied to each individual 4k enclave page.

Spending one minute in a loop triggers the softlockup detector.

Add a cond_resched() to give other tasks a chance to run and placate
the softlockup detector.

Intel-SIG: commit 8795359e35 x86/sgx: Silence softlockup detection
when releasing large enclaves.
Backport for SGX virtualization support.

Cc: stable@vger.kernel.org
Fixes: 1728ab54b4 ("x86/sgx: Add a page reclaimer")
Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>  (kselftest as sanity check)
Link: https://lkml.kernel.org/r/ced01cac1e75f900251b0a4ae1150aa8ebd295ec.1644345232.git.reinette.chatre@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict.
  The issue has been fixed by commit d8c6333085 ("sgx: fix softlockup
  when sgx_encl_release"), just align the code to upstream here.
]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:58 +08:00
Reinette Chatre a0e0443a2b x86/sgx: Fix free page accounting
commit ac5d272a0a upstream.

The SGX driver maintains a single global free page counter,
sgx_nr_free_pages, that reflects the number of free pages available
across all NUMA nodes. Correspondingly, a list of free pages is
associated with each NUMA node and sgx_nr_free_pages is updated
every time a page is added or removed from any of the free page
lists. The main usage of sgx_nr_free_pages is by the reclaimer
that runs when it (sgx_nr_free_pages) goes below a watermark
to ensure that there are always some free pages available to, for
example, support efficient page faults.

With sgx_nr_free_pages accessed and modified from a few places
it is essential to ensure that these accesses are done safely but
this is not the case. sgx_nr_free_pages is read without any
protection and updated with inconsistent protection by any one
of the spin locks associated with the individual NUMA nodes.
For example:

      CPU_A                                 CPU_B
      -----                                 -----
 spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
 ...                                   ...
 sgx_nr_free_pages--;  /* NOT SAFE */  sgx_nr_free_pages--;

 spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);

Since sgx_nr_free_pages may be protected by different spin locks
while being modified from different CPUs, the following scenario
is possible:

      CPU_A                                CPU_B
      -----                                -----
{sgx_nr_free_pages = 100}
 spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
 sgx_nr_free_pages--;                  sgx_nr_free_pages--;
 /* LOAD sgx_nr_free_pages = 100 */    /* LOAD sgx_nr_free_pages = 100 */
 /* sgx_nr_free_pages--          */    /* sgx_nr_free_pages--          */
 /* STORE sgx_nr_free_pages = 99 */    /* STORE sgx_nr_free_pages = 99 */
 spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);

In the above scenario, sgx_nr_free_pages is decremented from two CPUs
but instead of sgx_nr_free_pages ending with a value that is two less
than it started with, it was only decremented by one while the number
of free pages was actually reduced by two. The consequence of
sgx_nr_free_pages not being protected is that its value may not
accurately reflect the actual number of free pages on the system,
impacting the availability of free pages in support of many flows.

The problematic scenario is when the reclaimer does not run because it
believes there to be sufficient free pages while any attempt to allocate
a page fails because there are no free pages available. In the SGX driver
the reclaimer's watermark is only 32 pages so after encountering the
above example scenario 32 times a user space hang is possible when there
are no more free pages because of repeated page faults caused by no
free pages made available.

The following flow was encountered:
asm_exc_page_fault
 ...
   sgx_vma_fault()
     sgx_encl_load_page()
       sgx_encl_eldu() // Encrypted page needs to be loaded from backing
                       // storage into newly allocated SGX memory page
         sgx_alloc_epc_page() // Allocate a page of SGX memory
           __sgx_alloc_epc_page() // Fails, no free SGX memory
           ...
           if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer
             wake_up(&ksgxd_waitq);
           return -EBUSY; // Return -EBUSY giving reclaimer time to run
       return -EBUSY;
     return -EBUSY;
   return VM_FAULT_NOPAGE;

The reclaimer is triggered in the above flow with the following code:

static bool sgx_should_reclaim(unsigned long watermark)
{
        return sgx_nr_free_pages < watermark &&
               !list_empty(&sgx_active_page_list);
}

In the problematic scenario there were no free pages available yet the
value of sgx_nr_free_pages was above the watermark. The allocation of
SGX memory thus always failed because of a lack of free pages while no
free pages were made available because the reclaimer is never started
because of sgx_nr_free_pages' incorrect value. The consequence was that
user space kept encountering VM_FAULT_NOPAGE that caused the same
address to be accessed repeatedly with the same result.
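
The stuck state described above can be reproduced with a small user space
model. This is only a sketch: should_reclaim() mirrors the condition in
sgx_should_reclaim() but takes the list state as a flag, and
drifted_counter() is a hypothetical helper that models a counter which
missed some decrements. The watermark value of 32 is taken from the text
above.

```c
#include <assert.h>
#include <stdbool.h>

#define SGX_NR_LOW_PAGES 32     /* watermark quoted in the text above */

/* Same condition as sgx_should_reclaim(), with the list state as a flag. */
static bool should_reclaim(long nr_free_counter, long watermark,
                           bool active_list_empty)
{
        return nr_free_counter < watermark && !active_list_empty;
}

/* Hypothetical: counter value after `lost` decrements never landed. */
static long drifted_counter(long actual_free, long lost)
{
        return actual_free + lost;
}

static void example(void)
{
        /* Accurate counter, no free pages: the reclaimer is woken. */
        assert(should_reclaim(0, SGX_NR_LOW_PAGES, false));
        /* 32 lost decrements, no free pages: the reclaimer never runs. */
        assert(!should_reclaim(drifted_counter(0, 32), SGX_NR_LOW_PAGES,
                               false));
}
```

With 31 lost decrements the counter still drops below the watermark once
memory is exhausted, which is why the hang only appears after the race has
been hit 32 times.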

Change the global free page counter to an atomic type that
ensures simultaneous updates are done safely. While doing so, move
the updating of the variable outside of the spin lock critical
section to which it does not belong.
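
The difference between the racy update and an atomic one can be replayed
deterministically in user space. The sketch below is not the kernel change
itself: it uses C11 <stdatomic.h> as a stand-in for the kernel's atomic
type, and both function names are hypothetical. The first function replays
the LOAD/LOAD/STORE/STORE interleaving from the scenario above on a plain
variable; the second performs the same two decrements as atomic
read-modify-write operations.

```c
#include <assert.h>
#include <stdatomic.h>

/* Replay the racy interleaving from the scenario on a plain counter. */
static long racy_two_decrements(long start)
{
        long counter = start;
        long cpu_a = counter;   /* CPU_A: LOAD  */
        long cpu_b = counter;   /* CPU_B: LOAD  */

        counter = cpu_a - 1;    /* CPU_A: STORE */
        counter = cpu_b - 1;    /* CPU_B: STORE, overwrites CPU_A's update */
        return counter;
}

/* The same two decrements done as indivisible read-modify-write steps. */
static long atomic_two_decrements(long start)
{
        atomic_long counter;

        atomic_init(&counter, start);
        atomic_fetch_sub(&counter, 1);  /* CPU_A */
        atomic_fetch_sub(&counter, 1);  /* CPU_B */
        return atomic_load(&counter);
}

static void example(void)
{
        assert(racy_two_decrements(100) == 99);         /* one update lost */
        assert(atomic_two_decrements(100) == 98);       /* both counted */
}
```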

Intel-SIG: commit ac5d272a0a x86/sgx: Fix free page accounting
Backport for SGX virtualization support.

Cc: stable@vger.kernel.org
Fixes: 901ddbb9ec ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/a95a40743bbd3f795b465f30922dde7f1ea9e0eb.1637004094.git.reinette.chatre@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:57 +08:00
Paolo Bonzini 984d2b5033 x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctl
commit ae095b16fc upstream.

For bare-metal SGX on real hardware, the hardware provides guarantees
about SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

Some userspace implementations of virtual SGX would rather avoid having
to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
virtual EPC.  For example, they could sandbox themselves after the guest
starts and forbid further calls to open(), in order to mitigate exploits
from untrusted guests.

Therefore, add an ioctl that does this with EREMOVE.  Userspace can
invoke the ioctl to bring its vEPC pages back to uninitialized state.
There is a possibility that some pages fail to be removed if they are
SECS pages, and the child and SECS pages could be in separate vEPC
regions.  Therefore, the ioctl returns the number of EREMOVE failures,
telling userspace to try the ioctl again after it's done with all
vEPC regions.  A more verbose description of the correct usage and
the possible error conditions is documented in sgx.rst.
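
The retry rule described above can be sketched with a toy model. Nothing
here touches the real device: struct vepc_region, model_remove() and
reset_all_regions() are hypothetical names, and model_remove() merely
stands in for invoking the ioctl on one region. The model assumes a SECS
page cannot be removed while any child page remains in any region, and
that the call reports how many pages it failed to remove.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one vEPC region: pages still present. */
struct vepc_region {
        int child_pages;        /* regular enclave pages */
        int secs_pages;         /* removable only once no child remains */
};

static int total_children(const struct vepc_region *r, size_t n)
{
        int sum = 0;

        for (size_t i = 0; i < n; i++)
                sum += r[i].child_pages;
        return sum;
}

/* Stand-in for the ioctl on region idx; returns the number of failures. */
static int model_remove(struct vepc_region *r, size_t n, size_t idx)
{
        r[idx].child_pages = 0;
        if (total_children(r, n) == 0)
                r[idx].secs_pages = 0;
        return r[idx].secs_pages;
}

/* Visit every region, then once more if any removal failed. */
static int reset_all_regions(struct vepc_region *r, size_t n)
{
        int failures = 0;

        for (int pass = 0; pass < 2; pass++) {
                failures = 0;
                for (size_t i = 0; i < n; i++)
                        failures += model_remove(r, n, i);
                if (failures == 0)
                        break;
        }
        return failures;
}

static void example(void)
{
        /* Region 0 holds the SECS, region 1 holds its child pages. */
        struct vepc_region r[2] = { { 0, 1 }, { 3, 0 } };

        assert(reset_all_regions(r, 2) == 0);
        assert(r[0].secs_pages == 0 && r[1].child_pages == 0);
}
```

In the example the first pass fails on the SECS page because its children
live in a region that has not been processed yet; the second pass succeeds.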

Intel-SIG: commit ae095b16fc x86/sgx/virt: implement
SGX_IOC_VEPC_REMOVE ioctl.
Backport for SGX virtualization support.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211021201155.1523989-3-pbonzini@redhat.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:57 +08:00
Paolo Bonzini d4b3139ba7 x86/sgx/virt: extract sgx_vepc_remove_page
commit fd5128e622 upstream.

For bare-metal SGX on real hardware, the hardware provides guarantees
about SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts; this is doable with heavy use of SCM_RIGHTS
file descriptor passing).

In order to implement this, we will need an ioctl that performs
EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
other possibilities, such as closing and reopening the device,
are racy.

Start the implementation by creating a separate function with just
the __eremove wrapper.

Intel-SIG: commit fd5128e622 x86/sgx/virt: extract
sgx_vepc_remove_page.
Backport for SGX virtualization support.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211021201155.1523989-2-pbonzini@redhat.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:57 +08:00
Liam Howlett 024277f378 x86/sgx: use vma_lookup() in sgx_encl_find()
commit 9ce2c3fc0b upstream.

Use vma_lookup() to find the VMA at a specific address.  As vma_lookup()
will return NULL if the address is not within any VMA, the start address
no longer needs to be validated.
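
Why the start address check becomes redundant can be shown with a user
space model over a sorted array of address ranges. This is a sketch, not
kernel code: struct range and both model_ functions are hypothetical, and
they only imitate the documented lookup semantics (find_vma() returns the
first VMA that ends above the address, which may begin after it;
vma_lookup() returns only a VMA that contains the address).

```c
#include <assert.h>
#include <stddef.h>

struct range {
        unsigned long start;    /* inclusive */
        unsigned long end;      /* exclusive */
};

/* Like find_vma(): first range ending above addr; may start after addr. */
static const struct range *model_find_vma(const struct range *v, size_t n,
                                          unsigned long addr)
{
        for (size_t i = 0; i < n; i++)
                if (addr < v[i].end)
                        return &v[i];
        return NULL;
}

/* Like vma_lookup(): only a range that actually contains addr. */
static const struct range *model_vma_lookup(const struct range *v, size_t n,
                                            unsigned long addr)
{
        const struct range *r = model_find_vma(v, n, addr);

        if (r && addr < r->start)
                return NULL;
        return r;
}

static void example(void)
{
        static const struct range v[] = {
                { 0x1000, 0x2000 },
                { 0x4000, 0x5000 },
        };

        /* 0x3000 falls in the gap: find_vma() still returns the next VMA. */
        assert(model_find_vma(v, 2, 0x3000) == &v[1]);
        assert(model_vma_lookup(v, 2, 0x3000) == NULL);
        assert(model_vma_lookup(v, 2, 0x4800) == &v[1]);
}
```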

Intel-SIG: commit 9ce2c3fc0b x86/sgx: use vma_lookup() in
sgx_encl_find().
Backport for SGX virtualization support.

Link: https://lkml.kernel.org/r/20210521174745.2219620-10-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:56 +08:00
Kai Huang 0aa1db9b22 x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed
commit 4692bc775d upstream.

xa_destroy() needs to be called to destroy a virtual EPC's page array
before calling kfree() to free the virtual EPC. Currently it is not
called so add the missing xa_destroy().

Intel-SIG: commit 4692bc775d x86/sgx: Add missing xa_destroy() when
virtual EPC is destroyed.
Backport for SGX virtualization support.

Fixes: 540745ddbc ("x86/sgx: Introduce virtual EPC for use by KVM guests")
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Yang Zhong <yang.zhong@intel.com>
Link: https://lkml.kernel.org/r/20210615101639.291929-1-kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:56 +08:00
Jarkko Sakkinen c85942cb1c x86/sgx: Do not update sgx_nr_free_pages in sgx_setup_epc_section()
commit ae40aaf6bd upstream.

The commit in Fixes: changed the SGX EPC page sanitization to end up in
sgx_free_epc_page() which puts clean and sanitized pages on the free
list.

This was done for the reason that it is best to keep the logic to assign
available-for-use EPC pages to the correct NUMA lists in a single
location.

sgx_nr_free_pages is also incremented by sgx_free_epc_page() but those
pages which are being added there per EPC section do not belong to the
free list yet because they haven't been sanitized yet - they land on the
dirty list first and the sanitization happens later when ksgxd starts
massaging them.

So remove that addition there and have sgx_free_epc_page() do that
solely.

 [ bp: Sanitize commit message too. ]

Intel-SIG: commit ae40aaf6bd x86/sgx: Do not update sgx_nr_free_pages
in sgx_setup_epc_section().
Backport for SGX virtualization support.

Fixes: 51ab30eb2a ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210408092924.7032-1-jarkko@kernel.org
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:56 +08:00
Sean Christopherson 1ca19405c2 KVM: x86: Add capability to grant VM access to privileged SGX attribute
commit fe7e948837 upstream.

Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace
to grant a VM access to a privileged attribute, with args[0] holding a
file handle to a valid SGX attribute file.

The SGX subsystem restricts access to a subset of enclave attributes to
provide additional security for an uncompromised kernel, e.g. to prevent
malware from using the PROVISIONKEY to ensure its nodes are running
inside a genuine SGX enclave and/or to obtain a stable fingerprint.

To prevent userspace from circumventing such restrictions by running an
enclave in a VM, KVM restricts guest access to privileged attributes by
default.

Intel-SIG: commit fe7e948837 KVM: x86: Add capability to grant VM
access to privileged SGX attribute.
Backport for SGX virtualization support

Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <0b099d65e933e068e3ea934b0523bab070cb8cea.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:55 +08:00
Sean Christopherson 2f673d4403 KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
commit 72add915fb upstream.

Enable SGX virtualization now that KVM has the VM-Exit handlers needed
to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU
model exposed to the guest.  Add a KVM module param, "sgx", to allow an
admin to disable SGX virtualization independent of the kernel.

When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX
LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on
the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an
SGX-enabled guest.  With the exception of the provision key, all SGX
attribute bits may be exposed to the guest.  Guest access to the
provision key, which is controlled via securityfs, will be added in a
future patch.

Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs.

Intel-SIG: commit 72add915fb KVM: VMX: Enable SGX virtualization for
SGX1, SGX2 and LC.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <a99e9c23310c79f2f4175c1af4c4cbcef913c3e5.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict.
  - When the changes were made at v5.13, we have call path as below:

    kvm_vcpu_ioctl_set_cpuid2()
    -> kvm_set_cpuid()
       -> kvm_update_cpuid_runtime()
       -> kvm_vcpu_after_set_cpuid()
             chunk 1)
          -> static_call(kvm_x86_vcpu_after_set_cpuid)()
          => vmx_vcpu_after_set_cpuid()
                chunk 2)

    Now we have call path as below at v5.4:

    kvm_vcpu_ioctl_set_cpuid2()
    -> kvm_x86_ops->cpuid_update()
    => vmx_cpuid_update()
          chunk 2)
    -> kvm_update_cpuid()
          chunk 1)

  1. Move chunk 1) in function kvm_vcpu_after_set_cpuid() to
     kvm_update_cpuid().
  2. Move chunk 2) in function vmx_vcpu_after_set_cpuid() to
     vmx_cpuid_update().

  - In function nested_vmx_exit_reflected(), invert the return value of
    nested_vmx_exit_handled_encls() as per the old logic.
  - The commit 5a085326d5 ("KVM: VMX: Rename ops.h to vmx_ops.h") has
    not been backported, so including old ops.h.
]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:55 +08:00
Sean Christopherson 4cbbea4d55 KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
commit b6f084ca55 upstream.

Add a VM-Exit handler to trap-and-execute EINIT when SGX LC is enabled
in the host.  When SGX LC is enabled, the host kernel may rewrite the
hardware values at will, e.g. to launch enclaves with different signers,
thus KVM needs to intercept EINIT to ensure it is executed with the
correct LE hash (even if the guest sees a hardwired hash).

Switching the LE hash MSRs on VM-Enter/VM-Exit is not a viable option as
writing the MSRs is prohibitively expensive, e.g. on SKL hardware each
WRMSR is ~400 cycles.  And because EINIT takes tens of thousands of
cycles to execute, the ~1500 cycle overhead to trap-and-execute EINIT is
unlikely to be noticed by the guest, let alone impact its overall SGX
performance.

Intel-SIG: commit b6f084ca55 KVM: VMX: Add ENCLS[EINIT] handler to
support SGX Launch Control (LC).
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <57c92fa4d2083eb3be9e6355e3882fc90cffea87.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:54 +08:00
Sean Christopherson 2a51bfc16b KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
commit 8f102445d4 upstream.

Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that
exist on CPUs that support SGX Launch Control (LC).  SGX LC modifies the
behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key
used to sign an enclave.  On CPUs without LC support, the LE hash is
hardwired into the CPU to an Intel controlled key (the Intel key is also
the reset value of the LE hash MSRs). Track the guest's desired hash so
that a future patch can stuff the hash into the hardware MSRs when
executing EINIT on behalf of the guest, when those MSRs are writable in
host.

Intel-SIG: commit 8f102445d4 KVM: VMX: Add emulation of SGX Launch
Control LE hash MSRs.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <c58ef601ddf88f3a113add837969533099b1364a.1618196135.git.kai.huang@intel.com>
[Add a comment regarding the MSRs being available until SGX is locked.
 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:54 +08:00
Sean Christopherson 439463947f KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
commit 70210c044b upstream.

Add an ECREATE handler that will be used to intercept ECREATE for the
purpose of enforcing an enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
to allow userspace to restrict SGX features via CPUID.  ECREATE will be
intercepted when any of the aforementioned masks diverges from hardware
in order to enforce the desired CPUID model, i.e. inject #GP if the
guest attempts to set a bit that hasn't been enumerated as allowed-1 in
CPUID.
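
The allowed-1 rule above boils down to a subset test on bit masks. The
following is a minimal sketch under stated assumptions: struct secs_masks
and both function names are hypothetical, the field widths are chosen only
for illustration, and the real handler performs more work than this
comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the guest sets any bit that is not enumerated as allowed-1. */
static bool has_disallowed_bits(uint64_t requested, uint64_t allowed)
{
        return (requested & ~allowed) != 0;
}

/* Hypothetical container for the three masks named above. */
struct secs_masks {
        uint64_t miscselect;
        uint64_t attributes;
        uint64_t xfrm;
};

/* Would ECREATE with these guest values have to be rejected with #GP? */
static bool ecreate_should_inject_gp(const struct secs_masks *guest,
                                     const struct secs_masks *allowed)
{
        return has_disallowed_bits(guest->miscselect, allowed->miscselect) ||
               has_disallowed_bits(guest->attributes, allowed->attributes) ||
               has_disallowed_bits(guest->xfrm, allowed->xfrm);
}

static void example(void)
{
        struct secs_masks allowed = { 0x0, 0x7, 0x3 };
        struct secs_masks ok = { 0x0, 0x5, 0x3 };
        struct secs_masks bad = { 0x0, 0x8, 0x3 };

        assert(!ecreate_should_inject_gp(&ok, &allowed));
        assert(ecreate_should_inject_gp(&bad, &allowed));
}
```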

Note, access to the PROVISIONKEY is not yet supported.

Intel-SIG: commit 70210c044b KVM: VMX: Add SGX ENCLS[ECREATE] handler
to enforce CPUID restrictions.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <c3a97684f1b71b4f4626a1fc3879472a95651725.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:54 +08:00
Sean Christopherson 679d595826 KVM: VMX: Frame in ENCLS handler for SGX virtualization
commit 9798adbc04 upstream.

Introduce sgx.c and sgx.h, along with the framework for handling ENCLS
VM-Exits.  Add a bool, enable_sgx, that will eventually be wired up to a
module param to control whether or not SGX virtualization is enabled at
runtime.

Intel-SIG: commit 9798adbc04 KVM: VMX: Frame in ENCLS handler for SGX
virtualization.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <1c782269608b2f5e1034be450f375a8432fb705d.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:53 +08:00
Sean Christopherson 412af65f9c KVM: VMX: Add basic handling of VM-Exit from SGX enclave
commit 3c0c2ad1ae upstream.

Add support for handling VM-Exits that originate from a guest SGX
enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
wherein the CPU and memory state is protected by hardware to make the
state inaccessible to code running outside of the enclave.  When exiting
an enclave due to an asynchronous event (from the perspective of the
enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
is automatically saved and scrubbed (the CPU loads synthetic state), and
then reloaded when re-entering the enclave.  E.g. after an instruction
based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP
of the enclave instruction that triggered VM-Exit, but will instead point
to a RIP in the enclave's untrusted runtime (the guest userspace code
that coordinates entry/exit to/from the enclave).

To help a VMM recognize and handle exits from enclaves, SGX adds bits to
existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
new architectural bits, and add a boolean to struct vcpu_vmx to cache
VMX_EXIT_REASON_FROM_ENCLAVE.  Clear the bit in exit_reason so that
checks against exit_reason do not need to account for SGX, e.g.
"if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.

KVM is largely a passive observer of the new bits, e.g. KVM needs to
account for the bits when propagating information to a nested VMM, but
otherwise doesn't need to act differently for the majority of VM-Exits
from enclaves.
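
The cache-and-clear step described above can be sketched as follows. This
is an illustration only: struct exit_state and cache_and_clear() are
hypothetical names, and the sketch assumes the "from enclave" flag is bit
27 of the exit reason and that EXIT_REASON_EXCEPTION_NMI is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_EXCEPTION_NMI       0u
#define EXIT_REASON_FROM_ENCLAVE        (UINT32_C(1) << 27)     /* assumed */

/* Hypothetical stand-in for the fields kept per vCPU. */
struct exit_state {
        uint32_t exit_reason;           /* flag already cleared */
        bool exit_reason_from_enclave;  /* cached copy of the flag */
};

/* Cache the enclave flag, then clear it from the stored exit reason. */
static struct exit_state cache_and_clear(uint32_t raw_exit_reason)
{
        struct exit_state s;

        s.exit_reason_from_enclave =
                (raw_exit_reason & EXIT_REASON_FROM_ENCLAVE) != 0;
        s.exit_reason = raw_exit_reason & ~EXIT_REASON_FROM_ENCLAVE;
        return s;
}

static void example(void)
{
        struct exit_state s =
                cache_and_clear(EXIT_REASON_EXCEPTION_NMI |
                                EXIT_REASON_FROM_ENCLAVE);

        /* Existing equality checks keep working on the cleared value. */
        assert(s.exit_reason == EXIT_REASON_EXCEPTION_NMI);
        assert(s.exit_reason_from_enclave);
}
```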

The one scenario that is directly impacted is emulation, which is for
all intents and purposes impossible[1] since KVM does not have access to
the RIP or instruction stream that triggered the VM-Exit.  The inability
to emulate is a non-issue for KVM, as most instructions that might
trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
check).  For the few instructions that conditionally #UD, KVM either never
sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
if the feature is not exposed to the guest in order to inject a #UD,
e.g. RDRAND_EXITING.

But, because it is still possible for a guest to trigger emulation,
e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
from an enclave.  This is architecturally accurate for instruction
VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
to killing the VM.  In practice, only broken or particularly stupid
guests should ever encounter this behavior.

Add a WARN in skip_emulated_instruction to detect any attempt to
modify the guest's RIP during an SGX enclave VM-Exit as all such flows
should either be unreachable or must handle exits from enclaves before
getting to skip_emulated_instruction.

[1] Impossible for all practical purposes.  Not truly impossible
    since KVM could implement some form of para-virtualization scheme.

[2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
    CPL3, so we also don't need to worry about that interaction.

Intel-SIG: commit 3c0c2ad1ae KVM: VMX: Add basic handling of VM-Exit
from SGX enclave
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:53 +08:00
Sean Christopherson b59f25207d KVM: x86: Add reverse-CPUID lookup support for scattered SGX features
commit 01de8682b3 upstream.

Define a new KVM-only feature word for advertising and querying SGX
sub-features in CPUID.0x12.0x0.EAX.  Because SGX1 and SGX2 are scattered
in the kernel's feature word, they need to be translated so that the
bit numbers match those of hardware.

Intel-SIG: commit 01de8682b3 KVM: x86: Add reverse-CPUID lookup
support for scattered SGX features.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <e797c533f4c71ae89265bbb15a02aef86b67cbec.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:53 +08:00
Sean Christopherson b90f0185c7 KVM: x86: Define new #PF SGX error code bit
commit 00e7646c35 upstream.

Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
as opposed to the traditional IA32/EPT page tables, set an SGX bit in
the error code to indicate that the #PF was induced by SGX.  KVM will
need to emulate this behavior as part of its trap-and-execute scheme for
virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if
EINIT faults in the host, and to support live migration.

Intel-SIG: commit 00e7646c35 KVM: x86: Define new #PF SGX error code
bit.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <e170c5175cb9f35f53218a7512c9e3db972b97a2.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:52 +08:00
Sean Christopherson 81b1a77341 KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
commit 54f958cdaa upstream.

Export the gva_to_gpa() helpers for use by SGX virtualization when
executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest.
To execute ECREATE and EINIT, KVM must obtain the GPA of the target
Secure Enclave Control Structure (SECS) in order to get its
corresponding HVA.

Because the SECS must reside in the Enclave Page Cache (EPC), copying
the SECS's data to a host-controlled buffer via existing exported
helpers is not a viable option as the EPC is not readable or writable
by the kernel.

SGX virtualization will also use gva_to_gpa() to obtain HVAs for
non-EPC pages in order to pass user pointers directly to ECREATE and
EINIT, which avoids having to copy pages worth of data into the kernel.

Intel-SIG: commit 54f958cdaa KVM: x86: Export
kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX).
Backport for SGX virtualization support

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <02f37708321bcdfaa2f9d41c8478affa6e84b04d.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:52 +08:00
Sean Christopherson b5a31aa4d3 x86/sgx: Move provisioning device creation out of SGX driver
commit b3754e5d3d upstream.

And extract sgx_set_attribute() out of sgx_ioc_enclave_provision() and
export it as symbol for KVM to use.

The provisioning key is sensitive. The SGX driver only allows creating
an enclave which can access the provisioning key when the enclave
creator has permission to open /dev/sgx_provision. It should apply to
a VM as well, as the provisioning key is platform-specific, thus an
unrestricted VM can also potentially compromise the provisioning key.

Move the provisioning device creation out of sgx_drv_init() to
sgx_init() as a preparation for adding SGX virtualization support,
so that even if the SGX driver is not enabled due to flexible launch
control not being available, SGX virtualization can still be enabled,
and use it to restrict a VM's capability of being able to access the
provisioning key.

 [ bp: Massage commit message. ]

Intel-SIG: commit b3754e5d3d x86/sgx: Move provisioning device
creation out of SGX driver.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/0f4d044d621561f26d5f4ef73e8dc6cd18cc7e79.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:52 +08:00
Sean Christopherson 5bffd94a65 x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
commit d155030b1e upstream.

The host kernel must intercept ECREATE to impose policies on guests, and
intercept EINIT to be able to write guest's virtual SGX_LEPUBKEYHASH MSR
values to hardware before running guest's EINIT so it can run correctly
according to hardware behavior.

Provide wrappers around __ecreate() and __einit() to hide the ugliness
of overloading the ENCLS return value to encode multiple error formats
in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
of SGX virtualization, and reflect ENCLS execution result to guest by
setting up guest's GPRs, or on an exception, injecting the correct fault
based on return value of __ecreate() and __einit().

Use host userspace addresses (provided by KVM based on guest physical
address of ENCLS parameters) to execute ENCLS/EINIT when possible.
Accesses to both EPC and memory originating from ENCLS are subject to
segmentation and paging mechanisms.  It's also possible to generate
kernel mappings for ENCLS parameters by resolving PFN but using
__uaccess_xx() is simpler.

 [ bp: Return early if the __user memory accesses fail, use
   cpu_feature_enabled(). ]

Intel-SIG: commit d155030b1e x86/sgx: Add helpers to expose ECREATE and
EINIT to KVM.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20e09daf559aa5e9e680a0b4b5fba940f1bad86e.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:51 +08:00
Kai Huang 01f2dfaa47 x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
commit 73916b6a0c upstream.

Add a helper to update SGX_LEPUBKEYHASHn MSRs.  SGX virtualization also
needs to update those MSRs based on guest's "virtual" SGX_LEPUBKEYHASHn
before EINIT from guest.

Intel-SIG: commit 73916b6a0c x86/sgx: Add helper to update
SGX_LEPUBKEYHASHn MSRs.
Backport for SGX virtualization support.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/dfb7cd39d4dd62ea27703b64afdd8bccb579f623.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:51 +08:00
Sean Christopherson e456ee71e7 x86/sgx: Add encls_faulted() helper
commit a67136b458 upstream.

Add a helper to extract the fault indicator from an encoded ENCLS return
value.  SGX virtualization will also need to detect ENCLS faults.
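
A minimal sketch of such a helper is shown below. It rests on assumptions
that should be checked against the tree: the flag value 0x40000000 and the
model_ names are used here for illustration, with the fault flag stored in
a high bit of the return value and the trap number in the remaining bits.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ENCLS_FAULT_FLAG  0x40000000      /* assumed flag value */

/* Did the ENCLS leaf fault, as opposed to returning an SGX error code? */
static bool model_encls_faulted(int ret)
{
        return (ret & MODEL_ENCLS_FAULT_FLAG) != 0;
}

/* Extract the trap number from a return value that encodes a fault. */
static int model_encls_trapnr(int ret)
{
        return ret & ~MODEL_ENCLS_FAULT_FLAG;
}

static void example(void)
{
        int faulted = MODEL_ENCLS_FAULT_FLAG | 14;      /* 14: page fault */

        assert(model_encls_faulted(faulted));
        assert(model_encls_trapnr(faulted) == 14);
        assert(!model_encls_faulted(0));        /* success */
        assert(!model_encls_faulted(1));        /* plain SGX error code */
}
```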

Intel-SIG: commit a67136b458 x86/sgx: Add encls_faulted() helper.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/c1f955898110de2f669da536fc6cf62e003dff88.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:50 +08:00
Sean Christopherson 3d7c41aabd x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
commit 32ddda8e44 upstream.

Define the ENCLS leafs that are available with SGX2, also referred to as
Enclave Dynamic Memory Management (EDMM).  The leafs will be used by KVM
to conditionally expose SGX2 capabilities to guests.
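
For orientation, the three SGX2 leafs could be declared as below. The
leaf numbers are the EAX values documented in the Intel SDM for
ENCLS[EAUG], ENCLS[EMODPR] and ENCLS[EMODT]; the enum name is
hypothetical and the layout is a sketch, not the exact upstream
definition.

```c
/* SGX2 (EDMM) ENCLS leaf numbers, passed in EAX. Enum name is illustrative. */
enum sgx2_encls_leaf {
    EAUG   = 0x0D, /* add a page to an already initialized enclave */
    EMODPR = 0x0E, /* restrict the permissions of an EPC page */
    EMODT  = 0x0F, /* change the type of an EPC page */
};
```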

Intel-SIG: commit 32ddda8e44 x86/sgx: Add SGX2 ENCLS leaf definitions
(EAUG, EMODPR and EMODT).
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/5f0970c251ebcc6d5add132f0d750cc753b7060f.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:50 +08:00
Sean Christopherson 85e9629da7 x86/sgx: Move ENCLS leaf definitions to sgx.h
commit 9c55c78a73 upstream.

Move the ENCLS leaf definitions to sgx.h so that they can be used by
KVM.

Intel-SIG: commit 9c55c78a73 x86/sgx: Move ENCLS leaf definitions to
sgx.h.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/2e6cd7c5c1ced620cfcd292c3c6c382827fde6b2.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:50 +08:00
Kai Huang 0480ed73c0 x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
commit faa7d3e6f3 upstream.

Modify sgx_init() to always try to initialize the virtual EPC driver,
even if the SGX driver is disabled.  The SGX driver might be disabled
if SGX Launch Control is in locked mode, or not supported in the
hardware at all.  This allows (non-Linux) guests that support non-LC
configurations to use SGX.
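
The resulting control flow can be sketched as follows, assuming both
initializers return 0 on success. sgx_subsystem_init() and its
parameters are hypothetical; the point is only that initialization
fails when both the driver and the virtual EPC driver fail.

```c
/*
 * drv_ret:  result of initializing the SGX driver (0 = success)
 * vepc_ret: result of initializing the virtual EPC driver (0 = success)
 *
 * Returns 0 if at least one of the two came up, -1 otherwise.
 */
static int sgx_subsystem_init(int drv_ret, int vepc_ret)
{
    /* The virtual EPC driver is tried regardless of drv_ret. */
    if (drv_ret != 0 && vepc_ret != 0)
        return -1;

    return 0;
}
```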

 [ bp: De-silli-fy the test. ]

Intel-SIG: commit faa7d3e6f3 x86/sgx: Initialize virtual EPC driver
even when SGX driver is disabled.
Backport for SGX virtualization support.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/d35d17a02bbf8feef83a536cec8b43746d4ea557.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:49 +08:00
Sean Christopherson 6b21ebc4dd x86/cpu/intel: Allow SGX virtualization without Launch Control support
commit 332bfc7bec upstream.

The kernel will currently disable all SGX support if the hardware does
not support launch control.  Make it more permissive to allow SGX
virtualization on systems without Launch Control support.  This will
allow KVM to expose SGX to guests that have less-strict requirements on
the availability of flexible launch control.

Improve error message to distinguish between three cases.  There are two
cases where SGX support is completely disabled:
1) SGX has been disabled completely by the BIOS
2) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is unavailable (because of
   Kconfig).
One where it is partially available:
3) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is supported.
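
The three cases can be summarized as a small decision function. This is
a sketch with hypothetical names and inputs; the fourth outcome, where
Launch Control is usable and bare-metal support stays enabled, is added
only to make the function total.

```c
#include <stdbool.h>

enum sgx_support {
    SGX_DISABLED_BY_BIOS,   /* case 1: nothing available */
    SGX_DISABLED_LC_LOCKED, /* case 2: LC locked, no virtualization Kconfig */
    SGX_VIRT_ONLY,          /* case 3: LC locked, virtualization supported */
    SGX_BARE_METAL,         /* LC usable: bare-metal support stays enabled */
};

static enum sgx_support classify_sgx(bool bios_enabled, bool lc_usable,
                                     bool virt_kconfig)
{
    if (!bios_enabled)
        return SGX_DISABLED_BY_BIOS;
    if (lc_usable)
        return SGX_BARE_METAL;

    return virt_kconfig ? SGX_VIRT_ONLY : SGX_DISABLED_LC_LOCKED;
}
```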

Intel-SIG: commit 332bfc7bec x86/cpu/intel: Allow SGX virtualization
without Launch Control support.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/b3329777076509b3b601550da288c8f3c406a865.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:49 +08:00
Sean Christopherson 94045e80c7 x86/sgx: Introduce virtual EPC for use by KVM guests
commit 540745ddbc upstream.

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
Enclave Page Cache (EPC) without an associated enclave. The intended
and only known use case for raw EPC allocation is to expose EPC to a
KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
Kconfig.

The SGX driver uses the misc device /dev/sgx_enclave to support
userspace in creating an enclave. Each file descriptor returned from
opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
KVM doesn't control how the guest uses the EPC, therefore EPC allocated
to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
is not suitable for allocating EPC for a KVM guest.

Having separate device nodes for the SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and KVM
SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with a particular
size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
intended size to get an address range of virtual EPC. Then it may use
the address range to create one KVM memory slot as virtual EPC for
a guest.
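
From the hypervisor side, the open-then-mmap sequence might look like
the sketch below. map_vepc() is a hypothetical helper, and the
protection and mapping flags are assumptions rather than something this
commit message specifies. Registering the range as a KVM memory slot is
left out.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open the device node at 'path' (normally /dev/sgx_vepc) and map 'size'
 * bytes of virtual EPC. On success, returns the mapping and stores the
 * file descriptor in *fd_out. On failure, returns MAP_FAILED and leaves
 * *fd_out untouched.
 */
static void *map_vepc(const char *path, size_t size, int *fd_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    /* Assumed flags: a shared, read-write mapping of the whole range. */
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }

    *fd_out = fd;
    return addr;
}
```

The returned address range is what the hypervisor would then hand to KVM
as the backing for the guest's EPC memory slot.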

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming an EPC page used by an enclave requires a special reclaim
mechanism separate from normal page reclaim, and that mechanism is not
supported for virtual EPC pages. Due to the complications of handling
reclaim conflicts between guest and host, reclaiming virtual EPC pages
is significantly more complex than basic support for SGX virtualization.

 [ bp:
   - Massage commit message and comments
   - use cpu_feature_enabled()
   - vertically align struct members init
   - massage Virtual EPC clarification text
   - move Kconfig prompt to Virtualization ]

Intel-SIG: commit 540745ddbc x86/sgx: Introduce virtual EPC for use by
KVM guests.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:49 +08:00
Sean Christopherson 11353b1c93 x86/sgx: Add SGX_CHILD_PRESENT hardware error code
commit 231d3dbdda upstream.

SGX driver can accurately track how enclave pages are used.  This
enables SECS to be specifically targeted and EREMOVE'd only after all
child pages have been EREMOVE'd.  This ensures that SGX driver will
never encounter SGX_CHILD_PRESENT in normal operation.

Virtual EPC is different.  The host does not track how EPC pages are
used by the guest, so it cannot guarantee EREMOVE success.  It might,
for instance, encounter a SECS with a non-zero child count.

Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
the SGX virtualization driver to handle recoverable EREMOVE errors when
sanitizing EPC pages after they are freed.
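
One possible way to treat SGX_CHILD_PRESENT as recoverable is a two-pass
cleanup: remove everything that can be removed, then retry the pages
that failed because children were still present. The simulation below
is a sketch under that assumption. struct fake_page, fake_eremove() and
sanitize_pages() are hypothetical, and the numeric value of
SGX_CHILD_PRESENT is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SGX_CHILD_PRESENT 13 /* assumed value of the hardware error code */

struct fake_page {
    bool in_use;
    int secs; /* index of the owning SECS page, or -1 if this is a SECS */
};

/* Simulated EREMOVE: a SECS cannot be removed while a child is in use. */
static int fake_eremove(struct fake_page *pages, size_t n, size_t idx)
{
    if (pages[idx].secs < 0) {
        for (size_t i = 0; i < n; i++)
            if (pages[i].in_use && pages[i].secs == (int)idx)
                return SGX_CHILD_PRESENT;
    }
    pages[idx].in_use = false;
    return 0;
}

/* Returns the number of pages that could not be removed after two passes. */
static size_t sanitize_pages(struct fake_page *pages, size_t n)
{
    size_t left = 0;

    /* Pass 1: remove what we can; SECS pages with children fail here. */
    for (size_t i = 0; i < n; i++)
        if (pages[i].in_use)
            fake_eremove(pages, n, i);

    /* Pass 2: retry the leftovers now that their children are gone. */
    for (size_t i = 0; i < n; i++)
        if (pages[i].in_use && fake_eremove(pages, n, i) != 0)
            left++;

    return left;
}
```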

Intel-SIG: commit 231d3dbdda x86/sgx: Add SGX_CHILD_PRESENT hardware
error code.
Backport for SGX virtualization support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/050b198e882afde7e6eba8e6a0d4da39161dbb5a.1616136308.git.kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:48 +08:00
Kai Huang f6164a994e x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
commit b0c7459be0 upstream.

EREMOVE takes a page and removes any association between that page and
an enclave. It must be run on a page before it can be added into another
enclave. Currently, EREMOVE is run as part of pages being freed into the
SGX page allocator. It is not expected to fail, as it would indicate a
use-after-free of EPC pages. Rather than add the page back to the pool
of available EPC pages, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that
SGX virtualization use of EREMOVE might fail. Specifically, it is
legitimate for EREMOVE to return SGX_CHILD_PRESENT for EPC assigned to
a KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and
to let the SGX virtualization code use the allocator directly, break
out the EREMOVE call from the SGX page allocator. Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that
it is used to free an EPC page assigned to a host enclave. Replace
sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
there's no functional change.
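
The split can be pictured with the following userspace sketch. struct
fake_pool and both functions are hypothetical, and the EREMOVE result is
passed in as a parameter rather than produced by hardware. The
allocator-level free no longer runs EREMOVE, while the enclave-level
wrapper does and leaks the page on failure.

```c
/* Hypothetical bookkeeping for the sketch. */
struct fake_pool {
    int free_pages;   /* pages returned to the allocator */
    int leaked_pages; /* pages intentionally not returned */
};

/* Allocator-level free: no EREMOVE, just return the page to the pool. */
static void free_epc_page(struct fake_pool *pool)
{
    pool->free_pages++;
}

/*
 * Host-enclave free: EREMOVE first (eremove_ret is its simulated result),
 * and leak the page instead of freeing it if EREMOVE failed.
 */
static void encl_free_epc_page(struct fake_pool *pool, int eremove_ret)
{
    if (eremove_ret != 0) {
        pool->leaked_pages++;
        return;
    }

    free_epc_page(pool);
}
```

Virtualization code can then call the allocator-level free directly and
apply its own, more permissive EREMOVE handling beforehand.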

At the same time, improve the error message when EREMOVE fails, and
add documentation that explains to the user what the failure means and
what to do when it happens.

 [ bp: Massage commit message, fix typos and sanitize text, simplify. ]

Intel-SIG: commit b0c7459be0 x86/sgx: Wipe out EREMOVE from
sgx_free_epc_page().
Backport for SGX virtualization support.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
[ Zhiquan Li: amend commit log and resolve the conflict. ]
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
2024-06-11 21:23:48 +08:00