2020-11-12 15:01:16 -07:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _X86_SGX_H
#define _X86_SGX_H

#include <linux/bitops.h>
#include <linux/err.h>
#include <linux/io.h>
#include <linux/rwsem.h>
#include <linux/types.h>
#include <asm/asm.h>
2021-03-19 00:23:03 -07:00
#include <asm/sgx.h>
2020-11-12 15:01:16 -07:00
#undef pr_fmt
#define pr_fmt(fmt) "sgx: " fmt
x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

EREMOVE takes a page and removes any association between that page and
an enclave. It must be run on a page before it can be added into another
enclave. Currently, EREMOVE is run as part of pages being freed into the
SGX page allocator. It is not expected to fail, as it would indicate a
use-after-free of EPC pages. Rather than add the page back to the pool
of available EPC pages, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that
SGX virtualization use of EREMOVE might fail. Specifically, it is
legitimate for EREMOVE to return SGX_CHILD_PRESENT for EPC assigned to
a KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and
to let the SGX virtualization code use the allocator directly, break
out the EREMOVE call from the SGX page allocator. Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that
it is used to free an EPC page assigned to a host enclave. Replace
sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
there's no functional change.

At the same time, improve the error message when EREMOVE fails, and
add documentation to explain to the user what that failure means and
what to do when this bug happens.

[ bp: Massage commit message, fix typos and sanitize text, simplify. ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
2021-03-25 02:30:57 -07:00
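The split described in the commit message above can be pictured with a small userspace sketch. It is not the kernel implementation: `struct epc_page`, `mock_eremove()`, `allocator_free()` and `encl_free_epc_page()` are hypothetical stand-ins, and the EREMOVE result is simply stored in the mock page. The point is only that the enclave-side wrapper runs EREMOVE first and leaks the page on failure, while the plain allocator free never issues EREMOVE itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mock of an EPC page; not the kernel's struct sgx_epc_page. */
struct epc_page {
	int eremove_result;	/* value the mocked EREMOVE will return */
	bool on_free_list;	/* set once the page is back in the pool */
};

/* Stand-in for the EREMOVE instruction: 0 means success. */
static int mock_eremove(struct epc_page *page)
{
	return page->eremove_result;
}

/* Stand-in for the allocator-level free: no EREMOVE is issued here. */
static void allocator_free(struct epc_page *page)
{
	page->on_free_list = true;
}

/*
 * Stand-in for the enclave-side free: run EREMOVE first and leak the
 * page (never return it to the pool) if EREMOVE fails.
 * Returns true if the page went back to the pool.
 */
static bool encl_free_epc_page(struct epc_page *page)
{
	int ret;

	assert(page != NULL);

	ret = mock_eremove(page);
	if (ret) {
		fprintf(stderr,
			"EREMOVE returned %d (0x%x) and an EPC page was leaked.\n",
			ret, ret);
		return false;
	}

	allocator_free(page);
	return true;
}
```

A caller that manages pages on behalf of a guest could, under the same assumptions, call `allocator_free()` directly after its own, more permissive EREMOVE handling.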
#define EREMOVE_ERROR_MESSAGE \
	"EREMOVE returned %d (0x%x) and an EPC page was leaked. SGX may become unusable. " \
2023-03-14 16:06:44 -07:00
	"Refer to Documentation/arch/x86/sgx.rst for more information."
2020-11-12 15:01:16 -07:00
#define SGX_MAX_EPC_SECTIONS 8
2020-11-12 15:01:24 -07:00
#define SGX_EEXTEND_BLOCK_SIZE 256
x86/sgx: Add a page reclaimer

Just like normal RAM, there is a limited amount of enclave memory available
and overcommitting it is a very valuable tool to reduce resource use.
Introduce a simple reclaim mechanism for enclave pages.

In contrast to normal page reclaim, the kernel cannot directly access
enclave memory. To get around this, the SGX architecture provides a set of
functions to help. Among other things, these functions copy enclave memory
to and from normal memory, encrypting it and protecting its integrity in
the process.

Implement a page reclaimer by using these functions. Pick victim pages in
LRU fashion from all the enclaves running in the system. A new kernel
thread (ksgxswapd) reclaims pages in the background based on watermarks,
similar to normal kswapd.

All enclave pages can be reclaimed, architecturally. But, there are some
limits to this, such as the special SECS metadata page which must be
reclaimed last. The page version array (used to mitigate replaying old
reclaimed pages) is also architecturally reclaimable, but reclaiming it is
not yet implemented. The end result is that the vast majority of enclave
pages are currently reclaimable.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
2020-11-12 15:01:32 -07:00
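The watermark behaviour mentioned in the commit message can be illustrated with a short userspace simulation. This is a sketch under assumptions, not the ksgxswapd code: the three constants reuse the values of `SGX_NR_TO_SCAN`, `SGX_NR_LOW_PAGES` and `SGX_NR_HIGH_PAGES` from this header, but the roles given to them in the comments (batch size, wake-up threshold, stop threshold) and the helper names `should_reclaim()` and `reclaim_until_high()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TO_SCAN	16	/* assumed: pages handled per reclaim pass */
#define NR_LOW_PAGES	32	/* assumed: start reclaiming below this */
#define NR_HIGH_PAGES	64	/* assumed: stop reclaiming at this level */

/* Would a background reclaimer be woken at this free-page count? */
static bool should_reclaim(unsigned long free_pages)
{
	return free_pages < NR_LOW_PAGES;
}

/*
 * Simulate background reclaim: move pages from "reclaimable" to "free"
 * in batches until the high watermark is reached or nothing is left.
 * Returns the number of passes that were needed.
 */
static unsigned int reclaim_until_high(unsigned long *free_pages,
				       unsigned long reclaimable)
{
	unsigned int passes = 0;

	assert(free_pages != NULL);

	while (*free_pages < NR_HIGH_PAGES && reclaimable > 0) {
		unsigned long batch = reclaimable < NR_TO_SCAN ?
				      reclaimable : NR_TO_SCAN;

		*free_pages += batch;
		reclaimable -= batch;
		passes++;
	}

	return passes;
}
```

Using two thresholds rather than one gives the reclaimer hysteresis: once woken, it frees well past the wake-up level, so it does not have to run again after every single allocation.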
#define SGX_NR_TO_SCAN 16
#define SGX_NR_LOW_PAGES 32
#define SGX_NR_HIGH_PAGES 64

/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED BIT(0)
2020-11-12 15:01:16 -07:00
2021-10-26 15:00:44 -07:00
/* Pages on free list */
#define SGX_EPC_PAGE_IS_FREE BIT(1)
2020-11-12 15:01:16 -07:00
struct sgx_epc_page {
	unsigned int section;
2021-10-26 15:00:46 -07:00
	u16 flags;
	u16 poison;
2020-11-12 15:01:32 -07:00
	struct sgx_encl_page *owner;
2020-11-12 15:01:16 -07:00
	struct list_head list;
};
2021-03-17 16:53:31 -07:00
/*
 * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
 * the free page list local to the node is stored here.
 */
struct sgx_numa_node {
	struct list_head free_page_list;
2021-10-26 15:00:46 -07:00
	struct list_head sgx_poison_page_list;
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node

== Problem ==

The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems. It can be as small as dozens of MBs
and as large as many GBs on servers. Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.

== Solution ==

Introduce a new sysfs file:

	/sys/devices/system/node/nodeX/x86/sgx_total_bytes

to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.

'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system. They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available. 'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.

== Implementation Details ==

Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node.

== ABI Design Discussion ==

As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node. Essentially, a
single value would rule out NUMA optimizations for enclaves.

Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
sgx-specific files to be placed in the new directory. Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM but
still need for SGX:

	MemTotal:  xxxx kB // sgx_total_bytes (implemented here)
	MemFree:   yyyy kB // sgx_free_bytes
	SwapTotal: zzzz kB // sgx_swapped_bytes

So, at *least* three. I think we will eventually end up needing
something more along the lines of a dozen. A new directory (as
opposed to the nodeX/ "root" directory) avoids cluttering the
root with several "sgx_*" files.

Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific. It is very unlikely that any other architecture
(or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/"
as opposed to "x86/" was also considered. But, there is a real chance
this can get used for other arch-specific purposes.

[ dhansen: rewrite changelog ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
2021-11-16 09:21:16 -07:00
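A userspace consumer of the sysfs file named in the commit message could look roughly like the sketch below. The sysfs path is taken from the commit message; the helper names `read_u64_file()` and `sgx_node_total_bytes()` are hypothetical, and the sketch assumes the file holds a single unsigned decimal number of bytes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success and -1 if the file is missing or malformed.
 */
static int read_u64_file(const char *path, uint64_t *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%" SCNu64, out) == 1;
	fclose(f);

	return ok ? 0 : -1;
}

/* Query the SGX memory size of one NUMA node (hypothetical helper). */
static int sgx_node_total_bytes(int node, uint64_t *out)
{
	char path[128];

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/x86/sgx_total_bytes", node);

	return read_u64_file(path, out);
}
```

A selftest could call `sgx_node_total_bytes(0, &bytes)` and size its overcommitted enclave relative to the result, falling back to another method when the call returns -1 (for example on a kernel without this attribute).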
	unsigned long size;
2021-03-17 16:53:31 -07:00
	spinlock_t lock;
};
2020-11-12 15:01:16 -07:00
/*
 * The firmware can define multiple chunks of EPC to the different areas of the
 * physical memory, e.g. for memory areas of each node. This structure is
 * used to store EPC pages for one EPC section and virtual memory area where
 * the pages have been mapped.
 */
struct sgx_epc_section {
	unsigned long phys_addr;
	void *virt_addr;
	struct sgx_epc_page *pages;
2021-03-17 16:53:31 -07:00
	struct sgx_numa_node *node;
2020-11-12 15:01:16 -07:00
};

extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];

static inline unsigned long sgx_get_epc_phys_addr(struct sgx_epc_page *page)
{
	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
	unsigned long index;

	index = ((unsigned long)page - (unsigned long)section->pages) / sizeof(*page);

	return section->phys_addr + index * PAGE_SIZE;
}

static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
{
	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
	unsigned long index;

	index = ((unsigned long)page - (unsigned long)section->pages) / sizeof(*page);

	return section->virt_addr + index * PAGE_SIZE;
}
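The index arithmetic shared by the two helpers above can be tried outside the kernel with a small sketch. The `demo_*` types, the function name and the 4096-byte page size are assumptions made for the example; they mirror the shape of the helpers but are not the kernel definitions. Subtracting two pointers into the same array yields the same element index that the helpers compute by dividing the byte distance by `sizeof(*page)`.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size for the demo */

/* Hypothetical, trimmed-down descriptor for one EPC page. */
struct demo_epc_page {
	unsigned int section;
};

/* Hypothetical, trimmed-down section: base address plus descriptors. */
struct demo_epc_section {
	unsigned long phys_addr;
	struct demo_epc_page *pages;
};

/*
 * Map a page descriptor to its physical address: the descriptor's
 * position in the section's array is the page's index in the section.
 */
static unsigned long demo_phys_addr(const struct demo_epc_section *section,
				    const struct demo_epc_page *page)
{
	unsigned long index;

	assert(section != NULL && page != NULL);

	index = (unsigned long)(page - section->pages);

	return section->phys_addr + index * DEMO_PAGE_SIZE;
}
```

With a section based at `0x80000000`, the descriptor at array position 3 maps to `0x80000000 + 3 * 4096`. Storing only the section number in each descriptor keeps the per-page metadata small, since the address is always recoverable from the descriptor's position.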
2020-11-12 15:01:20 -07:00
struct sgx_epc_page *__sgx_alloc_epc_page(void);
void sgx_free_epc_page(struct sgx_epc_page *page);
2022-05-10 11:08:56 -07:00
void sgx_reclaim_direct(void);
2020-11-12 15:01:32 -07:00
void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
2022-05-10 11:08:45 -07:00
void sgx_ipi_cb(void *info);
x86/sgx: Introduce virtual EPC for use by KVM guests

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
Enclave Page Cache (EPC) without an associated enclave. The intended
and only known use case for raw EPC allocation is to expose EPC to a
KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
Kconfig.

The SGX driver uses the misc device /dev/sgx_enclave to support
userspace in creating an enclave. Each file descriptor returned from
opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
KVM doesn't control how the guest uses the EPC, therefore EPC allocated
to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
is not suitable for allocating EPC for a KVM guest.

Having separate device nodes for the SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and KVM
SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with a particular
size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
intended size to get an address range of virtual EPC. Then it may use
the address range to create one KVM memory slot as virtual EPC for
a guest.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming an EPC page used by an enclave requires a special reclaim
mechanism separate from normal page reclaim, and that mechanism is not
supported for virtual EPC pages. Due to the complications of handling
reclaim conflicts between guest and host, reclaiming virtual EPC pages
is significantly more complex than basic support for SGX virtualization.

[ bp:
  - Massage commit message and comments
  - use cpu_feature_enabled()
  - vertically align struct members init
  - massage Virtual EPC clarification text
  - move Kconfig prompt to Virtualization ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
2021-03-19 00:22:21 -07:00
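The open-then-mmap() flow that the commit message describes for a hypervisor might be sketched as follows. Only the device path and the use of mmap() with the intended size come from the commit message; the helper name `map_vepc()`, the choice of `O_RDWR`, and the `PROT_READ | PROT_WRITE` with `MAP_SHARED` flags are assumptions for illustration and should be checked against the SGX documentation before use.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open a virtual EPC device node and map "size" bytes of it.
 * On success, returns the mapping and stores the open descriptor in
 * *fd_out so the caller can keep it for the lifetime of the mapping.
 * On failure, returns NULL and leaves *fd_out untouched.
 */
static void *map_vepc(const char *dev_path, size_t size, int *fd_out)
{
	void *addr;
	int fd;

	assert(dev_path != NULL && fd_out != NULL);

	fd = open(dev_path, O_RDWR);
	if (fd < 0)
		return NULL;

	/* Assumed flags: a shared, read-write mapping of the device. */
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		close(fd);
		return NULL;
	}

	*fd_out = fd;
	return addr;
}
```

A hypervisor would call something like `map_vepc("/dev/sgx_vepc", size, &fd)` and then hand the returned address range to KVM as a memory slot, as the commit message outlines.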
#ifdef CONFIG_X86_SGX_KVM
int __init sgx_vepc_init(void);
#else
static inline int __init sgx_vepc_init(void)
{
	return -ENODEV;
}
#endif
2021-03-19 00:23:07 -07:00
void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
2020-11-12 15:01:16 -07:00
#endif /* _X86_SGX_H */