1
linux/arch/x86/kernel/cpu/sgx/sgx.h

108 lines
2.8 KiB
C
Raw Permalink Normal View History

x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _X86_SGX_H
#define _X86_SGX_H
#include <linux/bitops.h>
#include <linux/err.h>
#include <linux/io.h>
#include <linux/rwsem.h>
#include <linux/types.h>
#include <asm/asm.h>
#include <asm/sgx.h>
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
#undef pr_fmt
#define pr_fmt(fmt) "sgx: " fmt
x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() EREMOVE takes a page and removes any association between that page and an enclave. It must be run on a page before it can be added into another enclave. Currently, EREMOVE is run as part of pages being freed into the SGX page allocator. It is not expected to fail, as it would indicate a use-after-free of EPC pages. Rather than add the page back to the pool of available EPC pages, the kernel intentionally leaks the page to avoid additional errors in the future. However, KVM does not track how guest pages are used, which means that SGX virtualization use of EREMOVE might fail. Specifically, it is legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to KVM guest, because KVM/kernel doesn't track SECS pages. To allow SGX/KVM to introduce a more permissive EREMOVE helper and to let the SGX virtualization code use the allocator directly, break out the EREMOVE call from the SGX page allocator. Rename the original sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is used to free an EPC page assigned to a host enclave. Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so there's no functional change. At the same time, improve the error message when EREMOVE fails, and add documentation to explain to the user what that failure means and to suggest to the user what to do when this bug happens in the case it happens. [ bp: Massage commit message, fix typos and sanitize text, simplify. ] Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
2021-03-25 02:30:57 -07:00
#define EREMOVE_ERROR_MESSAGE \
"EREMOVE returned %d (0x%x) and an EPC page was leaked. SGX may become unusable. " \
"Refer to Documentation/arch/x86/sgx.rst for more information."
x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() EREMOVE takes a page and removes any association between that page and an enclave. It must be run on a page before it can be added into another enclave. Currently, EREMOVE is run as part of pages being freed into the SGX page allocator. It is not expected to fail, as it would indicate a use-after-free of EPC pages. Rather than add the page back to the pool of available EPC pages, the kernel intentionally leaks the page to avoid additional errors in the future. However, KVM does not track how guest pages are used, which means that SGX virtualization use of EREMOVE might fail. Specifically, it is legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to KVM guest, because KVM/kernel doesn't track SECS pages. To allow SGX/KVM to introduce a more permissive EREMOVE helper and to let the SGX virtualization code use the allocator directly, break out the EREMOVE call from the SGX page allocator. Rename the original sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is used to free an EPC page assigned to a host enclave. Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so there's no functional change. At the same time, improve the error message when EREMOVE fails, and add documentation to explain to the user what that failure means and to suggest to the user what to do when this bug happens in the case it happens. [ bp: Massage commit message, fix typos and sanitize text, simplify. ] Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
2021-03-25 02:30:57 -07:00
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
#define SGX_MAX_EPC_SECTIONS 8
#define SGX_EEXTEND_BLOCK_SIZE 256
x86/sgx: Add a page reclaimer Just like normal RAM, there is a limited amount of enclave memory available and overcommitting it is a very valuable tool to reduce resource use. Introduce a simple reclaim mechanism for enclave pages. In contrast to normal page reclaim, the kernel cannot directly access enclave memory. To get around this, the SGX architecture provides a set of functions to help. Among other things, these functions copy enclave memory to and from normal memory, encrypting it and protecting its integrity in the process. Implement a page reclaimer by using these functions. Picks victim pages in LRU fashion from all the enclaves running in the system. A new kernel thread (ksgxswapd) reclaims pages in the background based on watermarks, similar to normal kswapd. All enclave pages can be reclaimed, architecturally. But, there are some limits to this, such as the special SECS metadata page which must be reclaimed last. The page version array (used to mitigate replaying old reclaimed pages) is also architecturally reclaimable, but not yet implemented. The end result is that the vast majority of enclave pages are currently reclaimable. Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
2020-11-12 15:01:32 -07:00
#define SGX_NR_TO_SCAN 16
#define SGX_NR_LOW_PAGES 32
#define SGX_NR_HIGH_PAGES 64
/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED BIT(0)
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
/* Pages on free list */
#define SGX_EPC_PAGE_IS_FREE BIT(1)
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
struct sgx_epc_page {
unsigned int section;
u16 flags;
u16 poison;
x86/sgx: Add a page reclaimer Just like normal RAM, there is a limited amount of enclave memory available and overcommitting it is a very valuable tool to reduce resource use. Introduce a simple reclaim mechanism for enclave pages. In contrast to normal page reclaim, the kernel cannot directly access enclave memory. To get around this, the SGX architecture provides a set of functions to help. Among other things, these functions copy enclave memory to and from normal memory, encrypting it and protecting its integrity in the process. Implement a page reclaimer by using these functions. Picks victim pages in LRU fashion from all the enclaves running in the system. A new kernel thread (ksgxswapd) reclaims pages in the background based on watermarks, similar to normal kswapd. All enclave pages can be reclaimed, architecturally. But, there are some limits to this, such as the special SECS metadata page which must be reclaimed last. The page version array (used to mitigate replaying old reclaimed pages) is also architecturally reclaimable, but not yet implemented. The end result is that the vast majority of enclave pages are currently reclaimable. Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
2020-11-12 15:01:32 -07:00
struct sgx_encl_page *owner;
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
struct list_head list;
};
x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Background ========== SGX enclave memory is enumerated by the processor in contiguous physical ranges called Enclave Page Cache (EPC) sections. Currently, there is a free list per section, but allocations simply target the lowest-numbered sections. This is functional, but has no NUMA awareness. Fortunately, EPC sections are covered by entries in the ACPI SRAT table. These entries allow each EPC section to be associated with a NUMA node, just like normal RAM. Solution ======== Implement a NUMA-aware enclave page allocator. Mirror the buddy allocator and maintain a list of enclave pages for each NUMA node. Attempt to allocate enclave memory first from local nodes, then fall back to other nodes. Note that the fallback is not as sophisticated as the buddy allocator and is itself not aware of NUMA distances. When a node's free list is empty, it searches for the next-highest node with enclave pages (and will wrap if necessary). This could be improved in the future. Other ===== NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node(). [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because callers do not expect that and that leads to a NULL ptr deref. ] [ dhansen: Fix an uninitialized 'nid' variable in __sgx_alloc_epc_page() as Reported-by: kernel test robot <lkp@intel.com> to avoid any potential allocations from the wrong NUMA node or even premature allocation failures. ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/ Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
2021-03-17 16:53:31 -07:00
/*
* Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
* the free page list local to the node is stored here.
*/
struct sgx_numa_node {
struct list_head free_page_list;
struct list_head sgx_poison_page_list;
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node == Problem == The amount of SGX memory on a system is determined by the BIOS and it varies wildly between systems. It can be as small as dozens of MB's and as large as many GB's on servers. Just like how applications need to know how much regular RAM is available, enclave builders need to know how much SGX memory an enclave can consume. == Solution == Introduce a new sysfs file: /sys/devices/system/node/nodeX/x86/sgx_total_bytes to enumerate the amount of SGX memory available in each NUMA node. This serves the same function for SGX as /proc/meminfo or /sys/devices/system/node/nodeX/meminfo does for normal RAM. 'sgx_total_bytes' is needed today to help drive the SGX selftests. SGX-specific swap code is exercised by creating overcommitted enclaves which are larger than the physical SGX memory on the system. They currently use a CPUID-based approach which can diverge from the actual amount of SGX memory available. 'sgx_total_bytes' ensures that the selftests can work efficiently and do not attempt stupid things like creating a 100,000 MB enclave on a system with 128 MB of SGX memory. == Implementation Details == Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an arch specific attribute group, and add an attribute for the amount of SGX memory in bytes to each NUMA node: == ABI Design Discussion == As opposed to the per-node ABI, a single, global ABI was considered. However, this would prevent enclaves from being able to size themselves so that they fit on a single NUMA node. Essentially, a single value would rule out NUMA optimizations for enclaves. Create a new "x86/" directory inside each "nodeX/" sysfs directory. 'sgx_total_bytes' is expected to be the first of at least a few sgx-specific files to be placed in the new directory. Just scanning /proc/meminfo, these are the no-brainers that we have for RAM, but we need for SGX: MemTotal: xxxx kB // sgx_total_bytes (implemented here) MemFree: yyyy kB // sgx_free_bytes SwapTotal: zzzz kB // sgx_swapped_bytes So, at *least* three. I think we will eventually end up needing something more along the lines of a dozen. A new directory (as opposed to being in the nodeX/ "root") directory avoids cluttering the root with several "sgx_*" files. Place the new file in a new "nodeX/x86/" directory because SGX is highly x86-specific. It is very unlikely that any other architecture (or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/" as opposed to "x86/" was also considered. But, there is a real chance this can get used for other arch-specific purposes. [ dhansen: rewrite changelog ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
2021-11-16 09:21:16 -07:00
unsigned long size;
x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Background ========== SGX enclave memory is enumerated by the processor in contiguous physical ranges called Enclave Page Cache (EPC) sections. Currently, there is a free list per section, but allocations simply target the lowest-numbered sections. This is functional, but has no NUMA awareness. Fortunately, EPC sections are covered by entries in the ACPI SRAT table. These entries allow each EPC section to be associated with a NUMA node, just like normal RAM. Solution ======== Implement a NUMA-aware enclave page allocator. Mirror the buddy allocator and maintain a list of enclave pages for each NUMA node. Attempt to allocate enclave memory first from local nodes, then fall back to other nodes. Note that the fallback is not as sophisticated as the buddy allocator and is itself not aware of NUMA distances. When a node's free list is empty, it searches for the next-highest node with enclave pages (and will wrap if necessary). This could be improved in the future. Other ===== NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node(). [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because callers do not expect that and that leads to a NULL ptr deref. ] [ dhansen: Fix an uninitialized 'nid' variable in __sgx_alloc_epc_page() as Reported-by: kernel test robot <lkp@intel.com> to avoid any potential allocations from the wrong NUMA node or even premature allocation failures. ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/ Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
2021-03-17 16:53:31 -07:00
spinlock_t lock;
};
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
/*
* The firmware can define multiple chunks of EPC to the different areas of the
* physical memory e.g. for memory areas of the each node. This structure is
* used to store EPC pages for one EPC section and virtual memory area where
* the pages have been mapped.
*/
struct sgx_epc_section {
unsigned long phys_addr;
void *virt_addr;
struct sgx_epc_page *pages;
x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Background ========== SGX enclave memory is enumerated by the processor in contiguous physical ranges called Enclave Page Cache (EPC) sections. Currently, there is a free list per section, but allocations simply target the lowest-numbered sections. This is functional, but has no NUMA awareness. Fortunately, EPC sections are covered by entries in the ACPI SRAT table. These entries allow each EPC section to be associated with a NUMA node, just like normal RAM. Solution ======== Implement a NUMA-aware enclave page allocator. Mirror the buddy allocator and maintain a list of enclave pages for each NUMA node. Attempt to allocate enclave memory first from local nodes, then fall back to other nodes. Note that the fallback is not as sophisticated as the buddy allocator and is itself not aware of NUMA distances. When a node's free list is empty, it searches for the next-highest node with enclave pages (and will wrap if necessary). This could be improved in the future. Other ===== NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node(). [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because callers do not expect that and that leads to a NULL ptr deref. ] [ dhansen: Fix an uninitialized 'nid' variable in __sgx_alloc_epc_page() as Reported-by: kernel test robot <lkp@intel.com> to avoid any potential allocations from the wrong NUMA node or even premature allocation failures. ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/ Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
2021-03-17 16:53:31 -07:00
struct sgx_numa_node *node;
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
};
extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
static inline unsigned long sgx_get_epc_phys_addr(struct sgx_epc_page *page)
{
struct sgx_epc_section *section = &sgx_epc_sections[page->section];
unsigned long index;
index = ((unsigned long)page - (unsigned long)section->pages) / sizeof(*page);
return section->phys_addr + index * PAGE_SIZE;
}
static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
{
struct sgx_epc_section *section = &sgx_epc_sections[page->section];
unsigned long index;
index = ((unsigned long)page - (unsigned long)section->pages) / sizeof(*page);
return section->virt_addr + index * PAGE_SIZE;
}
struct sgx_epc_page *__sgx_alloc_epc_page(void);
void sgx_free_epc_page(struct sgx_epc_page *page);
void sgx_reclaim_direct(void);
x86/sgx: Add a page reclaimer Just like normal RAM, there is a limited amount of enclave memory available and overcommitting it is a very valuable tool to reduce resource use. Introduce a simple reclaim mechanism for enclave pages. In contrast to normal page reclaim, the kernel cannot directly access enclave memory. To get around this, the SGX architecture provides a set of functions to help. Among other things, these functions copy enclave memory to and from normal memory, encrypting it and protecting its integrity in the process. Implement a page reclaimer by using these functions. Picks victim pages in LRU fashion from all the enclaves running in the system. A new kernel thread (ksgxswapd) reclaims pages in the background based on watermarks, similar to normal kswapd. All enclave pages can be reclaimed, architecturally. But, there are some limits to this, such as the special SECS metadata page which must be reclaimed last. The page version array (used to mitigate replaying old reclaimed pages) is also architecturally reclaimable, but not yet implemented. The end result is that the vast majority of enclave pages are currently reclaimable. Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
2020-11-12 15:01:32 -07:00
void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
void sgx_ipi_cb(void *info);
x86/sgx: Introduce virtual EPC for use by KVM guests Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw" Enclave Page Cache (EPC) without an associated enclave. The intended and only known use case for raw EPC allocation is to expose EPC to a KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM Kconfig. The SGX driver uses the misc device /dev/sgx_enclave to support userspace in creating an enclave. Each file descriptor returned from opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver, KVM doesn't control how the guest uses the EPC, therefore EPC allocated to a KVM guest is not associated with an enclave, and /dev/sgx_enclave is not suitable for allocating EPC for a KVM guest. Having separate device nodes for the SGX driver and KVM virtual EPC also allows separate permission control for running host SGX enclaves and KVM SGX guests. To use /dev/sgx_vepc to allocate a virtual EPC instance with particular size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the intended size to get an address range of virtual EPC. Then it may use the address range to create one KVM memory slot as virtual EPC for a guest. Implement the "raw" EPC allocation in the x86 core-SGX subsystem via /dev/sgx_vepc rather than in KVM. Doing so has two major advantages: - Does not require changes to KVM's uAPI, e.g. EPC gets handled as just another memory backend for guests. - EPC management is wholly contained in the SGX subsystem, e.g. SGX does not have to export any symbols, changes to reclaim flows don't need to be routed through KVM, SGX's dirty laundry doesn't have to get aired out for the world to see, and so on and so forth. The virtual EPC pages allocated to guests are currently not reclaimable. Reclaiming an EPC page used by enclave requires a special reclaim mechanism separate from normal page reclaim, and that mechanism is not supported for virutal EPC pages. Due to the complications of handling reclaim conflicts between guest and host, reclaiming virtual EPC pages is significantly more complex than basic support for SGX virtualization. [ bp: - Massage commit message and comments - use cpu_feature_enabled() - vertically align struct members init - massage Virtual EPC clarification text - move Kconfig prompt to Virtualization ] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
2021-03-19 00:22:21 -07:00
#ifdef CONFIG_X86_SGX_KVM
int __init sgx_vepc_init(void);
#else
static inline int __init sgx_vepc_init(void)
{
return -ENODEV;
}
#endif
void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-12 15:01:16 -07:00
#endif /* _X86_SGX_H */