// SPDX-License-Identifier: GPL-2.0
/*
 * Hyper-V Isolation VM interface with paravisor and hypervisor
 *
 * Author:
 *  Tianyu Lan <Tianyu.Lan@microsoft.com>
 */

#include <linux/bitfield.h>
#include <linux/hyperv.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <asm/svm.h>
#include <asm/sev.h>
#include <asm/io.h>
#include <asm/coco.h>
#include <asm/mem_encrypt.h>
#include <asm/set_memory.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
#include <asm/mtrr.h>
#include <asm/io_apic.h>
#include <asm/realmode.h>
#include <asm/e820/api.h>
#include <asm/desc.h>
#include <uapi/asm/vmx.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

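/*
 * Hyper-V uses a vendor-specific GHCB usage value so that a full hypercall
 * input block can be passed through the GHCB page, instead of the
 * register-based GHCB_DEFAULT_USAGE protocol.
 */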
#define GHCB_USAGE_HYPERV_CALL	1

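/*
 * One guest page shared with the hypervisor. It overlays the standard GHCB
 * layout with Hyper-V's hypercall input/output layout; which interpretation
 * applies depends on the ghcb_usage value written before each VMGEXIT.
 */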
union hv_ghcb {
	struct ghcb ghcb;
	struct {
		u64 hypercalldata[509];
		u64 outputgpa;
		union {
			union {
				struct {
					u32 callcode        : 16;
					u32 isfast          : 1;
					u32 reserved1       : 14;
					u32 isnested        : 1;
					u32 countofelements : 12;
					u32 reserved2       : 4;
					u32 repstartindex   : 12;
					u32 reserved3       : 4;
				};
				u64 asuint64;
			} hypercallinput;
			union {
				struct {
					u16 callstatus;
					u16 reserved1;
					u32 elementsprocessed : 12;
					u32 reserved2         : 20;
				};
				u64 asuint64;
			} hypercalloutput;
		};
		u64 reserved2;
	} hypercall;
} __packed __aligned(HV_HYP_PAGE_SIZE);

/* Only used in an SNP VM with the paravisor */
static u16 hv_ghcb_version __ro_after_init;

/* Functions only used in an SNP VM with the paravisor go here. */
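/*
 * Issue a Hyper-V hypercall by placing the input data directly in the GHCB
 * page and signalling the hypervisor with VMGEXIT. The Hyper-V call status
 * is returned from the output area of the same page.
 */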
u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
{
	union hv_ghcb *hv_ghcb;
	void **ghcb_base;
	unsigned long flags;
	u64 status;

	if (!hv_ghcb_pg)
		return -EFAULT;

	WARN_ON(in_nmi());

	local_irq_save(flags);
	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
	hv_ghcb = (union hv_ghcb *)*ghcb_base;
	if (!hv_ghcb) {
		local_irq_restore(flags);
		return -EFAULT;
	}

	hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
	hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;

	hv_ghcb->hypercall.outputgpa = (u64)output;
	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
	hv_ghcb->hypercall.hypercallinput.callcode = control;

	if (input_size)
		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);

	VMGEXIT();

	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));

	status = hv_ghcb->hypercall.hypercalloutput.callstatus;

	local_irq_restore(flags);

	return status;
}

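/*
 * Raw accessors for MSR_AMD64_SEV_ES_GHCB, which carries either the GHCB
 * page GPA or a GHCB MSR-protocol request/response.
 */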
static inline u64 rd_ghcb_msr(void)
{
	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
}

static inline void wr_ghcb_msr(u64 val)
{
	native_wrmsrl(MSR_AMD64_SEV_ES_GHCB, val);
}

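/*
 * Make a GHCB protocol call with the standard GHCB usage: fill in the exit
 * code and exit info fields, VMGEXIT to the hypervisor, and treat a non-zero
 * value in the low 32 bits of sw_exit_info_1 as an error.
 */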
static enum es_result hv_ghcb_hv_call(struct ghcb *ghcb, u64 exit_code,
				      u64 exit_info_1, u64 exit_info_2)
{
	/* Fill in protocol and format specifiers */
	ghcb->protocol_version = hv_ghcb_version;
	ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;

	ghcb_set_sw_exit_code(ghcb, exit_code);
	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);

	VMGEXIT();

	if (ghcb->save.sw_exit_info_1 & GENMASK_ULL(31, 0))
		return ES_VMM_ERROR;
	else
		return ES_OK;
}

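/*
 * Request guest termination via the GHCB MSR protocol, passing the
 * reason-set and reason code to the hypervisor. Never returns; the vCPU
 * halts if the termination request is not honored.
 */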
void __noreturn hv_ghcb_terminate(unsigned int set, unsigned int reason)
{
	u64 val = GHCB_MSR_TERM_REQ;

	/* Tell the hypervisor what went wrong. */
	val |= GHCB_SEV_TERM_REASON(set, reason);

	/* Request Guest Termination from Hypervisor */
	wr_ghcb_msr(val);
	VMGEXIT();

	while (true)
		asm volatile("hlt\n" : : : "memory");
}

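/*
 * Negotiate the GHCB protocol version with the hypervisor using the GHCB
 * MSR protocol: request the supported range, verify that it overlaps the
 * range this kernel supports, and record the highest usable version in
 * hv_ghcb_version. The GHCB page GPA held in the MSR is saved and restored
 * around the exchange.
 */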
bool hv_ghcb_negotiate_protocol(void)
{
	u64 ghcb_gpa;
	u64 val;

	/* Save ghcb page gpa. */
	ghcb_gpa = rd_ghcb_msr();

	/* Do the GHCB protocol version negotiation */
	wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
	VMGEXIT();
	val = rd_ghcb_msr();

	if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
		return false;

	if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
	    GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
		return false;

	hv_ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val),
				GHCB_PROTOCOL_MAX);

	/* Write ghcb page back after negotiating protocol. */
	wr_ghcb_msr(ghcb_gpa);
	VMGEXIT();

	return true;
}

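/*
 * Emulated MSR accesses for an SNP VM with the paravisor: the MSR number and
 * value are passed in the GHCB (RCX, RAX/RDX) and the access is requested
 * with an SVM_EXIT_MSR exit (exit_info_1 = 1 for write, 0 for read).
 */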
static void hv_ghcb_msr_write(u64 msr, u64 value)
{
	union hv_ghcb *hv_ghcb;
	void **ghcb_base;
	unsigned long flags;

	if (!hv_ghcb_pg)
		return;

	WARN_ON(in_nmi());

	local_irq_save(flags);
	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
	hv_ghcb = (union hv_ghcb *)*ghcb_base;
	if (!hv_ghcb) {
		local_irq_restore(flags);
		return;
	}

	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
	ghcb_set_rdx(&hv_ghcb->ghcb, upper_32_bits(value));

	if (hv_ghcb_hv_call(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
		pr_warn("Failed to write MSR via GHCB %llx.\n", msr);

	local_irq_restore(flags);
}

static void hv_ghcb_msr_read(u64 msr, u64 *value)
{
	union hv_ghcb *hv_ghcb;
	void **ghcb_base;
	unsigned long flags;

	/* Check size of union hv_ghcb here. */
	BUILD_BUG_ON(sizeof(union hv_ghcb) != HV_HYP_PAGE_SIZE);

	if (!hv_ghcb_pg)
		return;

	WARN_ON(in_nmi());

	local_irq_save(flags);
	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
	hv_ghcb = (union hv_ghcb *)*ghcb_base;
	if (!hv_ghcb) {
		local_irq_restore(flags);
		return;
	}

	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
	if (hv_ghcb_hv_call(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
		pr_warn("Failed to read MSR via GHCB %llx.\n", msr);
	else
		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
	local_irq_restore(flags);
}

/* Only used in a fully enlightened SNP VM, i.e. without the paravisor */
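/*
 * The AP start input page lives in __bss_decrypted so the hypervisor can
 * read the HVCALL_START_VP input without a separate visibility change.
 */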
static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
static DEFINE_PER_CPU(struct sev_es_save_area *, hv_sev_vmsa);

/* Functions only used in an SNP VM without the paravisor go here. */

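/*
 * Build a VMSA segment register from the currently loaded selector: fetch
 * the access and flags bytes of the GDT descriptor (offsets 5-6) and repack
 * them into the packed attribute format that the VMSA expects.
 */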
#define hv_populate_vmcb_seg(seg, gdtr_base)				\
do {									\
	if (seg.selector) {						\
		seg.base = 0;						\
		seg.limit = HV_AP_SEGMENT_LIMIT;			\
		seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5);	\
		seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
	}								\
} while (0)

static int snp_set_vmsa(void *va, bool vmsa)
{
	u64 attrs;

	/*
	 * Running at VMPL0 allows the kernel to change the VMSA bit for a page
	 * using the RMPADJUST instruction. However, for the instruction to
	 * succeed it must target the permissions of a lesser privileged
	 * (higher numbered) VMPL level, so use VMPL1 (refer to the RMPADJUST
	 * instruction in the AMD64 APM Volume 3).
	 */
	attrs = 1;
	if (vmsa)
		attrs |= RMPADJUST_VMSA_PAGE_BIT;

	return rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
}

static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
{
	int err;

	err = snp_set_vmsa(vmsa, false);
	if (err)
		pr_err("clear VMSA page failed (%u), leaking page\n", err);
	else
		free_page((unsigned long)vmsa);
}

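/*
 * Start an AP in a fully enlightened SNP VM: build a VMSA that mirrors the
 * boot CPU's current segment, control register and EFER state, mark the page
 * as a VMSA with RMPADJUST, and ask Hyper-V to start the vCPU with
 * HVCALL_START_VP, retrying a few times if the hypercall times out.
 */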
int hv_snp_boot_ap(u32 cpu, unsigned long start_ip)
{
	struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
		__get_free_page(GFP_KERNEL | __GFP_ZERO);
	struct sev_es_save_area *cur_vmsa;
	struct desc_ptr gdtr;
	u64 ret, retry = 5;
	struct hv_enable_vp_vtl *start_vp_input;
	unsigned long flags;

	if (!vmsa)
		return -ENOMEM;

	native_store_gdt(&gdtr);

	vmsa->gdtr.base = gdtr.address;
	vmsa->gdtr.limit = gdtr.size;

	asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
	hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);

	asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
	hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);

	asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
	hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);

	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);

	vmsa->efer = native_read_msr(MSR_EFER);

	vmsa->cr4 = native_read_cr4();
	vmsa->cr3 = __native_read_cr3();
	vmsa->cr0 = native_read_cr0();

	vmsa->xcr0 = 1;
	vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
	vmsa->rip = (u64)secondary_startup_64_no_verify;
	vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];

	/*
	 * Set the SNP-specific fields for this VMSA:
	 *   VMPL level
	 *   SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
	 */
	vmsa->vmpl = 0;
	vmsa->sev_features = sev_status >> 2;

	ret = snp_set_vmsa(vmsa, true);
	if (ret) {
		pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
		free_page((u64)vmsa);
		return ret;
	}

	local_irq_save(flags);
	start_vp_input = (struct hv_enable_vp_vtl *)ap_start_input_arg;
	memset(start_vp_input, 0, sizeof(*start_vp_input));
	start_vp_input->partition_id = -1;
	start_vp_input->vp_index = cpu;
	start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
	*(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;

	do {
		ret = hv_do_hypercall(HVCALL_START_VP,
				      start_vp_input, NULL);
	} while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);

	local_irq_restore(flags);

	if (!hv_result_success(ret)) {
		pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
		snp_cleanup_vmsa(vmsa);
		vmsa = NULL;
	}

	cur_vmsa = per_cpu(hv_sev_vmsa, cpu);
	/* Free up any previous VMSA page */
	if (cur_vmsa)
		snp_cleanup_vmsa(cur_vmsa);

	/* Record the current VMSA page */
	per_cpu(hv_sev_vmsa, cpu) = vmsa;

	return ret;
}

#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
#endif /* CONFIG_AMD_MEM_ENCRYPT */

#ifdef CONFIG_INTEL_TDX_GUEST
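/*
 * In a TDX VM with the paravisor, emulated MSR accesses are forwarded as
 * TDVMCALLs (EXIT_REASON_MSR_WRITE/EXIT_REASON_MSR_READ) with the MSR number
 * in R12 and the value in R13 (write) or returned in R11 (read).
 */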
static void hv_tdx_msr_write(u64 msr, u64 val)
{
	struct tdx_module_args args = {
		.r10 = TDX_HYPERCALL_STANDARD,
		.r11 = EXIT_REASON_MSR_WRITE,
		.r12 = msr,
		.r13 = val,
	};

	u64 ret = __tdx_hypercall(&args);

	WARN_ONCE(ret, "Failed to emulate MSR write: %lld\n", ret);
}

static void hv_tdx_msr_read(u64 msr, u64 *val)
{
	struct tdx_module_args args = {
		.r10 = TDX_HYPERCALL_STANDARD,
		.r11 = EXIT_REASON_MSR_READ,
		.r12 = msr,
	};

	u64 ret = __tdx_hypercall(&args);

	if (WARN_ONCE(ret, "Failed to emulate MSR read: %lld\n", ret))
		*val = 0;
	else
		*val = args.r11;
}

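/*
 * Issue a Hyper-V hypercall from a TDX guest via TDVMCALL: the hypercall
 * control word goes in R10 and the two parameters in RDX and R8, with the
 * Hyper-V status returned in R11.
 */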
u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
{
	struct tdx_module_args args = { };

	args.r10 = control;
	args.rdx = param1;
	args.r8  = param2;

	(void)__tdx_hypercall(&args);

	return args.r11;
}

#else
static inline void hv_tdx_msr_write(u64 msr, u64 value) {}
static inline void hv_tdx_msr_read(u64 msr, u64 *value) {}
#endif /* CONFIG_INTEL_TDX_GUEST */

#if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST)
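/*
 * Synthetic MSR accessors used when a paravisor is present: route the access
 * through the GHCB (SNP) or a TDVMCALL (TDX) so the paravisor or hypervisor
 * can emulate the MSR.
 */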
void hv_ivm_msr_write(u64 msr, u64 value)
{
	if (!ms_hyperv.paravisor_present)
		return;

	if (hv_isolation_type_tdx())
		hv_tdx_msr_write(msr, value);
	else if (hv_isolation_type_snp())
		hv_ghcb_msr_write(msr, value);
}

void hv_ivm_msr_read(u64 msr, u64 *value)
{
	if (!ms_hyperv.paravisor_present)
		return;

	if (hv_isolation_type_tdx())
		hv_tdx_msr_read(msr, value);
	else if (hv_isolation_type_snp())
		hv_ghcb_msr_read(msr, value);
}

/*
 * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
 *
 * In an Isolation VM, all guest memory is encrypted from the host, and the
 * guest must make memory visible to the host via a hypercall before sharing
 * that memory with the host.
 */
static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
				  enum hv_mem_host_visibility visibility)
{
	struct hv_gpa_range_for_visibility *input;
	u16 pages_processed;
	u64 hv_status;
	unsigned long flags;

	/* no-op if partition isolation is not enabled */
	if (!hv_is_isolation_supported())
		return 0;

	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
		       HV_MAX_MODIFY_GPA_REP_COUNT);
		return -EINVAL;
	}

	local_irq_save(flags);
	input = *this_cpu_ptr(hyperv_pcpu_input_arg);

	if (unlikely(!input)) {
		local_irq_restore(flags);
		return -EINVAL;
	}

	input->partition_id = HV_PARTITION_ID_SELF;
	input->host_visibility = visibility;
	input->reserved0 = 0;
	input->reserved1 = 0;
	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
	hv_status = hv_do_rep_hypercall(
			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
			0, input, &pages_processed);
	local_irq_restore(flags);

	if (hv_result_success(hv_status))
		return 0;
	else
		return -EFAULT;
}

/*
 * When transitioning memory between encrypted and decrypted, the caller
 * of set_memory_encrypted() or set_memory_decrypted() is responsible for
 * ensuring that the memory isn't in use and isn't referenced while the
 * transition is in progress. The transition has multiple steps, and the
 * memory is in an inconsistent state until all steps are complete. A
 * reference while the state is inconsistent could result in an exception
 * that can't be cleanly fixed up.
 *
 * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
 * stray reference that can't be prevented by the caller, so Linux has
 * specific code to handle this case. But when the #VC and #VE exceptions
 * are routed to a paravisor, that specific code doesn't work. To avoid this
 * problem, mark the pages as "not present" while the transition is in
 * progress. If load_unaligned_zeropad() causes a stray reference, a normal
 * page fault is generated instead of #VC or #VE, and the page-fault-based
 * handlers for load_unaligned_zeropad() resolve the reference. When the
 * transition is complete, hv_vtom_set_host_visibility() marks the pages
 * as "present" again.
 */
static int hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
{
	return set_memory_np(kbuffer, pagecount);
}

/*
 * hv_vtom_set_host_visibility - Set specified memory visible to host.
 *
 * In an Isolation VM, all guest memory is encrypted from the host, and the
 * guest must make memory visible to the host via a hypercall before sharing
 * it with the host. This function is a wrapper around hv_mark_gpa_visibility()
 * that takes a memory base address and size.
 */
static int hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bool enc)
{
	enum hv_mem_host_visibility visibility = enc ?
			VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
	u64 *pfn_array;
	phys_addr_t paddr;
	int i, pfn, err;
	void *vaddr;
	int ret = 0;

	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
	if (!pfn_array) {
		ret = -ENOMEM;
		goto err_set_memory_p;
	}

	for (i = 0, pfn = 0; i < pagecount; i++) {
		/*
		 * Use slow_virt_to_phys() because the PRESENT bit has been
		 * temporarily cleared in the PTEs. slow_virt_to_phys() works
		 * without the PRESENT bit while virt_to_hvpfn() or similar
		 * does not.
		 */
		vaddr = (void *)kbuffer + (i * HV_HYP_PAGE_SIZE);
		paddr = slow_virt_to_phys(vaddr);
		pfn_array[pfn] = paddr >> HV_HYP_PAGE_SHIFT;
		pfn++;

		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
			ret = hv_mark_gpa_visibility(pfn, pfn_array,
						     visibility);
			if (ret)
				goto err_free_pfn_array;
			pfn = 0;
		}
	}

 err_free_pfn_array:
	kfree(pfn_array);

 err_set_memory_p:
	/*
	 * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
	 * did. Do this even if there is an error earlier in this function in
	 * order to avoid leaving the memory range in a "broken" state. Setting
	 * the PRESENT bits shouldn't fail, but return an error if it does.
	 */
	err = set_memory_p(kbuffer, pagecount);
	if (err && !ret)
		ret = err;

	return ret;
}

static bool hv_vtom_tlb_flush_required(bool private)
|
|
|
|
{
|
2024-01-15 19:20:08 -07:00
|
|
|
/*
|
|
|
|
* Since hv_vtom_clear_present() marks the PTEs as "not present"
|
|
|
|
* and flushes the TLB, they can't be in the TLB. That makes the
|
|
|
|
* flush controlled by this function redundant, so return "false".
|
|
|
|
*/
|
|
|
|
return false;
|
2023-03-26 06:52:01 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static bool hv_vtom_cache_flush_required(void)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool hv_is_private_mmio(u64 addr)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Hyper-V always provides a single IO-APIC in a guest VM.
|
|
|
|
* When a paravisor is used, it is emulated by the paravisor
|
|
|
|
* in the guest context and must be mapped private.
|
|
|
|
*/
|
|
|
|
if (addr >= HV_IOAPIC_BASE_ADDRESS &&
|
|
|
|
addr < (HV_IOAPIC_BASE_ADDRESS + PAGE_SIZE))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
/* Same with a vTPM */
|
|
|
|
if (addr >= VTPM_BASE_ADDRESS &&
|
|
|
|
addr < (VTPM_BASE_ADDRESS + PAGE_SIZE))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
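For context, the effect of this hook is that the ioremap path maps ranges it reports as private with the encrypted attribute, instead of the shared (decrypted) attribute a CoCo guest would otherwise use for MMIO. A rough illustration follows; it is not the actual ioremap code.

/*
 * Rough illustration (not the actual ioremap code): choose the page
 * protection for an MMIO mapping based on the is_private_mmio() hook.
 */
static pgprot_t example_mmio_prot(u64 phys_addr, pgprot_t prot)
{
	if (x86_platform.hyper.is_private_mmio(phys_addr))
		return pgprot_encrypted(prot);	/* e.g. IO-APIC, vTPM */

	return pgprot_decrypted(prot);		/* normal MMIO stays shared */
}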
|
|
|
|
|
|
|
|
void __init hv_vtom_init(void)
|
|
|
|
{
|
x86/hyperv: Introduce a global variable hyperv_paravisor_present
The new variable hyperv_paravisor_present is set only when the VM
is a SNP/TDX VM with the paravisor running: see ms_hyperv_init_platform().
We introduce hyperv_paravisor_present because we can not use
ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h:
struct ms_hyperv_info is defined in include/asm-generic/mshyperv.h, which
is included at the end of arch/x86/include/asm/mshyperv.h, but at the
beginning of arch/x86/include/asm/mshyperv.h, we would already need to use
struct ms_hyperv_info in hv_do_hypercall().
We use hyperv_paravisor_present only in include/asm-generic/mshyperv.h,
and use ms_hyperv.paravisor_present elsewhere. In the future, we'll
introduce a hypercall function structure for different VM types, and
at boot time, the right function pointers would be written into the
structure so that runtime testing of TDX vs. SNP vs. normal will be
avoided and hyperv_paravisor_present will no longer be needed.
Call hv_vtom_init() when it's a VBS VM or when ms_hyperv.paravisor_present
is true, i.e. the VM is a SNP VM or TDX VM with the paravisor.
Enhance hv_vtom_init() for a TDX VM with the paravisor.
In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg
for a TDX VM with the paravisor, just like we don't decrypt the page
for a SNP VM with the paravisor.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-7-decui@microsoft.com
2023-08-24 01:07:08 -07:00
|
|
|
enum hv_isolation_type type = hv_get_isolation_type();
|
|
|
|
|
|
|
|
switch (type) {
|
|
|
|
case HV_ISOLATION_TYPE_VBS:
|
|
|
|
fallthrough;
|
2023-03-26 06:52:01 -07:00
|
|
|
/*
|
|
|
|
* By design, a VM using vTOM doesn't see the SEV setting,
|
|
|
|
* so SEV initialization is bypassed and sev_status isn't set.
|
|
|
|
* Set it here to indicate a vTOM VM.
|
2023-08-24 01:07:08 -07:00
|
|
|
*
|
|
|
|
* Note: if CONFIG_AMD_MEM_ENCRYPT is not set, sev_status is
|
|
|
|
* defined as 0ULL, to which we can't assign a value.
|
2023-03-26 06:52:01 -07:00
|
|
|
*/
|
2023-08-24 01:07:08 -07:00
|
|
|
#ifdef CONFIG_AMD_MEM_ENCRYPT
|
|
|
|
case HV_ISOLATION_TYPE_SNP:
|
|
|
|
sev_status = MSR_AMD64_SNP_VTOM;
|
|
|
|
cc_vendor = CC_VENDOR_AMD;
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
case HV_ISOLATION_TYPE_TDX:
|
|
|
|
cc_vendor = CC_VENDOR_INTEL;
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
panic("hv_vtom_init: unsupported isolation type %d\n", type);
|
|
|
|
}
|
|
|
|
|
2023-03-26 06:52:01 -07:00
|
|
|
cc_set_mask(ms_hyperv.shared_gpa_boundary);
|
|
|
|
physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
|
|
|
|
|
|
|
|
x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
|
|
|
|
x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
|
|
|
|
x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
|
2024-01-15 19:20:08 -07:00
|
|
|
x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
|
2023-03-26 06:52:01 -07:00
|
|
|
x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
|
2023-05-02 05:09:19 -07:00
|
|
|
|
|
|
|
/* Set WB as the default cache mode. */
|
|
|
|
mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
|
2023-03-26 06:52:01 -07:00
|
|
|
}
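Once cc_set_mask() records the vTOM boundary and physical_mask drops the vTOM bit, the generic coco helpers can treat that bit purely as a protection flag. The following is an illustrative sketch with hypothetical example_* names; the real cc_mkenc()/cc_mkdec() live in arch/x86/coco/core.c and handle all vendor cases.

/*
 * Illustrative sketch (hypothetical helpers, not part of this file):
 * with cc_mask equal to ms_hyperv.shared_gpa_boundary, a "decrypted"
 * mapping sets the vTOM bit in the protection value and an "encrypted"
 * mapping clears it, analogous to TDX's GPA.SHARED bit.
 */
static pgprot_t example_mkdec(pgprot_t prot)
{
	return __pgprot(pgprot_val(prot) | ms_hyperv.shared_gpa_boundary);
}

static pgprot_t example_mkenc(pgprot_t prot)
{
	return __pgprot(pgprot_val(prot) & ~ms_hyperv.shared_gpa_boundary);
}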
|
|
|
|
|
2023-08-24 01:07:08 -07:00
|
|
|
#endif /* defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST) */
|
2023-03-26 06:52:01 -07:00
|
|
|
|
2023-03-26 06:51:57 -07:00
|
|
|
enum hv_isolation_type hv_get_isolation_type(void)
|
|
|
|
{
|
|
|
|
if (!(ms_hyperv.priv_high & HV_ISOLATION))
|
|
|
|
return HV_ISOLATION_TYPE_NONE;
|
|
|
|
return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(hv_get_isolation_type);
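FIELD_GET() (from <linux/bitfield.h>, which this file includes) shifts and masks out the field selected by HV_ISOLATION_TYPE from isolation_config_b. A minimal illustration of the same pattern, using a hypothetical mask:

/* Hypothetical 4-bit field, purely to illustrate the FIELD_GET() pattern. */
#define EXAMPLE_TYPE_MASK	GENMASK(3, 0)

static inline u32 example_get_type(u32 config)
{
	/* Extracts bits 3:0 of config and right-shifts them down to bit 0. */
	return FIELD_GET(EXAMPLE_TYPE_MASK, config);
}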
|
|
|
|
|
|
|
|
/*
|
|
|
|
* hv_is_isolation_supported - Check whether the system runs in a Hyper-V
|
|
|
|
* isolation VM.
|
|
|
|
*/
|
|
|
|
bool hv_is_isolation_supported(void)
|
|
|
|
{
|
|
|
|
if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
|
|
|
|
}
|
|
|
|
|
|
|
|
DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
|
|
|
|
|
|
|
|
/*
|
2023-08-24 01:07:11 -07:00
|
|
|
* hv_isolation_type_snp - Check if the system runs in an AMD SEV-SNP based
|
2023-03-26 06:51:57 -07:00
|
|
|
* isolation VM.
|
|
|
|
*/
|
|
|
|
bool hv_isolation_type_snp(void)
|
|
|
|
{
|
|
|
|
return static_branch_unlikely(&isolation_type_snp);
|
|
|
|
}
|
2023-08-18 03:29:11 -07:00
|
|
|
|
2023-08-24 01:07:03 -07:00
|
|
|
DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
|
|
|
|
/*
|
|
|
|
* hv_isolation_type_tdx - Check if the system runs in an Intel TDX based
|
|
|
|
* isolation VM.
|
|
|
|
*/
|
|
|
|
bool hv_isolation_type_tdx(void)
|
|
|
|
{
|
|
|
|
return static_branch_unlikely(&isolation_type_tdx);
|
|
|
|
}
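For context, the two static keys above are expected to be enabled once during platform detection (for Hyper-V, in ms_hyperv_init_platform()), so these predicates compile down to patched branches. A minimal sketch of that pattern, using a hypothetical init helper:

/*
 * Minimal sketch (hypothetical init helper, not part of this file):
 * flip the static keys once at boot based on the detected isolation
 * type; afterwards the predicates above are nearly free at runtime.
 */
static void __init example_enable_isolation_keys(enum hv_isolation_type type)
{
	if (type == HV_ISOLATION_TYPE_SNP)
		static_branch_enable(&isolation_type_snp);
	else if (type == HV_ISOLATION_TYPE_TDX)
		static_branch_enable(&isolation_type_tdx);
}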
|