Merge tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - x2apic_disable() clears x2apic_state and x2apic_mode unconditionally,
   even when the state is X2APIC_ON_LOCKED, which prevents the kernel
   from disabling it and thereby creates inconsistent state. Reorder the
   logic so it actually works correctly.

 - The XSTATE logic for handling LBR is incorrect as it assumes that
   XSAVES supports LBR when the CPU supports LBR. In fact both
   conditions need to be true. Otherwise the enablement of LBR in the
   IA32_XSS MSR fails and the machine subsequently crashes on the next
   XRSTORS operation because IA32_XSS is not initialized.

   Cache the XSTATE support bit during init and make the related
   functions use this cached information and the LBR CPU feature bit to
   cure this.

 - Cure a long-standing bug in KASLR:

   KASLR uses the full address space between PAGE_OFFSET and vaddr_end
   to randomize the starting points of the direct map, vmalloc and
   vmemmap regions. It thereby limits the size of the direct map by
   using the installed memory size plus an extra configurable margin
   for hot-plug memory. This limitation is done to gain more
   randomization space, because otherwise only the holes between the
   direct map, vmalloc, vmemmap and vaddr_end would be usable for
   randomizing.

   The limited direct map size is not exposed to the rest of the
   kernel, so the memory hot-plug and resource management related code
   paths still operate under the assumption that the available address
   space can be determined with MAX_PHYSMEM_BITS.

   request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
   downwards. That means the first allocation happens past the end of
   the direct map and, if unlucky, this address is in the vmalloc
   space, which causes high_memory to become greater than VMALLOC_START
   and consequently causes iounmap() to fail for valid ioremap
   addresses.

   Cure this by exposing the end of the direct map via PHYSMEM_END and
   use that for the memory hot-plug and resource management related
   places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
   PHYSMEM_END maps to a variable which is initialized by the KASLR
   initialization; otherwise it is based on MAX_PHYSMEM_BITS as before.

 - Prevent a data leak in mmio_read(). The TDVMCALL exposes the value
   of an uninitialized variable on the stack to the VMM. The variable
   is only required as an output value, so it does not have to be
   exposed to the VMM in the first place.

 - Prevent an array overrun in the resource control code on systems
   with Sub-NUMA Clustering enabled, because the code failed to adjust
   the index by the number of SNC nodes per L3 cache.
* tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix arch_mbm_* array overrun on SNC
  x86/tdx: Fix data leak in mmio_read()
  x86/kaslr: Expose and use the end of the physical memory address space
  x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
  x86/apic: Make x2apic_disable() work correctly
commit c9f016e72b
arch/x86/coco/tdx/tdx.c
@@ -389,7 +389,6 @@ static bool mmio_read(int size, unsigned long addr, unsigned long *val)
 		.r12 = size,
 		.r13 = EPT_READ,
 		.r14 = addr,
-		.r15 = *val,
 	};
 
 	if (__tdx_hypercall(&args))
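The removed initializer is the whole fix: in C, members omitted from a designated initializer are zero-initialized, so the argument block no longer carries the caller's uninitialized stack contents into the hypercall. A standalone sketch of that guarantee (userspace, with a hypothetical stand-in for the kernel's argument struct):

#include <assert.h>

/* stand-in for the kernel's hypercall argument struct */
struct tdx_hypercall_args {
	unsigned long r12, r13, r14, r15;
};

int main(void)
{
	struct tdx_hypercall_args args = {
		.r12 = 8,       /* size */
		.r13 = 0,       /* EPT_READ */
		.r14 = 0x1000,  /* addr */
		/* .r15 omitted: C guarantees it is zero, not stack garbage */
	};

	assert(args.r15 == 0);
	return 0;
}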
arch/x86/include/asm/fpu/types.h
@@ -591,6 +591,13 @@ struct fpu_state_config {
 	 * even without XSAVE support, i.e. legacy features FP + SSE
 	 */
 	u64 legacy_features;
+	/*
+	 * @independent_features:
+	 *
+	 * Features that are supported by XSAVES, but not managed as part of
+	 * the FPU core, such as LBR
+	 */
+	u64 independent_features;
 };
 
 /* FPU state configuration information */
arch/x86/include/asm/page_64.h
@@ -17,6 +17,7 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long physmem_end;
 
 static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
arch/x86/include/asm/pgtable_64_types.h
@@ -140,6 +140,10 @@ extern unsigned int ptrs_per_p4d;
 # define VMEMMAP_START		__VMEMMAP_BASE_L4
 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+# define PHYSMEM_END		physmem_end
+#endif
+
 /*
  * End of the region for which vmalloc page tables are pre-allocated.
  * For non-KMSAN builds, this is the same as VMALLOC_END.
arch/x86/include/asm/resctrl.h
@@ -156,12 +156,6 @@ static inline void resctrl_sched_in(struct task_struct *tsk)
 		__resctrl_sched_in(tsk);
 }
 
-static inline u32 resctrl_arch_system_num_rmid_idx(void)
-{
-	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return boot_cpu_data.x86_cache_max_rmid + 1;
-}
-
 static inline void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
 {
 	*rmid = idx;
arch/x86/kernel/apic/apic.c
@@ -1775,12 +1775,9 @@ static __init void apic_set_fixmap(bool read_apic);
 
 static __init void x2apic_disable(void)
 {
-	u32 x2apic_id, state = x2apic_state;
+	u32 x2apic_id;
 
-	x2apic_mode = 0;
-	x2apic_state = X2APIC_DISABLED;
-
-	if (state != X2APIC_ON)
+	if (x2apic_state < X2APIC_ON)
 		return;
 
 	x2apic_id = read_apic_id();
@@ -1793,6 +1790,10 @@ static __init void x2apic_disable(void)
 	}
 
 	__x2apic_disable();
+
+	x2apic_mode = 0;
+	x2apic_state = X2APIC_DISABLED;
+
 	/*
 	 * Don't reread the APIC ID as it was already done from
 	 * check_x2apic() and the APIC driver still is a x2APIC variant,
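The two hunks together move the state update after all the bail-out paths, so a locked x2APIC keeps its software state consistent with the hardware. A toy model of the resulting control flow (plain C; the locked check is reduced to a state comparison as a stand-in for the kernel's MSR-based check, and the hardware disable is elided):

#include <stdio.h>

enum { X2APIC_OFF, X2APIC_DISABLED, X2APIC_ON, X2APIC_ON_LOCKED };

static int x2apic_state = X2APIC_ON_LOCKED;
static int x2apic_mode  = 1;

static void x2apic_disable_model(void)
{
	if (x2apic_state < X2APIC_ON)   /* nothing to disable */
		return;

	if (x2apic_state == X2APIC_ON_LOCKED) {  /* stand-in for the locked check */
		printf("Cannot disable locked x2APIC\n");
		return;  /* state and mode stay consistent with hardware */
	}

	/* __x2apic_disable() would switch the hardware here ... */
	x2apic_mode  = 0;  /* ... and only then is the software state cleared */
	x2apic_state = X2APIC_DISABLED;
}

int main(void)
{
	x2apic_disable_model();
	printf("state=%d mode=%d\n", x2apic_state, x2apic_mode);
	return 0;
}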
arch/x86/kernel/cpu/resctrl/core.c
@@ -119,6 +119,14 @@ struct rdt_hw_resource rdt_resources_all[] = {
 	},
 };
 
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
+	return r->num_rmid;
+}
+
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
arch/x86/kernel/fpu/xstate.c
@@ -788,6 +788,9 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 		goto out_disable;
 	}
 
+	fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
+					      XFEATURE_MASK_INDEPENDENT;
+
 	/*
 	 * Clear XSAVE features that are disabled in the normal CPUID.
 	 */
arch/x86/kernel/fpu/xstate.h
@@ -62,9 +62,9 @@ static inline u64 xfeatures_mask_supervisor(void)
 static inline u64 xfeatures_mask_independent(void)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR))
-		return XFEATURE_MASK_INDEPENDENT & ~XFEATURE_MASK_LBR;
+		return fpu_kernel_cfg.independent_features & ~XFEATURE_MASK_LBR;
 
-	return XFEATURE_MASK_INDEPENDENT;
+	return fpu_kernel_cfg.independent_features;
 }
 
 /* XSAVE/XRSTOR wrapper functions */
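What the cached mask buys: XFEATURE_MASK_INDEPENDENT unconditionally contains the LBR bit, while fpu_kernel_cfg.independent_features is intersected with what XSAVES actually enumerated during init. A small demonstration with made-up enumeration values (the LBR state really is XSTATE component 15; the max_features value is illustrative):

#include <stdio.h>

#define XFEATURE_MASK_LBR          (1ULL << 15)  /* arch LBR state component */
#define XFEATURE_MASK_INDEPENDENT  XFEATURE_MASK_LBR

int main(void)
{
	/* hypothetical XSAVES enumeration without the LBR component */
	unsigned long long max_features = (1ULL << 0) | (1ULL << 1) | (1ULL << 2);
	unsigned long long independent_features =
		max_features & XFEATURE_MASK_INDEPENDENT;

	/* old code: returns the raw constant, sets an unsupported IA32_XSS bit */
	printf("raw constant: %#llx\n", XFEATURE_MASK_INDEPENDENT);
	/* new code: returns 0 here, so IA32_XSS is left alone */
	printf("cached value: %#llx\n", independent_features);
	return 0;
}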
arch/x86/mm/init_64.c
@@ -958,8 +958,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 	      struct mhp_params *params)
 {
+	unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
 	int ret;
 
+	if (WARN_ON_ONCE(end > PHYSMEM_END))
+		return -ERANGE;
+
 	ret = __add_pages(nid, start_pfn, nr_pages, params);
 	WARN_ON_ONCE(ret);
 
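The new check computes the last byte covered by the hot-plugged range before touching anything. Toy arithmetic with the usual 4 KiB page size (the numbers are illustrative, and 64-bit math is assumed):

#include <stdio.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */

int main(void)
{
	unsigned long long start_pfn = 0x100000;  /* first page frame at 4 GiB */
	unsigned long long nr_pages  = 0x40000;   /* 1 GiB worth of pages */

	/* last byte of the range: one past the end, shifted, minus one */
	unsigned long long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;

	printf("end = %#llx\n", end);  /* 0x13fffffff, i.e. 5 GiB - 1 */
	return 0;
}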
arch/x86/mm/kaslr.c
@@ -47,13 +47,24 @@ static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
  */
 static __initdata struct kaslr_memory_region {
 	unsigned long *base;
+	unsigned long *end;
 	unsigned long size_tb;
 } kaslr_regions[] = {
-	{ &page_offset_base, 0 },
-	{ &vmalloc_base, 0 },
-	{ &vmemmap_base, 0 },
+	{
+		.base = &page_offset_base,
+		.end = &physmem_end,
+	},
+	{
+		.base = &vmalloc_base,
+	},
+	{
+		.base = &vmemmap_base,
+	},
 };
 
+/* The end of the possible address space for physical memory */
+unsigned long physmem_end __ro_after_init;
+
 /* Get size in bytes used by the memory region */
 static inline unsigned long get_padding(struct kaslr_memory_region *region)
 {
@@ -82,6 +93,8 @@ void __init kernel_randomize_memory(void)
 	BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
 	BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
 
+	/* Preset the end of the possible address space for physical memory */
+	physmem_end = ((1ULL << MAX_PHYSMEM_BITS) - 1);
 	if (!kaslr_memory_enabled())
 		return;
 
@@ -128,11 +141,18 @@ void __init kernel_randomize_memory(void)
 		vaddr += entropy;
 		*kaslr_regions[i].base = vaddr;
 
-		/*
-		 * Jump the region and add a minimum padding based on
-		 * randomization alignment.
-		 */
+		/* Calculate the end of the region */
 		vaddr += get_padding(&kaslr_regions[i]);
+		/*
+		 * KASLR trims the maximum possible size of the
+		 * direct-map. Update the physmem_end boundary.
+		 * No rounding required as the region starts
+		 * PUD aligned and size is in units of TB.
+		 */
+		if (kaslr_regions[i].end)
+			*kaslr_regions[i].end = __pa_nodebug(vaddr - 1);
+
+		/* Add a minimum padding based on randomization alignment. */
 		vaddr = round_up(vaddr + 1, PUD_SIZE);
 		remain_entropy -= entropy;
 	}
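The physmem_end update is plain direct-map arithmetic: vaddr points one byte past the randomized region, and for a direct-map address the physical address is virt minus page_offset_base, which is what __pa_nodebug() computes there. A sketch with illustrative numbers (the real base is randomized):

#include <stdio.h>

int main(void)
{
	/* example direct-map base; KASLR randomizes the real one */
	unsigned long long page_offset_base = 0xffff888000000000ULL;
	unsigned long long size = 64ULL << 40;  /* 64 TB direct map */

	/* vaddr one past the region: base + size (region starts PUD aligned) */
	unsigned long long vaddr = page_offset_base + size;

	/* __pa_nodebug(vaddr - 1) for a direct-map address */
	unsigned long long physmem_end = (vaddr - 1) - page_offset_base;

	printf("physmem_end = %#llx\n", physmem_end);  /* 64 TB - 1 */
	return 0;
}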
include/linux/mm.h
@@ -97,6 +97,10 @@ extern const int mmap_rnd_compat_bits_max;
 extern int mmap_rnd_compat_bits __read_mostly;
 #endif
 
+#ifndef PHYSMEM_END
+# define PHYSMEM_END	((1ULL << MAX_PHYSMEM_BITS) - 1)
+#endif
+
 #include <asm/page.h>
 #include <asm/processor.h>
 
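The #ifndef gives architectures an override point: x86 defines PHYSMEM_END to the KASLR-maintained variable in pgtable_64_types.h above, and every other architecture silently falls back to the MAX_PHYSMEM_BITS-based constant. A minimal demo of the pattern (the fake arch value is invented for illustration):

#include <stdio.h>

#define MAX_PHYSMEM_BITS 46

/* pretend an arch header already defined it, as x86 now does */
#define PHYSMEM_END ((1ULL << 40) - 1)

#ifndef PHYSMEM_END  /* generic fallback, skipped here */
# define PHYSMEM_END ((1ULL << MAX_PHYSMEM_BITS) - 1)
#endif

int main(void)
{
	printf("PHYSMEM_END = %#llx\n", PHYSMEM_END);  /* the arch value wins */
	return 0;
}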
include/linux/resctrl.h
@@ -248,6 +248,7 @@ struct resctrl_schema {
 
 /* The number of closid supported by this resource regardless of CDP */
 u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
+u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
 /*
kernel/resource.c
@@ -1826,8 +1826,7 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
 	if (flags & GFR_DESCENDING) {
 		resource_size_t end;
 
-		end = min_t(resource_size_t, base->end,
-			    (1ULL << MAX_PHYSMEM_BITS) - 1);
+		end = min_t(resource_size_t, base->end, PHYSMEM_END);
 		return end - size + 1;
 	}
 
@@ -1844,8 +1843,7 @@ static bool gfr_continue(struct resource *base, resource_size_t addr,
 	 * @size did not wrap 0.
 	 */
 	return addr > addr - size &&
-	       addr <= min_t(resource_size_t, base->end,
-			     (1ULL << MAX_PHYSMEM_BITS) - 1);
+	       addr <= min_t(resource_size_t, base->end, PHYSMEM_END);
 }
 
 static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
|
|||||||
|
|
||||||
struct range mhp_get_pluggable_range(bool need_mapping)
|
struct range mhp_get_pluggable_range(bool need_mapping)
|
||||||
{
|
{
|
||||||
const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
|
const u64 max_phys = PHYSMEM_END;
|
||||||
struct range mhp_range;
|
struct range mhp_range;
|
||||||
|
|
||||||
if (need_mapping) {
|
if (need_mapping) {
|
||||||
|
mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struct mem_section *section)
 static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
 						unsigned long *end_pfn)
 {
-	unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+	unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;
 
 	/*
 	 * Sanity checks - do not allow an architecture to pass
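With the generic PHYSMEM_END default, the rewritten expression is numerically identical to the old one, since (2^BITS - 1 + 1) >> PAGE_SHIFT equals 1 << (BITS - PAGE_SHIFT); the change only matters on x86, where KASLR may lower PHYSMEM_END below the constant. A quick check of the equivalence:

#include <assert.h>

#define MAX_PHYSMEM_BITS 46
#define PAGE_SHIFT       12
#define PHYSMEM_END      ((1ULL << MAX_PHYSMEM_BITS) - 1)  /* generic default */

int main(void)
{
	unsigned long long old_pfn = 1ULL << (MAX_PHYSMEM_BITS - PAGE_SHIFT);
	unsigned long long new_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;

	assert(old_pfn == new_pfn);
	return 0;
}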