1

x86/resctrl: Update documentation with Sub-NUMA cluster changes

With Sub-NUMA Cluster (SNC) mode enabled, the scope of monitoring resources
is per-NODE instead of per-L3 cache. Backwards compatibility is maintained
by providing files in the mon_L3_XX directories that sum event counts
for all SNC nodes sharing an L3 cache.

New files provide per-SNC node event counts.

Users should be aware that SNC mode also affects the amount of L3 cache
available for allocation within each SNC node.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20240628215619.76401-20-tony.luck@intel.com
This commit is contained in:
Tony Luck 2024-06-28 14:56:19 -07:00 committed by Borislav Petkov (AMD)
parent 13488150f5
commit ea34999f41

View File

@ -375,6 +375,10 @@ When monitoring is enabled all MON groups will also contain:
all tasks in the group. In CTRL_MON groups these files provide
the sum for all tasks in the CTRL_MON group and all tasks in
MON groups. Please see example section for more details on usage.
On systems with Sub-NUMA Cluster (SNC) enabled there are extra
directories for each node (located within the "mon_L3_XX" directory
for the L3 cache they occupy). These are named "mon_sub_L3_YY"
where "YY" is the node number.
"mon_hw_id":
Available only with debug option. The identifier used by hardware
@ -484,6 +488,29 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
each bit represents 5% of the capacity of the cache. You could partition
the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
Notes on Sub-NUMA Cluster mode
==============================
When SNC mode is enabled, Linux may load balance tasks between Sub-NUMA
nodes much more readily than between regular NUMA nodes since the CPUs
on Sub-NUMA nodes share the same L3 cache and the system may report
the NUMA distance between Sub-NUMA nodes with a lower value than used
for regular NUMA nodes.
The top-level monitoring files in each "mon_L3_XX" directory provide
the sum of data across all SNC nodes sharing an L3 cache instance.
Users who bind tasks to the CPUs of a specific Sub-NUMA node can read
the "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes" in the
"mon_sub_L3_YY" directories to get node local data.
Memory bandwidth allocation is still performed at the L3 cache
level. I.e. throttling controls are applied to all SNC nodes.
L3 cache allocation bitmaps also apply to all SNC nodes. But note that
the amount of L3 cache represented by each bit is divided by the number
of SNC nodes per L3 cache. E.g. with a 100MB cache on a system with 10-bit
allocation masks each bit normally represents 10MB. With SNC mode enabled
with two SNC nodes per L3 cache, each bit only represents 5MB.
Memory bandwidth Allocation and monitoring
==========================================