2021-08-12 21:22:23 +08:00
|
|
|
|
============================
|
|
|
|
|
NUMA resource associativity
|
2021-08-25 12:24:47 +08:00
|
|
|
|
============================
|
2021-08-12 21:22:23 +08:00
|
|
|
|
|
|
|
|
|
Associativity represents the groupings of the various platform resources into
|
|
|
|
|
domains of substantially similar mean performance relative to resources outside
|
|
|
|
|
of that domain. Resources subsets of a given domain that exhibit better
|
|
|
|
|
performance relative to each other than relative to other resources subsets
|
|
|
|
|
are represented as being members of a sub-grouping domain. This performance
|
|
|
|
|
characteristic is presented in terms of NUMA node distance within the Linux kernel.
|
|
|
|
|
From the platform view, these groups are also referred to as domains.
|
|
|
|
|
|
|
|
|
|
PAPR interface currently supports different ways of communicating these resource
|
|
|
|
|
grouping details to the OS. These are referred to as Form 0, Form 1 and Form2
|
|
|
|
|
associativity grouping. Form 0 is the oldest format and is now considered deprecated.
|
|
|
|
|
|
|
|
|
|
Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5 property".
|
|
|
|
|
Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
|
|
|
|
|
A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity
|
|
|
|
|
bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
|
|
|
|
|
|
|
|
|
|
Form 0
|
2021-08-25 12:24:47 +08:00
|
|
|
|
------
|
2021-08-12 21:22:23 +08:00
|
|
|
|
Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
|
|
|
|
|
|
|
|
|
|
Form 1
|
2021-08-25 12:24:47 +08:00
|
|
|
|
------
|
2021-08-12 21:22:23 +08:00
|
|
|
|
With Form 1 a combination of ibm,associativity-reference-points, and ibm,associativity
|
|
|
|
|
device tree properties are used to determine the NUMA distance between resource groups/domains.
|
|
|
|
|
|
|
|
|
|
The “ibm,associativity” property contains a list of one or more numbers (domainID)
|
|
|
|
|
representing the resource’s platform grouping domains.
|
|
|
|
|
|
|
|
|
|
The “ibm,associativity-reference-points” property contains a list of one or more numbers
|
|
|
|
|
(domainID index) that represents the 1 based ordinal in the associativity lists.
|
|
|
|
|
The list of domainID indexes represents an increasing hierarchy of resource grouping.
|
|
|
|
|
|
|
|
|
|
ex:
|
|
|
|
|
{ primary domainID index, secondary domainID index, tertiary domainID index.. }
|
|
|
|
|
|
|
|
|
|
Linux kernel uses the domainID at the primary domainID index as the NUMA node id.
|
|
|
|
|
Linux kernel computes NUMA distance between two domains by recursively comparing
|
|
|
|
|
if they belong to the same higher-level domains. For mismatch at every higher
|
|
|
|
|
level of the resource group, the kernel doubles the NUMA distance between the
|
|
|
|
|
comparing domains.
|
|
|
|
|
|
|
|
|
|
Form 2
|
|
|
|
|
-------
|
|
|
|
|
Form 2 associativity format adds separate device tree properties representing NUMA node distance
|
|
|
|
|
thereby making the node distance computation flexible. Form 2 also allows flexible primary
|
|
|
|
|
domain numbering. With numa distance computation now detached from the index value in
|
|
|
|
|
"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain
|
|
|
|
|
ids at the same domainID index representing resource groups of different performance/latency
|
|
|
|
|
characteristics.
|
|
|
|
|
|
|
|
|
|
Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
|
|
|
|
|
"ibm,architecture-vec-5" property.
|
|
|
|
|
|
|
|
|
|
"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing
|
|
|
|
|
the domainIDs present in the system. The offset of the domainID in this property is
|
|
|
|
|
used as an index while computing numa distance information via "ibm,numa-distance-table".
|
|
|
|
|
|
|
|
|
|
prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
|
|
|
|
|
N domainID encoded as with encode-int
|
|
|
|
|
|
|
|
|
|
For ex:
|
|
|
|
|
"ibm,numa-lookup-index-table" = {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when
|
|
|
|
|
computing the distance of domain 8 from other domains present in the system. For the rest of
|
|
|
|
|
this document, this offset will be referred to as domain distance offset.
|
|
|
|
|
|
|
|
|
|
"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA
|
|
|
|
|
distance between resource groups/domains present in the system.
|
|
|
|
|
|
|
|
|
|
prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
|
|
|
|
|
N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
|
|
|
|
|
The number N must be equal to the square of m where m is the number of domainIDs in the
|
|
|
|
|
numa-lookup-index-table.
|
|
|
|
|
|
|
|
|
|
For ex:
|
|
|
|
|
ibm,numa-lookup-index-table = <3 0 8 40>;
|
2021-08-25 12:24:47 +08:00
|
|
|
|
ibm,numa-distace-table = <9>, /bits/ 8 < 10 20 80 20 10 160 80 160 10>;
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
| 0 8 40
|
|
|
|
|
--|------------
|
|
|
|
|
|
|
|
|
|
|
0 | 10 20 80
|
|
|
|
|
|
|
|
|
|
|
8 | 20 10 160
|
|
|
|
|
|
|
|
|
|
|
40| 80 160 10
|
2021-08-12 21:22:23 +08:00
|
|
|
|
|
|
|
|
|
A possible "ibm,associativity" property for resources in node 0, 8 and 40
|
|
|
|
|
|
|
|
|
|
{ 3, 6, 7, 0 }
|
|
|
|
|
{ 3, 6, 9, 8 }
|
|
|
|
|
{ 3, 6, 7, 40}
|
|
|
|
|
|
|
|
|
|
With "ibm,associativity-reference-points" { 0x3 }
|
|
|
|
|
|
|
|
|
|
"ibm,lookup-index-table" helps in having a compact representation of distance matrix.
|
|
|
|
|
Since domainID can be sparse, the matrix of distances can also be effectively sparse.
|
|
|
|
|
With "ibm,lookup-index-table" we can achieve a compact representation of
|
|
|
|
|
distance information.
|