While the DOC at the beginning of lib/bitmap.c explicitly states that
"The number of valid bits in a given bitmap does _not_ need to be an
exact multiple of BITS_PER_LONG.", some of the bitmap operations do
indeed access BITS_PER_LONG portions of the provided bitmap no matter
the size of the provided bitmap. For example, if bitmap_intersects()
is provided with an 8 bit bitmap the operation will access
BITS_PER_LONG bits from the provided bitmap. While the operation
ensures that these extra bits do not affect the result, the memory
is still accessed.
The capacity bitmasks (CBMs) are typically stored in u32 since they
can never exceed 32 bits. A few instances exist where a bitmap_*
operation is performed on a CBM by simply pointing the bitmap operation
to the stored u32 value.
The consequence of this pattern is that some bitmap_* operations will
access out-of-bounds memory when interacting with the provided CBM. This
is confirmed with a KASAN test that reports:
BUG: KASAN: stack-out-of-bounds in __bitmap_intersects+0xa2/0x100
and
BUG: KASAN: stack-out-of-bounds in __bitmap_weight+0x58/0x90
Fix this by moving any CBM provided to a bitmap operation needing
BITS_PER_LONG to an 'unsigned long' variable.
[ tglx: Changed related function arguments to unsigned long and got rid
of the _cbm extra step ]
Fixes: 72d5050566 ("x86/intel_rdt: Add utilities to test pseudo-locked region possibility")
Fixes: 49f7b4efa1 ("x86/intel_rdt: Enable setting of exclusive mode")
Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations' size in bytes")
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/69a428613a53f10e80594679ac726246020ff94f.1538686926.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The number of CLOSIDs supported by a system is the minimum number of
CLOSIDs supported by any of its resources. Care should be taken when
iterating over the CLOSIDs of a resource since it may be that the number
of CLOSIDs supported on the system is less than the number of CLOSIDs
supported by the resource.
Introduce a helper function that can be used to query the number of
CLOSIDs that is supported by all resources, irrespective of how many
CLOSIDs are supported by a particular resource.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com
Each resource is associated with a parsing callback to parse the data
provided from user space when writing schemata file.
The 'data' parameter in the callbacks is defined as a void pointer which
is error prone due to lack of type check.
parse_bw() processes the 'data' parameter as a string while its caller
actually passes the parameter as a pointer to struct rdt_cbm_parse_data.
Thus, parse_bw() takes wrong data and causes failure of parsing MBA
throttle value.
To fix the issue, the 'data' parameter in all parsing callbacks is defined
and handled as a pointer to struct rdt_parse_data (renamed from struct
rdt_cbm_parse_data).
Fixes: 7604df6e16 ("x86/intel_rdt: Support flexible data to parsing callbacks")
Fixes: 9ab9aa15c3 ("x86/intel_rdt: Ensure requested schemata respects mode")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com
As the mode of a resource group changes, the operations it can support may
also change. One way in which the supported operations are managed is to
modify the permissions of the files within the resource group's resctrl
directory.
At the moment only two possible permissions are supported: the default
permissions or no permissions in support for when the operation is "locked
down". It is possible where an operation on a resource group may have more
possibilities. For example, if by default changes can be made to the
resource group by writing to a resctrl file while the current settings can
be obtained by reading from the file, then it may be possible that in
another mode it is only possible to read the current settings, and not
change them.
Make it possible to modify some of the permissions of a resctrl file in
support of a more flexible way to manage the operations on a resource
group. In this preparation work the original behavior is maintained where
all permissions are restored.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/8773aadfade7bcb2c48a45fa294a04d2c03bb0a1.1530421961.git.reinette.chatre@intel.com
There is no simple yes/no test to determine if pseudo-locking was
successful. In order to test pseudo-locking we expose a debugfs file for
each pseudo-locked region that will record the latency of reading the
pseudo-locked memory at a stride of 32 bytes (hardcoded). These numbers
will give us an idea of locking was successful or not since they will
reflect cache hits and cache misses (hardware prefetching is disabled
during the test).
The new debugfs file "pseudo_lock_measure" will, when the
pseudo_lock_mem_latency tracepoint is enabled, record the latency of
accessing each cache line twice.
Kernel tracepoints offer us histograms (when CONFIG_HIST_TRIGGERS is
enabled) that is a simple way to visualize the memory access latency
and immediately see any cache misses. For example, the hist trigger
below before trigger of the measurement will display the memory access
latency and instances at each latency:
echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/\
pseudo_lock_mem_latency/trigger
echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
echo 1 > /sys/kernel/debug/resctrl/<newlock>/pseudo_lock_measure
echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/6b2ea76181099d1b79ccfa7d3be24497ab2d1a45.1529706536.git.reinette.chatre@intel.com
The user requests a pseudo-locked region by providing a schemata to a
resource group that is in the pseudo-locksetup mode. This is the
functionality that consumes the parsed user data and creates the
pseudo-locked region.
First, required information is deduced from user provided data.
This includes, how much memory does the requested bitmask represent,
which CPU the requested region is associated with, and what is the
cache line size of that cache (to learn the stride needed for locking).
Second, a contiguous block of memory matching the requested bitmask is
allocated.
Finally, pseudo-locking is performed. The resource group already has the
allocation that reflects the requested bitmask. With this class of service
active and interference minimized, the allocated memory is loaded into the
cache.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/67391160bbf06143bc62d856d3d234eb152008b7.1529706536.git.reinette.chatre@intel.com
A pseudo-locked region does not have a class of service associated with
it and thus not tracked in the array of control values maintained as
part of the domain. Even so, when the user provides a new bitmask for
another resource group it needs to be checked for interference with
existing pseudo-locked regions.
Additionally only one pseudo-locked region can be created in any cache
hierarchy.
Introduce two utilities in support of above scenarios: (1) a utility
that can be used to test if a given capacity bitmask overlaps with any
pseudo-locked regions associated with a particular cache instance, (2) a
utility that can be used to test if a pseudo-locked region exists within
a particular cache hierarchy.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b8e31dbdcf22ddf71df46072647b47e7558abb32.1529706536.git.reinette.chatre@intel.com
The locksetup mode is the way in which the user communicates that the
resource group will be used for a pseudo-locked region. Locksetup mode
should thus ensure that all restrictions on a resource group are met before
locksetup mode can be entered. The resource group should also be configured
to ensure that it cannot be modified in unsupported ways when a
pseudo-locked region.
Introduce the support where the request for entering locksetup mode can be
validated. This includes: CDP is not active, no cpus or tasks are assigned
to the resource group, monitoring is not in progress on the resource
group. Once the resource group is determined ready for a pseudo-locked
region it is configured to not allow future changes to these properties.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b120f71ced30116bcc6c6f651e8a7906ae6b903d.1529706536.git.reinette.chatre@intel.com
When a resource group is used for Cache Pseudo-Locking then the region of
cache ends up being orphaned with no class of service referring to it. The
resctrl files intended to manage how the classes of services are utilized
thus become irrelevant.
The fact that a resctrl file is not relevant can be communicated to the
user by setting all of its permissions to zero. That is, its read, write,
and execute permissions are unset for all users.
Introduce two utilities, rdtgroup_kn_mode_restrict() and
rdtgroup_kn_mode_restore(), that can be used to restrict and restore the
permissions of a file or directory belonging to a resource group.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/7afdbf5551b2f93cd45d61fbf5e01d87331f529a.1529706536.git.reinette.chatre@intel.com
mba_sc is a feedback loop where we periodically read MBM counters and
try to restrict the bandwidth below a max value so the below is always
true:
"current bandwidth(cur_bw) < user specified bandwidth(user_bw)"
The frequency of these checks is currently 1s and we just tag along the
MBM overflow timer to do the updates. Doing it once in a second also
makes the calculation of bandwidth easy. The steps of increase or
decrease of bandwidth is the minimum granularity specified by the
hardware.
Although the MBA's goal is to restrict the bandwidth below a maximum,
there may be a need to even increase the bandwidth. Since MBA controls
the L2 external bandwidth where as MBM measures the L3 external
bandwidth, we may end up restricting some rdtgroups unnecessarily. This
may happen in the sequence where rdtgroup (set of jobs) had high
"L3 <-> memory traffic" in initial phases -> mba_sc kicks in and reduced
bandwidth percentage values -> but after some it has mostly "L2 <-> L3"
traffic. In this scenario mba_sc increases the bandwidth percentage when
there is lesser memory traffic.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-7-git-send-email-vikas.shivappa@linux.intel.com
When MBA software controller is enabled, a per domain storage is required
for user specified bandwidth in "MBps" and the "percentage" values which
are programmed into the IA32_MBA_THRTL_MSR. Add support for these data
structures and initialization.
The MBA percentage values have a default max value of 100 but however the
max value in MBps is not available from the hardware so it's set to
U32_MAX.
This simply says that the control group can use all bandwidth by default
but does not say what is the actual max bandwidth available. The actual
bandwidth that is available may depend on lot of factors like QPI link,
number of memory channels, memory channel frequency, its width and memory
speed, how many channels are configured and also if memory interleaving is
enabled. So there is no way to determine the maximum at runtime reliably.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-4-git-send-email-vikas.shivappa@linux.intel.com
Bit 0 in MSR IA32_L2_QOS_CFG (0xc82) is L2 CDP enable bit. By default,
the bit is zero, i.e. L2 CAT is enabled, and L2 CDP is disabled. When
the resctrl mount parameter "cdpl2" is given, the bit is set to 1 and L2
CDP is enabled.
In L2 CDP mode, the L2 CAT mask MSRs are re-mapped into interleaved pairs
of mask MSRs for code (referenced by an odd CLOSID) and data (referenced by
an even CLOSID).
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-6-git-send-email-fenghua.yu@intel.com
L2 data and L2 code are added as new resources in rdt_resources_all[]
and data in the resources are configured.
When L2 CDP is enabled, the schemata will have the two resources in
this format:
L2DATA:l2id0=xxxx;l2id1=xxxx;....
L2CODE:l2id0=xxxx;l2id1=xxxx;....
xxxx represent CBM (Cache Bit Mask) values in the schemata, similar to all
others (L2 CAT/L3 CAT/L3 CDP).
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-5-git-send-email-fenghua.yu@intel.com
Pull x86 cache resource updates from Thomas Gleixner:
"This update provides updates to RDT:
- A diagnostic framework for the Resource Director Technology (RDT)
user interface (sysfs). The failure modes of the user interface are
hard to diagnose from the error codes. An extra last command status
file provides now sensible textual information about the failure so
its simpler to use.
- A few minor cleanups and updates in the RDT code"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix a silent failure when writing zero value schemata
x86/intel_rdt: Fix potential deadlock during resctrl mount
x86/intel_rdt: Fix potential deadlock during resctrl unmount
x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabled
x86/intel_rdt: Remove redundant assignment
x86/intel_rdt/cqm: Make integer rmid_limbo_count static
x86/intel_rdt: Add documentation for "info/last_cmd_status"
x86/intel_rdt: Add diagnostics when making directories
x86/intel_rdt: Add diagnostics when writing the cpus file
x86/intel_rdt: Add diagnostics when writing the tasks file
x86/intel_rdt: Add diagnostics when writing the schemata file
x86/intel_rdt: Add framework for better RDT UI diagnostics
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commands are given to the resctrl file system by making/removing
directories, or by writing to files. When something goes wrong
the user is generally left wondering why they got:
bash: echo: write error: Invalid argument
Add a new file "last_cmd_status" to the "info" directory that
will give the user some better clues on what went wrong.
Provide functions to clear and update last_cmd_status which
check that we hold the rdtgroup_mutex.
[ tglx: Made last_cmd_status static and folded back the hunk from patch 3
which replaces the open coded access to last_cmd_status with the
accessor function ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/edc4e0e9741eee89bba569f0021b1b2662fd9508.1506382469.git.tony.luck@intel.com
During a mkdir, the entire limbo list is synchronously checked on each
package for free RMIDs by sending IPIs. With a large number of RMIDs (SKL
has 192) this creates a intolerable amount of work in IPIs.
Replace the IPI based checking of the limbo list with asynchronous worker
threads on each package which periodically scan the limbo list and move the
RMIDs that have:
llc_occupancy < threshold_occupancy
on all packages to the free list.
mkdir now returns -ENOSPC if the free list and the limbo list ere empty or
returns -EBUSY if there are RMIDs on the limbo list and the free list is
empty.
Getting rid of the IPIs also simplifies the data structures and the
serialization required for handling the lists.
[ tglx: Rewrote changelog ... ]
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Link: http://lkml.kernel.org/r/1502845243-20454-3-git-send-email-vikas.shivappa@linux.intel.com
CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource.
Each set bit within the length of the CBM indicates the corresponding
unit of the resource allocation may be used by other entities in the
platform (e.g. an integrated graphics engine or hardware units outside
the processor core and have direct access to the resource). Each
cleared bit within the length of the CBM indicates the corresponding
allocation unit can be configured to implement a priority-based
allocation scheme without interference with other hardware agents in
the system. Bits outside the length of the CBM are reserved.
More details on the bit mask are described in x86 Software Developer's
Manual.
The bitmask is shown in "info" directory for each resource. It's
up to user to decide how to use the bitmask within a CBM in a partition
to share or isolate a resource with other executing units.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: vikas.shivappa@linux.intel.com
Link: http://lkml.kernel.org/r/20170725223904.12996-1-tony.luck@intel.com
Add a mon_data directory for the root rdtgroup and all other rdtgroups.
The directory holds all of the monitored data for all domains and events
of all resources being monitored.
The mon_data itself has a list of directories in the format
mon_<domain_name>_<domain_id>. Each of these subdirectories contain one
file per event in the mode "0444". Reading the file displays a snapshot
of the monitored data for the event the file represents.
For ex, on a 2 socket Broadwell with llc_occupancy being
monitored the mon_data contents look as below:
$ ls /sys/fs/resctrl/p1/mon_data/
mon_L3_00
mon_L3_01
Each domain directory has one file per event:
$ ls /sys/fs/resctrl/p1/mon_data/mon_L3_00/
llc_occupancy
To read current llc_occupancy of ctrl_mon group p1
$ cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
33789096
[This patch idea is based on Tony's sample patches to organise data in a
per domain directory and have one file per event (and use the fp->priv to
store mon data bits)]
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-20-git-send-email-vikas.shivappa@linux.intel.com
Resource control groups can be created using mkdir in resctrl
fs(rdtgroup). In order to extend the resctrl interface to support
monitoring the control groups, extend the current mkdir to support
resource monitoring also.
This allows the rdtgroup created under the root directory to be able to
both control and monitor resources (ctrl_mon group). The ctrl_mon groups
are associated with one CLOSID like the legacy rdtgroups and one
RMID(Resource monitoring ID) as well. Hardware uses RMID to track the
resource usage. Once either of the CLOSID or RMID are exhausted, the
mkdir fails with -ENOSPC. If there are RMIDs in limbo list but not free
an -EBUSY is returned. User can also monitor a subset of the ctrl_mon
rdtgroup's tasks/cpus using the monitor groups. The monitor groups are
created using mkdir under the "mon_groups" directory in every ctrl_mon
group.
[Merged Tony's code: Removed a lot of common mkdir code, a fix to handling
of the list of the child rdtgroups and some cleanups in list
traversal. Also the changes to have similar alloc and free for CLOS/RMID
and return -EBUSY when RMIDs are in limbo and not free]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-14-git-send-email-vikas.shivappa@linux.intel.com
The info directory files and base files need to be different for each
resource like cache and Memory bandwidth. With in each resource, the
files would be further different for monitoring and ctrl. This leads to
a lot of different static array declarations given that we are adding
resctrl monitoring.
Simplify this to one common list of files and then declare a set of
flags to choose the files based on the resource, whether it is info or
base and if it is control type file. This is as a preparation to include
monitoring based info and base files.
No functional change.
[Vikas: Extended the flags to have few bits per category like resource,
info/base etc]
Signed-off-by: Tony luck <tony.luck@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-11-git-send-email-vikas.shivappa@linux.intel.com
Hardware uses RMID(Resource monitoring ID) to keep track of each of the
RDT events associated with tasks. The number of RMIDs is dependent on
the SKU and is enumerated via CPUID. We add support to manage the RMIDs
which include managing the RMID allocation and reading LLC occupancy
for an RMID.
RMID allocation is managed by keeping a free list which is initialized
to all available RMIDs except for RMID 0 which is always reserved for
root group. RMIDs goto a limbo list once they are
freed since the RMIDs are still tagged to cache lines of the tasks which
were using them - thereby still having some occupancy. They continue to
be in limbo list until the occupancy < threshold_occupancy. The
threshold_occupancy is a user configurable value.
OS uses IA32_QM_CTR MSR to read the occupancy associated with an RMID
after programming the IA32_EVENTSEL MSR with the RMID.
[Tony: Improved limbo search]
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-10-git-send-email-vikas.shivappa@linux.intel.com