License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results of that were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                            # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                        270
GPL-2.0+ WITH Linux-syscall-note                       169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
LGPL-2.1+ WITH Linux-syscall-note                       15
GPL-1.0+ WITH Linux-syscall-note                        14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
LGPL-2.0+ WITH Linux-syscall-note                        4
LGPL-2.1 WITH Linux-syscall-note                         3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
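The header/source distinction mentioned above can be sketched in C as follows. This is a hypothetical illustration, not the script that was actually used (which is not shown here): the names `has_suffix` and `spdx_line` and the suffix-based detection are assumptions. It relies only on the convention that C headers carry the tag in a `/* */` comment while `.c` sources use a `//` comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `path` ends with `suffix`, 0 otherwise. */
static int has_suffix(const char *path, const char *suffix)
{
	size_t plen = strlen(path), slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/*
 * Hypothetical helper: format the SPDX line for `path` into `buf`.
 * Returns 0 on success, -1 for a file type this sketch does not handle
 * or if the buffer is too small.
 */
static int spdx_line(const char *path, const char *license,
		     char *buf, size_t len)
{
	int n;

	if (has_suffix(path, ".h"))
		n = snprintf(buf, len, "/* SPDX-License-Identifier: %s */", license);
	else if (has_suffix(path, ".c"))
		n = snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
	else
		return -1;

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling `spdx_line("include/linux/slub_def.h", "GPL-2.0", buf, sizeof(buf))` yields the same line that opens this header; other file types (Makefiles, assembly, scripts) would need their own comment styles, which this sketch leaves out.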
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
2007-05-07 05:49:36 +08:00
#ifndef _LINUX_SLUB_DEF_H
#define _LINUX_SLUB_DEF_H

/*
 * SLUB : A Slab allocator without object queues.
 *
2008-07-05 00:59:22 +08:00
 * (C) 2007 SGI, Christoph Lameter
2007-05-07 05:49:36 +08:00
 */
2021-02-26 09:19:16 +08:00
#include <linux/kfence.h>
2007-05-07 05:49:36 +08:00
#include <linux/kobject.h>
2020-08-07 14:20:42 +08:00
#include <linux/reciprocal_div.h>
2021-05-22 07:59:38 +08:00
#include <linux/local_lock.h>
2007-05-07 05:49:36 +08:00

2008-02-08 09:47:41 +08:00
enum stat_item {
	ALLOC_FASTPATH,		/* Allocation from cpu slab */
	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
2013-11-08 20:47:36 +08:00
	FREE_FASTPATH,		/* Free to cpu slab */
2008-02-08 09:47:41 +08:00
	FREE_SLOWPATH,		/* Freeing not to cpu slab */
	FREE_FROZEN,		/* Freeing to frozen slab */
	FREE_ADD_PARTIAL,	/* Freeing moves slab to partial list */
	FREE_REMOVE_PARTIAL,	/* Freeing removes last object */
2012-02-03 23:34:56 +08:00
	ALLOC_FROM_PARTIAL,	/* Cpu slab acquired from node partial list */
2008-02-08 09:47:41 +08:00
	ALLOC_SLAB,		/* Cpu slab acquired from page allocator */
	ALLOC_REFILL,		/* Refill cpu slab from slab freelist */
2011-06-02 01:25:57 +08:00
	ALLOC_NODE_MISMATCH,	/* Switching cpu slab */
2008-02-08 09:47:41 +08:00
	FREE_SLAB,		/* Slab freed to the page allocator */
	CPUSLAB_FLUSH,		/* Abandoning of the cpu slab */
	DEACTIVATE_FULL,	/* Cpu slab was full when deactivated */
	DEACTIVATE_EMPTY,	/* Cpu slab was empty when deactivated */
	DEACTIVATE_TO_HEAD,	/* Cpu slab was moved to the head of partials */
	DEACTIVATE_TO_TAIL,	/* Cpu slab was moved to the tail of partials */
	DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */
2011-06-02 01:25:58 +08:00
	DEACTIVATE_BYPASS,	/* Implicit deactivation */
2008-04-15 00:11:40 +08:00
	ORDER_FALLBACK,		/* Number of times fallback was necessary */
2011-03-23 02:35:00 +08:00
	CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */
2011-06-02 01:25:49 +08:00
	CMPXCHG_DOUBLE_FAIL,	/* Number of times that cmpxchg double did not match */
2011-08-10 05:12:27 +08:00
	CPU_PARTIAL_ALLOC,	/* Used cpu partial on alloc */
2012-02-03 23:34:56 +08:00
	CPU_PARTIAL_FREE,	/* Refill cpu partial on free */
	CPU_PARTIAL_NODE,	/* Refill cpu partial from node partial */
	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
2008-02-08 09:47:41 +08:00
	NR_SLUB_STAT_ITEMS };

2021-05-22 07:59:38 +08:00
/*
 * When changing the layout, make sure freelist and tid are still compatible
 * with this_cpu_cmpxchg_double() alignment requirements.
 */
2007-10-16 16:26:05 +08:00
struct kmem_cache_cpu {
2011-02-26 01:38:54 +08:00
	void **freelist;	/* Pointer to next available object */
	unsigned long tid;	/* Globally unique transaction id */
2008-01-08 15:20:31 +08:00
	struct page *page;	/* The slab from which we are allocating */
2017-07-07 06:36:31 +08:00
#ifdef CONFIG_SLUB_CPU_PARTIAL
2011-08-10 05:12:27 +08:00
	struct page *partial;	/* Partially allocated frozen slabs */
2017-07-07 06:36:31 +08:00
#endif
2021-05-22 07:59:38 +08:00
	local_lock_t lock;	/* Protects the fields above */
2008-02-08 09:47:41 +08:00
#ifdef CONFIG_SLUB_STATS
	unsigned stat[NR_SLUB_STAT_ITEMS];
#endif
2007-10-16 16:26:08 +08:00
};
2007-10-16 16:26:05 +08:00

2017-07-07 06:36:31 +08:00
#ifdef CONFIG_SLUB_CPU_PARTIAL
#define slub_percpu_partial(c)		((c)->partial)

#define slub_set_percpu_partial(c, p)		\
({						\
	slub_percpu_partial(c) = (p)->next;	\
})

#define slub_percpu_partial_read_once(c)     READ_ONCE(slub_percpu_partial(c))
#else
#define slub_percpu_partial(c)			NULL

#define slub_set_percpu_partial(c, p)

#define slub_percpu_partial_read_once(c)	NULL
#endif // CONFIG_SLUB_CPU_PARTIAL

2008-04-15 00:11:31 +08:00
/*
 * Word size structure that can be atomically updated or read and that
 * contains both the order and the number of objects that a slab of the
 * given order would contain.
 */
struct kmem_cache_order_objects {
2018-04-06 07:21:39 +08:00
	unsigned int x;
2008-04-15 00:11:31 +08:00
};

2007-05-07 05:49:36 +08:00
/*
 * Slab cache management.
 */
struct kmem_cache {
2010-08-07 20:29:22 +08:00
	struct kmem_cache_cpu __percpu *cpu_slab;
2019-03-06 07:42:07 +08:00
	/* Used for retrieving partial slabs, etc. */
2017-11-16 09:32:18 +08:00
	slab_flags_t flags;
2011-02-26 01:38:51 +08:00
	unsigned long min_partial;
2019-03-06 07:42:07 +08:00
	unsigned int size;	/* The size of an object including metadata */
	unsigned int object_size;/* The size of an object without metadata */
2020-08-07 14:20:42 +08:00
	struct reciprocal_value reciprocal_size;
2019-03-06 07:42:07 +08:00
	unsigned int offset;	/* Free pointer offset */
2017-07-07 06:36:34 +08:00
#ifdef CONFIG_SLUB_CPU_PARTIAL
2018-04-06 07:21:10 +08:00
	/* Number of per cpu partial objects to keep around */
	unsigned int cpu_partial;
mm, slub: change percpu partial accounting from objects to pages
With CONFIG_SLUB_CPU_PARTIAL enabled, SLUB keeps a percpu list of
partial slabs that can be promoted to cpu slab when the previous one is
depleted, without accessing the shared partial list. A slab can be
added to this list by 1) refill of an empty list from get_partial_node()
- once we really have to access the shared partial list, we acquire
multiple slabs to amortize the cost of locking, and 2) first free to a
previously full slab - instead of putting the slab on a shared partial
list, we can more cheaply freeze it and put it on the per-cpu list.
To control how large a percpu partial list can grow for a kmem cache,
set_cpu_partial() calculates a target number of free objects on each
cpu's percpu partial list, and this can be also set by the sysfs file
cpu_partial.
However, the tracking of the actual number of objects is imprecise, in order
to limit overhead from cpu X freeing an object to a slab on the percpu
partial list of cpu Y. Basically, the percpu partial slabs form a
singly linked list, and when we add a new slab to the list with current
head "oldpage", we set in the struct page of the slab we're adding:
page->pages = oldpage->pages + 1; // this is precise
page->pobjects = oldpage->pobjects + (page->objects - page->inuse);
page->next = oldpage;
Thus the real number of free objects in the slab (objects - inuse) is
only determined at the moment of adding the slab to the percpu partial
list, and further freeing doesn't update the pobjects counter nor
propagate it to the current list head. As Jann reports [1], this can
easily lead to large inaccuracies, where the target number of objects
(up to 30 by default) can translate to the same number of (empty) slab
pages on the list. In case 2) above, we put a slab with 1 free object
on the list, thus only increase page->pobjects by 1, even if there are
subsequent frees on the same slab. Jann has noticed this in practice
and so did we [2] when investigating significant increase of kmemcg
usage after switching from SLAB to SLUB.
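The drift described above can be reproduced with a small userspace model. This is only a sketch: `struct toy_page` and both helper functions are hypothetical stand-ins that mirror the three assignments quoted earlier, not the kernel's actual data structures or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the struct page fields named in the commit message. */
struct toy_page {
	int objects;	/* total objects in the slab */
	int inuse;	/* objects currently allocated */
	int pages;	/* slabs on the list, counted from this head */
	int pobjects;	/* estimated free objects on the list */
	struct toy_page *next;
};

/* Old scheme: push `page` in front of `oldpage`, returning the new head. */
static struct toy_page *toy_put_cpu_partial(struct toy_page *page,
					    struct toy_page *oldpage)
{
	int old_pages = oldpage ? oldpage->pages : 0;
	int old_pobjects = oldpage ? oldpage->pobjects : 0;

	page->pages = old_pages + 1;	/* precise */
	page->pobjects = old_pobjects + (page->objects - page->inuse);
	page->next = oldpage;
	return page;
}

/* Walk the list and count the free objects that are really there. */
static int toy_real_free(const struct toy_page *head)
{
	int n = 0;

	for (; head; head = head->next)
		n += head->objects - head->inuse;
	return n;
}
```

If two 32-object slabs are each added with a single free object and are later emptied by further frees, the head still reports `pobjects == 2` while `toy_real_free()` returns 64, which is the kind of gap between the object target and the real list contents that the report describes.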
While this is no longer a problem in kmemcg context thanks to the
accounting rewrite in 5.9, the memory waste is still not ideal and it's
questionable whether it makes sense to perform free object count based
control when object counts can easily become so inaccurate. So
this patch converts the accounting to be based on number of pages only
(which is precise) and removes the page->pobjects field completely.
This is also ultimately simpler.
To retain the existing set_cpu_partial() heuristic, first calculate the
target number of objects as previously, but then convert it to target
number of pages by assuming the pages will be half-filled on average.
This assumption might obviously also be inaccurate in practice, but
cannot degrade to actual number of pages being equal to the target
number of objects.
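The conversion can be sketched as below, assuming a half-filled page contributes half of its objects as free ones and that the result is rounded up. The helper names are hypothetical, and `objs_per_slab` (which must be nonzero) stands for the number of objects a slab of the cache's order holds.

```c
#include <assert.h>

/* Round-up integer division, in the spirit of the kernel's DIV_ROUND_UP(). */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical helper: convert a target number of free objects into a
 * target number of pages, assuming each page on the list is half full,
 * i.e. contributes objs_per_slab / 2 free objects.
 */
static unsigned int partial_objects_to_pages(unsigned int nr_objects,
					      unsigned int objs_per_slab)
{
	return div_round_up(nr_objects * 2, objs_per_slab);
}
```

With a target of 30 objects and 32 objects per slab this gives 2 pages; with only 8 objects per slab it gives 8 pages, still well below the 30 pages the old accounting could degrade to.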
We could also skip the intermediate step with target number of objects
and rewrite the heuristic in terms of pages. However we still have the
sysfs file cpu_partial which uses number of objects and could break
existing users if it suddenly becomes number of pages, so this patch
doesn't do that.
In practice, after this patch the heuristics limit the size of percpu
partial list up to 2 pages. In case of a reported regression (which
would mean some workload has benefited from the previous imprecise
object based counting), we can tune the heuristics to get a better
compromise within the new scheme, while still avoiding the unexpectedly
long percpu partial lists.
[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/
[2] https://lore.kernel.org/all/2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz/
==========
Evaluation
==========
Mel was kind enough to run v1 through the mmtests machinery for netperf
(localhost) and hackbench; the most significant results are below.
So there are some apparent regressions, especially with hackbench, which
I think ultimately boils down to having shorter percpu partial lists on
average and some benchmarks benefiting from longer ones. Monitoring
slab usage also indicated less memory usage by slab. Based on that, the
following patch will bump the defaults to allow longer percpu partial
lists than after this patch.
However, the goal is certainly not to limit the percpu
partial lists to 30 pages just because previously a specific alloc/free
pattern could make the limit of 30 objects translate to a limit of 30
pages - that would make little sense. This is a correctness patch, and
if a workload benefits from larger lists, the sysfs tuning knobs are
still there to allow that.
Netperf
2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM
TCP-RR:
hmean before 127045.79 after 121092.94 (-4.69%, worse)
stddev before 2634.37 after 1254.08
UDP-RR:
hmean before 166985.45 after 160668.94 ( -3.78%, worse)
stddev before 4059.69 after 1943.63
2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM
TCP-RR:
hmean before 84173.25 after 76914.72 ( -8.62%, worse)
UDP-RR:
hmean before 93571.12 after 96428.69 ( 3.05%, better)
stddev before 23118.54 after 16828.14
2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM
TCP-RR:
hmean before 49984.92 after 48922.27 ( -2.13%, worse)
stddev before 6248.15 after 4740.51
UDP-RR:
hmean before 61854.31 after 68761.81 ( 11.17%, better)
stddev before 4093.54 after 5898.91
other machines - within 2%
Hackbench
(results before and after the patch, negative % means worse)
2-socket AMD EPYC 7713 (64 cores, 128 threads per socket), 256GB RAM
hackbench-process-sockets
Amean 1 0.5380 0.5583 ( -3.78%)
Amean 4 0.7510 0.8150 ( -8.52%)
Amean 7 0.7930 0.9533 ( -20.22%)
Amean 12 0.7853 1.1313 ( -44.06%)
Amean 21 1.1520 1.4993 ( -30.15%)
Amean 30 1.6223 1.9237 ( -18.57%)
Amean 48 2.6767 2.9903 ( -11.72%)
Amean 79 4.0257 5.1150 ( -27.06%)
Amean 110 5.5193 7.4720 ( -35.38%)
Amean 141 7.2207 9.9840 ( -38.27%)
Amean 172 8.4770 12.1963 ( -43.88%)
Amean 203 9.6473 14.3137 ( -48.37%)
Amean 234 11.3960 18.7917 ( -64.90%)
Amean 265 13.9627 22.4607 ( -60.86%)
Amean 296 14.9163 26.0483 ( -74.63%)
hackbench-thread-sockets
Amean 1 0.5597 0.5877 ( -5.00%)
Amean 4 0.7913 0.8960 ( -13.23%)
Amean 7 0.8190 1.0017 ( -22.30%)
Amean 12 0.9560 1.1727 ( -22.66%)
Amean 21 1.7587 1.5660 ( 10.96%)
Amean 30 2.4477 1.9807 ( 19.08%)
Amean 48 3.4573 3.0630 ( 11.41%)
Amean 79 4.7903 5.1733 ( -8.00%)
Amean 110 6.1370 7.4220 ( -20.94%)
Amean 141 7.5777 9.2617 ( -22.22%)
Amean 172 9.2280 11.0907 ( -20.18%)
Amean 203 10.2793 13.3470 ( -29.84%)
Amean 234 11.2410 17.1070 ( -52.18%)
Amean 265 12.5970 23.3323 ( -85.22%)
Amean 296 17.1540 24.2857 ( -41.57%)
2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads
per socket), 384GB RAM
hackbench-process-sockets
Amean 1 0.5760 0.4793 ( 16.78%)
Amean 4 0.9430 0.9707 ( -2.93%)
Amean 7 1.5517 1.8843 ( -21.44%)
Amean 12 2.4903 2.7267 ( -9.49%)
Amean 21 3.9560 4.2877 ( -8.38%)
Amean 30 5.4613 5.8343 ( -6.83%)
Amean 48 8.5337 9.2937 ( -8.91%)
Amean 79 14.0670 15.2630 ( -8.50%)
Amean 110 19.2253 21.2467 ( -10.51%)
Amean 141 23.7557 25.8550 ( -8.84%)
Amean 172 28.4407 29.7603 ( -4.64%)
Amean 203 33.3407 33.9927 ( -1.96%)
Amean 234 38.3633 39.1150 ( -1.96%)
Amean 265 43.4420 43.8470 ( -0.93%)
Amean 296 48.3680 48.9300 ( -1.16%)
hackbench-thread-sockets
Amean 1 0.6080 0.6493 ( -6.80%)
Amean 4 1.0000 1.0513 ( -5.13%)
Amean 7 1.6607 2.0260 ( -22.00%)
Amean 12 2.7637 2.9273 ( -5.92%)
Amean 21 5.0613 4.5153 ( 10.79%)
Amean 30 6.3340 6.1140 ( 3.47%)
Amean 48 9.0567 9.5577 ( -5.53%)
Amean 79 14.5657 15.7983 ( -8.46%)
Amean 110 19.6213 21.6333 ( -10.25%)
Amean 141 24.1563 26.2697 ( -8.75%)
Amean 172 28.9687 30.2187 ( -4.32%)
Amean 203 33.9763 34.6970 ( -2.12%)
Amean 234 38.8647 39.3207 ( -1.17%)
Amean 265 44.0813 44.1507 ( -0.16%)
Amean 296 49.2040 49.4330 ( -0.47%)
2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads
per socket), 512GB RAM
hackbench-process-sockets
Amean 1 0.5027 0.5017 ( 0.20%)
Amean 4 1.1053 1.2033 ( -8.87%)
Amean 7 1.8760 2.1820 ( -16.31%)
Amean 12 2.9053 3.1810 ( -9.49%)
Amean 21 4.6777 4.9920 ( -6.72%)
Amean 30 6.5180 6.7827 ( -4.06%)
Amean 48 10.0710 10.5227 ( -4.48%)
Amean 79 16.4250 17.5053 ( -6.58%)
Amean 110 22.6203 24.4617 ( -8.14%)
Amean 141 28.0967 31.0363 ( -10.46%)
Amean 172 34.4030 36.9233 ( -7.33%)
Amean 203 40.5933 43.0850 ( -6.14%)
Amean 234 46.6477 48.7220 ( -4.45%)
Amean 265 53.0530 53.9597 ( -1.71%)
Amean 296 59.2760 59.9213 ( -1.09%)
hackbench-thread-sockets
Amean 1 0.5363 0.5330 ( 0.62%)
Amean 4 1.1647 1.2157 ( -4.38%)
Amean 7 1.9237 2.2833 ( -18.70%)
Amean 12 2.9943 3.3110 ( -10.58%)
Amean 21 4.9987 5.1880 ( -3.79%)
Amean 30 6.7583 7.0043 ( -3.64%)
Amean 48 10.4547 10.8353 ( -3.64%)
Amean 79 16.6707 17.6790 ( -6.05%)
Amean 110 22.8207 24.4403 ( -7.10%)
Amean 141 28.7090 31.0533 ( -8.17%)
Amean 172 34.9387 36.8260 ( -5.40%)
Amean 203 41.1567 43.0450 ( -4.59%)
Amean 234 47.3790 48.5307 ( -2.43%)
Amean 265 53.9543 54.6987 ( -1.38%)
Amean 296 60.0820 60.2163 ( -0.22%)
1-socket Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 cores, 8 threads),
32 GB RAM
hackbench-process-sockets
Amean 1 1.4760 1.5773 ( -6.87%)
Amean 3 3.9370 4.0910 ( -3.91%)
Amean 5 6.6797 6.9357 ( -3.83%)
Amean 7 9.3367 9.7150 ( -4.05%)
Amean 12 15.7627 16.1400 ( -2.39%)
Amean 18 23.5360 23.6890 ( -0.65%)
Amean 24 31.0663 31.3137 ( -0.80%)
Amean 30 38.7283 39.0037 ( -0.71%)
Amean 32 41.3417 41.6097 ( -0.65%)
hackbench-thread-sockets
Amean 1 1.5250 1.6043 ( -5.20%)
Amean 3 4.0897 4.2603 ( -4.17%)
Amean 5 6.7760 7.0933 ( -4.68%)
Amean 7 9.4817 9.9157 ( -4.58%)
Amean 12 15.9610 16.3937 ( -2.71%)
Amean 18 23.9543 24.3417 ( -1.62%)
Amean 24 31.4400 31.7217 ( -0.90%)
Amean 30 39.2457 39.5467 ( -0.77%)
Amean 32 41.8267 42.1230 ( -0.71%)
2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads
per socket), 64GB RAM
hackbench-process-sockets
Amean 1 1.0347 1.0880 ( -5.15%)
Amean 4 1.7267 1.8527 ( -7.30%)
Amean 7 2.6707 2.8110 ( -5.25%)
Amean 12 4.1617 4.3383 ( -4.25%)
Amean 21 7.0070 7.2600 ( -3.61%)
Amean 30 9.9187 10.2397 ( -3.24%)
Amean 48 15.6710 16.3923 ( -4.60%)
Amean 79 24.7743 26.1247 ( -5.45%)
Amean 110 34.3000 35.9307 ( -4.75%)
Amean 141 44.2043 44.8010 ( -1.35%)
Amean 172 54.2430 54.7260 ( -0.89%)
Amean 192 60.6557 60.9777 ( -0.53%)
hackbench-thread-sockets
Amean 1 1.0610 1.1353 ( -7.01%)
Amean 4 1.7543 1.9140 ( -9.10%)
Amean 7 2.7840 2.9573 ( -6.23%)
Amean 12 4.3813 4.4937 ( -2.56%)
Amean 21 7.3460 7.5350 ( -2.57%)
Amean 30 10.2313 10.5190 ( -2.81%)
Amean 48 15.9700 16.5940 ( -3.91%)
Amean 79 25.3973 26.6637 ( -4.99%)
Amean 110 35.1087 36.4797 ( -3.91%)
Amean 141 45.8220 46.3053 ( -1.05%)
Amean 172 55.4917 55.7320 ( -0.43%)
Amean 192 62.7490 62.5410 ( 0.33%)
Link: https://lkml.kernel.org/r/20211012134651.11258-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Jann Horn <jannh@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:35:17 +08:00
	/* Number of per cpu partial pages to keep around */
	unsigned int cpu_partial_pages;
2017-07-07 06:36:34 +08:00
#endif
2008-04-15 00:11:31 +08:00
	struct kmem_cache_order_objects oo;
2007-05-07 05:49:36 +08:00

	/* Allocation and freeing of slabs */
2008-04-15 00:11:40 +08:00
	struct kmem_cache_order_objects max;
2008-04-15 00:11:40 +08:00
	struct kmem_cache_order_objects min;
2008-02-15 06:21:32 +08:00
	gfp_t allocflags;	/* gfp flags to use on each alloc */
2007-05-07 05:49:36 +08:00
	int refcount;		/* Refcount for slab cache destroy */
2008-07-26 10:45:34 +08:00
	void (*ctor)(void *);
2018-04-06 07:21:06 +08:00
	unsigned int inuse;		/* Offset to metadata */
2018-04-06 07:21:02 +08:00
	unsigned int align;		/* Alignment */
2018-04-06 07:20:55 +08:00
	unsigned int red_left_pad;	/* Left redzone padding size */
2007-05-07 05:49:36 +08:00
	const char *name;	/* Name (only for display!) */
	struct list_head list;	/* List of slab caches */
2010-10-06 02:57:26 +08:00
#ifdef CONFIG_SYSFS
2007-05-07 05:49:36 +08:00
	struct kobject kobj;	/* For sysfs */
2007-07-17 19:03:24 +08:00
#endif
2017-09-07 07:19:18 +08:00
#ifdef CONFIG_SLAB_FREELIST_HARDENED
	unsigned long random;
#endif

2007-05-07 05:49:36 +08:00
#ifdef CONFIG_NUMA
2008-01-08 15:20:26 +08:00
	/*
	 * Defragmentation by allocating from a remote node.
	 */
2018-04-06 07:20:48 +08:00
	unsigned int remote_node_defrag_ratio;
2007-05-07 05:49:36 +08:00
#endif
2016-07-27 06:21:59 +08:00

#ifdef CONFIG_SLAB_FREELIST_RANDOM
	unsigned int *random_seq;
#endif

2016-07-29 06:49:07 +08:00
#ifdef CONFIG_KASAN
	struct kasan_cache kasan_info;
#endif

2018-04-06 07:21:31 +08:00
	unsigned int useroffset;	/* Usercopy region offset */
	unsigned int usersize;		/* Usercopy region size */
usercopy: Prepare for usercopy whitelisting
This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.
This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.
Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)
To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.
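How the two annotations bound a copy can be sketched as a range check: a copy of `n` bytes starting at `offset` inside an object is acceptable only if it lies entirely within `[useroffset, useroffset + usersize)`. The struct and function below are hypothetical userspace stand-ins written for illustration; they are not the hardened usercopy code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the two annotation fields described above; the rest is omitted. */
struct toy_cache {
	size_t useroffset;	/* start of the whitelisted region */
	size_t usersize;	/* length of the whitelisted region */
};

/*
 * Hypothetical check: may `n` bytes starting at `offset` within an
 * object of cache `s` be copied to/from userspace? Written so that
 * offset + n is never computed and cannot overflow.
 */
static bool toy_usercopy_allowed(const struct toy_cache *s,
				 size_t offset, size_t n)
{
	if (offset < s->useroffset)
		return false;
	offset -= s->useroffset;	/* now relative to region start */
	if (offset > s->usersize)
		return false;
	return n <= s->usersize - offset;
}
```

In this model a cache with `usersize == 0` rejects every non-empty copy, which matches the intended end state for caches created with plain kmem_cache_create().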
After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:
Total Slab Memory: 48074720
Usercopyable Memory: 6367532 13.2%
task_struct 0.2% 4480/1630720
RAW 0.3% 300/96000
RAWv6 2.1% 1408/64768
ext4_inode_cache 3.0% 269760/8740224
dentry 11.1% 585984/5273856
mm_struct 29.1% 54912/188448
kmalloc-8 100.0% 24576/24576
kmalloc-16 100.0% 28672/28672
kmalloc-32 100.0% 81920/81920
kmalloc-192 100.0% 96768/96768
kmalloc-128 100.0% 143360/143360
names_cache 100.0% 163840/163840
kmalloc-64 100.0% 167936/167936
kmalloc-256 100.0% 339968/339968
kmalloc-512 100.0% 350720/350720
kmalloc-96 100.0% 455616/455616
kmalloc-8192 100.0% 655360/655360
kmalloc-1024 100.0% 812032/812032
kmalloc-4096 100.0% 819200/819200
kmalloc-2048 100.0% 1310720/1310720
After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:
Total Slab Memory: 95516184
Usercopyable Memory: 8497452 8.8%
task_struct 0.2% 4000/1456000
RAW 0.3% 300/96000
RAWv6 2.1% 1408/64768
ext4_inode_cache 3.0% 1217280/39439872
dentry 11.1% 1623200/14608800
mm_struct 29.1% 73216/251264
kmalloc-8 100.0% 24576/24576
kmalloc-16 100.0% 28672/28672
kmalloc-32 100.0% 94208/94208
kmalloc-192 100.0% 96768/96768
kmalloc-128 100.0% 143360/143360
names_cache 100.0% 163840/163840
kmalloc-64 100.0% 245760/245760
kmalloc-256 100.0% 339968/339968
kmalloc-512 100.0% 350720/350720
kmalloc-96 100.0% 563520/563520
kmalloc-8192 100.0% 655360/655360
kmalloc-1024 100.0% 794624/794624
kmalloc-4096 100.0% 819200/819200
kmalloc-2048 100.0% 1257472/1257472
Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: convert BUGs to WARNs and fail closed]
[kees: add attack surface reduction analysis to commit log]
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christoph Lameter <cl@linux.com>
2017-06-11 10:50:28 +08:00

2010-09-28 21:10:26 +08:00
	struct kmem_cache_node *node[MAX_NUMNODES];
2007-05-07 05:49:36 +08:00
};

2014-05-07 03:50:08 +08:00
#ifdef CONFIG_SYSFS
#define SLAB_SUPPORTS_SYSFS
2018-06-28 14:26:09 +08:00
void sysfs_slab_unlink(struct kmem_cache *);
2017-02-23 07:41:11 +08:00
void sysfs_slab_release(struct kmem_cache *);
2014-05-07 03:50:08 +08:00
#else
2018-06-28 14:26:09 +08:00
static inline void sysfs_slab_unlink(struct kmem_cache *s)
{
}
2017-02-23 07:41:11 +08:00
static inline void sysfs_slab_release(struct kmem_cache *s)
2014-05-07 03:50:08 +08:00
{
}
#endif

2015-02-14 06:39:35 +08:00
void object_err(struct kmem_cache *s, struct page *page,
		u8 *object, char *reason);

2016-07-29 06:49:04 +08:00
void *fixup_red_left(struct kmem_cache *s, void *p);

2016-03-26 05:21:59 +08:00
static inline void *nearest_obj(struct kmem_cache *cache, struct page *page,
				void *x) {
	void *object = x - (x - page_address(page)) % cache->size;
	void *last_object = page_address(page) +
		(page->objects - 1) * cache->size;
2016-07-29 06:49:04 +08:00
	void *result = (unlikely(object > last_object)) ? last_object : object;

	result = fixup_red_left(cache, result);
	return result;
2016-03-26 05:21:59 +08:00
}

2020-08-07 14:20:42 +08:00
/* Determine object index from a given position */
static inline unsigned int __obj_to_index(const struct kmem_cache *cache,
					  void *addr, void *obj)
{
	return reciprocal_divide(kasan_reset_tag(obj) - addr,
				 cache->reciprocal_size);
}

static inline unsigned int obj_to_index(const struct kmem_cache *cache,
					const struct page *page, void *obj)
{
2021-02-26 09:19:16 +08:00
	if (is_kfence_address(obj))
		return 0;
2020-08-07 14:20:42 +08:00
	return __obj_to_index(cache, page_address(page), obj);
}

2020-08-07 14:20:52 +08:00
static inline int objs_per_slab_page(const struct kmem_cache *cache,
				     const struct page *page)
{
	return page->objects;
}
2007-05-07 05:49:36 +08:00
#endif /* _LINUX_SLUB_DEF_H */
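The address arithmetic in nearest_obj() and obj_to_index() can be tried out in userspace with plain offsets. The sketch below is a simplified model under stated assumptions: `struct toy_slab` and both functions are hypothetical, offsets are relative to the start of the slab, the red-zone fixup, KASAN tag reset and KFENCE check are left out, and ordinary division stands in for reciprocal_divide().

```c
#include <assert.h>
#include <stddef.h>

/* Toy slab: `objects` slots of `size` bytes each, starting at offset 0. */
struct toy_slab {
	size_t size;
	size_t objects;
};

/* Offset of the object containing `off`, clamped to the last object. */
static size_t toy_nearest_obj(const struct toy_slab *s, size_t off)
{
	size_t object = off - off % s->size;
	size_t last_object = (s->objects - 1) * s->size;

	return object > last_object ? last_object : object;
}

/* Index of the object starting at offset `obj_off`. */
static size_t toy_obj_to_index(const struct toy_slab *s, size_t obj_off)
{
	return obj_off / s->size;
}
```

For a 4096-byte slab holding 42 objects of 96 bytes, offset 100 rounds down to the object at 96 (index 1), while offset 4090 falls in the unused tail past the last object and is clamped to the object at 3936 (index 41).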