License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2010-05-25 05:32:27 +08:00
|
|
|
#ifndef _LINUX_COMPACTION_H
|
|
|
|
#define _LINUX_COMPACTION_H
|
|
|
|
|
2016-07-29 06:49:28 +08:00
|
|
|
/*
|
|
|
|
* Determines how hard direct compaction should try to succeed.
|
|
|
|
* Lower value means higher priority, analogically to reclaim priority.
|
|
|
|
*/
|
|
|
|
enum compact_priority {
|
2016-10-08 07:57:47 +08:00
|
|
|
COMPACT_PRIO_SYNC_FULL,
|
|
|
|
MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
|
2016-07-29 06:49:28 +08:00
|
|
|
COMPACT_PRIO_SYNC_LIGHT,
|
2016-10-08 08:00:34 +08:00
|
|
|
MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
|
2016-07-29 06:49:28 +08:00
|
|
|
DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
|
|
|
|
COMPACT_PRIO_ASYNC,
|
|
|
|
INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
|
|
|
|
};
|
|
|
|
|
2010-05-25 05:32:30 +08:00
|
|
|
/* Return values for compact_zone() and try_to_compact_pages() */
|
2015-11-06 10:47:56 +08:00
|
|
|
/* When adding new states, please adjust include/trace/events/compaction.h */
|
2016-05-21 07:56:38 +08:00
|
|
|
enum compact_result {
|
2016-05-21 07:56:50 +08:00
|
|
|
/* For more detailed tracepoint output - internal to compaction */
|
|
|
|
COMPACT_NOT_SUITABLE_ZONE,
|
2016-05-21 07:56:38 +08:00
|
|
|
/*
|
|
|
|
* compaction didn't start as it was not possible or direct reclaim
|
|
|
|
* was more suitable
|
|
|
|
*/
|
|
|
|
COMPACT_SKIPPED,
|
2016-05-21 07:56:44 +08:00
|
|
|
/* compaction didn't start as it was deferred due to past failures */
|
|
|
|
COMPACT_DEFERRED,
|
2016-05-21 07:56:50 +08:00
|
|
|
|
|
|
|
/* For more detailed tracepoint output - internal to compaction */
|
|
|
|
COMPACT_NO_SUITABLE_PAGE,
|
2016-05-21 07:56:38 +08:00
|
|
|
/* compaction should continue to another pageblock */
|
|
|
|
COMPACT_CONTINUE,
|
2016-05-21 07:56:50 +08:00
|
|
|
|
2016-05-21 07:56:38 +08:00
|
|
|
/*
|
2016-05-21 07:56:50 +08:00
|
|
|
* The full zone was compacted scanned but wasn't successfull to compact
|
|
|
|
* suitable pages.
|
2016-05-21 07:56:38 +08:00
|
|
|
*/
|
2016-05-21 07:56:50 +08:00
|
|
|
COMPACT_COMPLETE,
|
2016-05-21 07:56:47 +08:00
|
|
|
/*
|
|
|
|
* direct compaction has scanned part of the zone but wasn't successfull
|
|
|
|
* to compact suitable pages.
|
|
|
|
*/
|
|
|
|
COMPACT_PARTIAL_SKIPPED,
|
2016-05-21 07:56:50 +08:00
|
|
|
|
|
|
|
/* compaction terminated prematurely due to lock contentions */
|
|
|
|
COMPACT_CONTENDED,
|
|
|
|
|
2016-05-21 07:56:47 +08:00
|
|
|
/*
|
2016-10-08 07:57:41 +08:00
|
|
|
* direct compaction terminated after concluding that the allocation
|
|
|
|
* should now succeed
|
2016-05-21 07:56:47 +08:00
|
|
|
*/
|
2016-10-08 07:57:41 +08:00
|
|
|
COMPACT_SUCCESS,
|
2016-05-21 07:56:38 +08:00
|
|
|
};
|
2010-05-25 05:32:27 +08:00
|
|
|
|
2015-02-12 07:25:44 +08:00
|
|
|
struct alloc_context; /* in mm/internal.h */
|
|
|
|
|
2016-10-08 07:57:53 +08:00
|
|
|
/*
|
|
|
|
* Number of free order-0 pages that should be available above given watermark
|
|
|
|
* to make sure compaction has reasonable chance of not running out of free
|
|
|
|
* pages that it needs to isolate as migration target during its work.
|
|
|
|
*/
|
|
|
|
static inline unsigned long compact_gap(unsigned int order)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Although all the isolations for migration are temporary, compaction
|
|
|
|
* free scanner may have up to 1 << order pages on its list and then
|
|
|
|
* try to split an (order - 1) free page. At that point, a gap of
|
|
|
|
* 1 << order might not be enough, so it's safer to require twice that
|
|
|
|
* amount. Note that the number of pages on the list is also
|
|
|
|
* effectively limited by COMPACT_CLUSTER_MAX, as that's the maximum
|
|
|
|
* that the migrate scanner can have isolated on migrate list, and free
|
|
|
|
* scanner is only invoked when the number of isolated free pages is
|
|
|
|
* lower than that. But it's not worth to complicate the formula here
|
|
|
|
* as a bigger gap for higher orders than strictly necessary can also
|
|
|
|
* improve chances of compaction success.
|
|
|
|
*/
|
|
|
|
return 2UL << order;
|
|
|
|
}
|
|
|
|
|
2010-05-25 05:32:28 +08:00
|
|
|
#ifdef CONFIG_COMPACTION
|
|
|
|
extern int sysctl_compact_memory;
|
2020-08-12 09:31:07 +08:00
|
|
|
extern unsigned int sysctl_compaction_proactiveness;
|
2010-05-25 05:32:28 +08:00
|
|
|
extern int sysctl_compaction_handler(struct ctl_table *table, int write,
|
2020-04-24 14:43:38 +08:00
|
|
|
void *buffer, size_t *length, loff_t *ppos);
|
2010-05-25 05:32:31 +08:00
|
|
|
extern int sysctl_extfrag_threshold;
|
2015-04-16 07:13:20 +08:00
|
|
|
extern int sysctl_compact_unevictable_allowed;
|
2010-05-25 05:32:30 +08:00
|
|
|
|
2020-08-12 09:31:07 +08:00
|
|
|
extern unsigned int extfrag_for_order(struct zone *zone, unsigned int order);
|
2010-05-25 05:32:30 +08:00
|
|
|
extern int fragmentation_index(struct zone *zone, unsigned int order);
|
2016-05-21 07:56:38 +08:00
|
|
|
extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
|
mm, compaction: simplify contended compaction handling
Async compaction detects contention either due to failing trylock on
zone->lock or lru_lock, or by need_resched(). Since 1f9efdef4f3f ("mm,
compaction: khugepaged should not give up due to need_resched()") the
code got quite complicated to distinguish these two up to the
__alloc_pages_slowpath() level, so different decisions could be taken
for khugepaged allocations.
After the recent changes, khugepaged allocations don't check for
contended compaction anymore, so we again don't need to distinguish lock
and sched contention, and simplify the current convoluted code a lot.
However, I believe it's also possible to simplify even more and
completely remove the check for contended compaction after the initial
async compaction for costly orders, which was originally aimed at THP
page fault allocations. There are several reasons why this can be done
now:
- with the new defaults, THP page faults no longer do reclaim/compaction at
all, unless the system admin has overridden the default, or application has
indicated via madvise that it can benefit from THP's. In both cases, it
means that the potential extra latency is expected and worth the benefits.
- even if reclaim/compaction proceeds after this patch where it previously
wouldn't, the second compaction attempt is still async and will detect the
contention and back off, if the contention persists
- there are still heuristics like deferred compaction and pageblock skip bits
in place that prevent excessive THP page fault latencies
Link: http://lkml.kernel.org/r/20160721073614.24395-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-29 06:49:30 +08:00
|
|
|
unsigned int order, unsigned int alloc_flags,
|
2019-03-06 07:45:41 +08:00
|
|
|
const struct alloc_context *ac, enum compact_priority prio,
|
|
|
|
struct page **page);
|
2012-10-09 07:32:47 +08:00
|
|
|
extern void reset_isolation_suitable(pg_data_t *pgdat);
|
2016-05-21 07:56:38 +08:00
|
|
|
extern enum compact_result compaction_suitable(struct zone *zone, int order,
|
2020-06-04 06:59:01 +08:00
|
|
|
unsigned int alloc_flags, int highest_zoneidx);
|
2010-05-25 05:32:32 +08:00
|
|
|
|
2015-02-12 07:27:09 +08:00
|
|
|
extern void compaction_defer_reset(struct zone *zone, int order,
|
|
|
|
bool alloc_success);
|
2012-10-09 07:32:47 +08:00
|
|
|
|
2016-05-21 07:56:56 +08:00
|
|
|
/* Compaction has made some progress and retrying makes sense */
|
|
|
|
static inline bool compaction_made_progress(enum compact_result result)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Even though this might sound confusing this in fact tells us
|
|
|
|
* that the compaction successfully isolated and migrated some
|
|
|
|
* pageblocks.
|
|
|
|
*/
|
2016-10-08 07:57:41 +08:00
|
|
|
if (result == COMPACT_SUCCESS)
|
2016-05-21 07:56:56 +08:00
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Compaction has failed and it doesn't make much sense to keep retrying. */
|
|
|
|
static inline bool compaction_failed(enum compact_result result)
|
|
|
|
{
|
|
|
|
/* All zones were scanned completely and still not result. */
|
|
|
|
if (result == COMPACT_COMPLETE)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
mm, compaction: raise compaction priority after it withdrawns
Mike Kravetz reports that "hugetlb allocations could stall for minutes or
hours when should_compact_retry() would return true more often then it
should. Specifically, this was in the case where compact_result was
COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being
made."
The problem is that the compaction_withdrawn() test in
should_compact_retry() includes compaction outcomes that are only possible
on low compaction priority, and results in a retry without increasing the
priority. This may result in furter reclaim, and more incomplete
compaction attempts.
With this patch, compaction priority is raised when possible, or
should_compact_retry() returns false.
The COMPACT_SKIPPED result doesn't really fit together with the other
outcomes in compaction_withdrawn(), as that's a result caused by
insufficient order-0 pages, not due to low compaction priority. With this
patch, it is moved to a new compaction_needs_reclaim() function, and for
that outcome we keep the current logic of retrying if it looks like
reclaim will be able to help.
Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com
Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 06:37:32 +08:00
|
|
|
/* Compaction needs reclaim to be performed first, so it can continue. */
|
|
|
|
static inline bool compaction_needs_reclaim(enum compact_result result)
|
2016-05-21 07:56:56 +08:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Compaction backed off due to watermark checks for order-0
|
|
|
|
* so the regular reclaim has to try harder and reclaim something.
|
|
|
|
*/
|
|
|
|
if (result == COMPACT_SKIPPED)
|
|
|
|
return true;
|
|
|
|
|
mm, compaction: raise compaction priority after it withdrawns
Mike Kravetz reports that "hugetlb allocations could stall for minutes or
hours when should_compact_retry() would return true more often then it
should. Specifically, this was in the case where compact_result was
COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being
made."
The problem is that the compaction_withdrawn() test in
should_compact_retry() includes compaction outcomes that are only possible
on low compaction priority, and results in a retry without increasing the
priority. This may result in furter reclaim, and more incomplete
compaction attempts.
With this patch, compaction priority is raised when possible, or
should_compact_retry() returns false.
The COMPACT_SKIPPED result doesn't really fit together with the other
outcomes in compaction_withdrawn(), as that's a result caused by
insufficient order-0 pages, not due to low compaction priority. With this
patch, it is moved to a new compaction_needs_reclaim() function, and for
that outcome we keep the current logic of retrying if it looks like
reclaim will be able to help.
Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com
Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 06:37:32 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Compaction has backed off for some reason after doing some work or none
|
|
|
|
* at all. It might be throttling or lock contention. Retrying might be still
|
|
|
|
* worthwhile, but with a higher priority if allowed.
|
|
|
|
*/
|
|
|
|
static inline bool compaction_withdrawn(enum compact_result result)
|
|
|
|
{
|
2016-05-21 07:56:56 +08:00
|
|
|
/*
|
|
|
|
* If compaction is deferred for high-order allocations, it is
|
|
|
|
* because sync compaction recently failed. If this is the case
|
|
|
|
* and the caller requested a THP allocation, we do not want
|
|
|
|
* to heavily disrupt the system, so we fail the allocation
|
|
|
|
* instead of entering direct reclaim.
|
|
|
|
*/
|
|
|
|
if (result == COMPACT_DEFERRED)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If compaction in async mode encounters contention or blocks higher
|
|
|
|
* priority task we back off early rather than cause stalls.
|
|
|
|
*/
|
|
|
|
if (result == COMPACT_CONTENDED)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Page scanners have met but we haven't scanned full zones so this
|
|
|
|
* is a back off in fact.
|
|
|
|
*/
|
|
|
|
if (result == COMPACT_PARTIAL_SKIPPED)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-05-21 07:57:12 +08:00
|
|
|
|
|
|
|
bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
|
|
|
|
int alloc_flags);
|
|
|
|
|
mm, compaction: introduce kcompactd
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 05:18:08 +08:00
|
|
|
extern int kcompactd_run(int nid);
|
|
|
|
extern void kcompactd_stop(int nid);
|
2020-06-04 06:59:01 +08:00
|
|
|
extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx);
|
mm, compaction: introduce kcompactd
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 05:18:08 +08:00
|
|
|
|
2010-05-25 05:32:30 +08:00
|
|
|
#else
|
2012-10-09 07:32:47 +08:00
|
|
|
static inline void reset_isolation_suitable(pg_data_t *pgdat)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2016-05-21 07:56:38 +08:00
|
|
|
static inline enum compact_result compaction_suitable(struct zone *zone, int order,
|
2020-06-04 06:59:01 +08:00
|
|
|
int alloc_flags, int highest_zoneidx)
|
2011-01-14 07:45:56 +08:00
|
|
|
{
|
|
|
|
return COMPACT_SKIPPED;
|
|
|
|
}
|
|
|
|
|
2016-05-21 07:56:56 +08:00
|
|
|
static inline bool compaction_made_progress(enum compact_result result)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool compaction_failed(enum compact_result result)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
mm, compaction: raise compaction priority after it withdrawns
Mike Kravetz reports that "hugetlb allocations could stall for minutes or
hours when should_compact_retry() would return true more often then it
should. Specifically, this was in the case where compact_result was
COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being
made."
The problem is that the compaction_withdrawn() test in
should_compact_retry() includes compaction outcomes that are only possible
on low compaction priority, and results in a retry without increasing the
priority. This may result in furter reclaim, and more incomplete
compaction attempts.
With this patch, compaction priority is raised when possible, or
should_compact_retry() returns false.
The COMPACT_SKIPPED result doesn't really fit together with the other
outcomes in compaction_withdrawn(), as that's a result caused by
insufficient order-0 pages, not due to low compaction priority. With this
patch, it is moved to a new compaction_needs_reclaim() function, and for
that outcome we keep the current logic of retrying if it looks like
reclaim will be able to help.
Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com
Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 06:37:32 +08:00
|
|
|
static inline bool compaction_needs_reclaim(enum compact_result result)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-05-21 07:56:56 +08:00
|
|
|
static inline bool compaction_withdrawn(enum compact_result result)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
mm, compaction: introduce kcompactd
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 05:18:08 +08:00
|
|
|
static inline int kcompactd_run(int nid)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
static inline void kcompactd_stop(int nid)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2020-06-04 06:59:01 +08:00
|
|
|
static inline void wakeup_kcompactd(pg_data_t *pgdat,
|
|
|
|
int order, int highest_zoneidx)
|
mm, compaction: introduce kcompactd
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 05:18:08 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2010-05-25 05:32:28 +08:00
|
|
|
#endif /* CONFIG_COMPACTION */
|
|
|
|
|
mm: migrate: support non-lru movable page migration
We have allowed migration for only LRU pages until now and it was enough
to make high-order pages. But recently, embedded system(e.g., webOS,
android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
have seen several reports about troubles of small high-order allocation.
For fixing the problem, there were several efforts (e,g,. enhance
compaction algorithm, SLUB fallback to 0-order page, reserved memory,
vmalloc and so on) but if there are lots of non-movable pages in system,
their solutions are void in the long run.
So, this patch is to support facility to change non-movable pages with
movable. For the feature, this patch introduces functions related to
migration to address_space_operations as well as some page flags.
If a driver want to make own pages movable, it should define three
functions which are function pointers of struct
address_space_operations.
1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully. On returing true, VM marks the
page as PG_isolated so concurrent isolation in several CPUs skip the
page for isolation. If a driver cannot isolate the page, it should
return *false*.
Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.
2. int (*migratepage) (struct address_space *mapping,
struct page *newpage, struct page *oldpage, enum migrate_mode);
After isolation, VM calls migratepage of driver with isolated page. The
function of migratepage is to move content of the old page to new page
and set up fields of struct page newpage. Keep in mind that you should
indicate to the VM the oldpage is no longer movable via
__ClearPageMovable() under page_lock if you migrated the oldpage
successfully and returns 0. If driver cannot migrate the page at the
moment, driver can return -EAGAIN. On -EAGAIN, VM will retry page
migration in a short time because VM interprets -EAGAIN as "temporal
migration failure". On returning any error except -EAGAIN, VM will give
up the page migration without retrying in this time.
Driver shouldn't touch page.lru field VM using in the functions.
3. void (*putback_page)(struct page *);
If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed
page. In this function, driver should put the isolated page back to the
own data structure.
4. non-lru movable page flags
There are two page flags for supporting non-lru movable page.
* PG_movable
Driver should use the below function to make page movable under
page_lock.
void __SetPageMovable(struct page *page, struct address_space *mapping)
It needs argument of address_space for registering migration family
functions which will be called by VM. Exactly speaking, PG_movable is
not a real flag of struct page. Rather than, VM reuses page->mapping's
lower bits to represent it.
#define PAGE_MAPPING_MOVABLE 0x2
page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
so driver shouldn't access page->mapping directly. Instead, driver
should use page_mapping which mask off the low two bits of page->mapping
so it can get right struct address_space.
For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page. As
well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
__ClearPageMovable). But __PageMovable is cheap to catch whether page
is LRU or non-lru movable once the page has been isolated. Because LRU
pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking to test non-lru movable pages before more
expensive checking with lock_page in pfn scanning to select victim.
For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
sudden destroying of page->mapping.
Driver using __SetPageMovable should clear the flag via
__ClearMovablePage under page_lock before the releasing the page.
* PG_isolated
To prevent concurrent isolation among several CPUs, VM marks isolated
page as PG_isolated under lock_page. So if a CPU encounters PG_isolated
non-lru movable page, it can skip it. Driver doesn't need to manipulate
the flag because VM will set/clear it automatically. Keep in mind that
if driver sees PG_isolated page, it means the page have been isolated by
VM so it shouldn't touch page.lru field. PG_isolated is alias with
PG_reclaim flag so driver shouldn't use the flag for own purpose.
[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-27 06:23:05 +08:00
|
|
|
struct node;
|
2019-03-06 07:48:56 +08:00
|
|
|
#if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
|
2010-05-25 05:32:29 +08:00
|
|
|
extern int compaction_register_node(struct node *node);
|
|
|
|
extern void compaction_unregister_node(struct node *node);
|
|
|
|
|
|
|
|
#else
|
|
|
|
|
|
|
|
static inline int compaction_register_node(struct node *node)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void compaction_unregister_node(struct node *node)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
#endif /* CONFIG_COMPACTION && CONFIG_SYSFS && CONFIG_NUMA */
|
|
|
|
|
2010-05-25 05:32:27 +08:00
|
|
|
#endif /* _LINUX_COMPACTION_H */
|