mm, compaction: rename compact_control->rescan to finish_pageblock

Patch series "Fix excessive CPU usage during compaction".

Commit 7efc3b7261 ("mm/compaction: fix set skip in fast_find_migrateblock")
fixed a problem where pageblocks found by fast_find_migrateblock() were
ignored.  Unfortunately, once 6.1 was released, there were numerous bug
reports of high CPU usage and massive stalls.  Due to the severity,
Vlastimil reverted the patch in -stable as a short-term fix[1].

The underlying problem for each of the bugs is suspected to be the
repeated scanning of the same pageblocks.  This series should guarantee
forward progress even with commit 7efc3b7261.  More information is in
the changelog for patch 4.

[1] http://lore.kernel.org/r/20230113173345.9692-1-vbabka@suse.cz


This patch (of 4):

The rescan field was not well named, albeit accurate at the time.  Rename
the field to finish_pageblock to indicate that the remainder of the
pageblock should be scanned regardless of COMPACT_CLUSTER_MAX.  The intent
is that pageblocks with transient failures get marked for skipping to
avoid revisiting the same pageblock.

Link: https://lkml.kernel.org/r/20230125134434.18017-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman 2023-01-25 13:44:31 +00:00 committed by Andrew Morton
parent c5acf1f6f0
commit 48731c8436
2 changed files with 17 additions and 13 deletions


@@ -1101,12 +1101,12 @@ isolate_success_no_list:
 		/*
 		 * Avoid isolating too much unless this block is being
-		 * rescanned (e.g. dirty/writeback pages, parallel allocation)
+		 * fully scanned (e.g. dirty/writeback pages, parallel allocation)
 		 * or a lock is contended. For contention, isolate quickly to
 		 * potentially remove one source of contention.
 		 */
 		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
-		    !cc->rescan && !cc->contended) {
+		    !cc->finish_pageblock && !cc->contended) {
 			++low_pfn;
 			break;
 		}
@@ -1171,14 +1171,14 @@ isolate_abort:
 	}
 	/*
-	 * Updated the cached scanner pfn once the pageblock has been scanned
+	 * Update the cached scanner pfn once the pageblock has been scanned.
 	 * Pages will either be migrated in which case there is no point
 	 * scanning in the near future or migration failed in which case the
 	 * failure reason may persist. The block is marked for skipping if
 	 * there were no pages isolated in the block or if the block is
 	 * rescanned twice in a row.
 	 */
-	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
+	if (low_pfn == end_pfn && (!nr_isolated || cc->finish_pageblock)) {
 		if (valid_page && !skip_updated)
 			set_pageblock_skip(valid_page);
 		update_cached_migrate(cc, low_pfn);
@@ -2372,17 +2372,17 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 		unsigned long iteration_start_pfn = cc->migrate_pfn;
 		/*
-		 * Avoid multiple rescans which can happen if a page cannot be
-		 * isolated (dirty/writeback in async mode) or if the migrated
-		 * pages are being allocated before the pageblock is cleared.
-		 * The first rescan will capture the entire pageblock for
-		 * migration. If it fails, it'll be marked skip and scanning
-		 * will proceed as normal.
+		 * Avoid multiple rescans of the same pageblock which can
+		 * happen if a page cannot be isolated (dirty/writeback in
+		 * async mode) or if the migrated pages are being allocated
+		 * before the pageblock is cleared. The first rescan will
+		 * capture the entire pageblock for migration. If it fails,
+		 * it'll be marked skip and scanning will proceed as normal.
 		 */
-		cc->rescan = false;
+		cc->finish_pageblock = false;
 		if (pageblock_start_pfn(last_migrated_pfn) ==
 		    pageblock_start_pfn(iteration_start_pfn)) {
-			cc->rescan = true;
+			cc->finish_pageblock = true;
 		}
 		switch (isolate_migratepages(cc)) {


@@ -448,7 +448,11 @@ struct compact_control {
 	bool proactive_compaction;	/* kcompactd proactive compaction */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock contention */
-	bool rescan;			/* Rescanning the same pageblock */
+	bool finish_pageblock;		/* Scan the remainder of a pageblock. Used
+					 * when there are potentially transient
+					 * isolation or migration failures to
+					 * ensure forward progress.
+					 */
 	bool alloc_contig;		/* alloc_contig_range allocation */
 };