hwpoison: fix the handling path of the victimized page frame that belong to non-LRU
Until now, the kernel has applied the same policy to victimized page frames that belong to kernel space (reserved or slab-subsystem pages) and to non-LRU pages (unknown page state). In other words, handling either kind of victimized page frame results in IGNORED or FAILED, and memory_failure() returns -EBUSY.

This patch prevents memory_failure() from returning early just because (!PageLRU(p)) is true, and it ensures that action_result() reports more precise information ("reserved kernel", "kernel slab", or "unknown page state") instead of "non LRU", especially for memory errors detected by memory scrubbing.

Andi said:

: While running the mcelog test suite on 3.14 I hit the following VM_BUG_ON:
:
: soft_offline: 0x56d4: unknown non LRU page type 3ffff800008000
: page:ffffea000015b400 count:3 mapcount:2097169 mapping: (null) index:0xffff8800056d7000
: page flags: 0x3ffff800004081(locked|slab|head)
: ------------[ cut here ]------------
: kernel BUG at mm/rmap.c:1495!
:
: I think what happened is that a LRU page turned into a slab page in
: parallel with offlining. memory_failure initially tests for this case,
: but doesn't retest later after the page has been locked.
:
: ...
:
: I ran this patch in a loop over night with some stress plus
: the mcelog test suite running in a loop. I cannot guarantee it hit it,
: but it should have given it a good beating.
:
: The kernel survived with no messages, although the mcelog test suite
: got killed at some point because it couldn't fork anymore. Probably
: some unrelated problem.
:
: So the patch is ok for me for .16.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent b27ebf7791
commit 0bc1f8b068
@@ -895,7 +895,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	struct page *hpage = *hpagep;
 	struct page *ppage;
 
-	if (PageReserved(p) || PageSlab(p))
+	if (PageReserved(p) || PageSlab(p) || !PageLRU(p))
 		return SWAP_SUCCESS;
 
 	/*
@@ -1159,9 +1159,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 					action_result(pfn, "free buddy, 2nd try", DELAYED);
 				return 0;
 			}
-			action_result(pfn, "non LRU", IGNORED);
-			put_page(p);
-			return -EBUSY;
 		}
 	}
 
@@ -1194,6 +1191,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 		return 0;
 	}
 
+	if (!PageHuge(p) && !PageTransTail(p) && !PageLRU(p))
+		goto identify_page_state;
+
 	/*
 	 * For error on the tail page, we should set PG_hwpoison
 	 * on the head page to show that the hugepage is hwpoisoned
@@ -1243,6 +1243,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 		goto out;
 	}
 
+identify_page_state:
 	res = -EBUSY;
 	/*
 	 * The first check uses the current page flags which may not have any