// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * This file contains common routines for dealing with the freeing of
 * page tables, along with common page table handling code.
 *
 * Derived from arch/powerpc/mm/tlb_64.c:
 *  Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
 *
 *  Modifications by Paul Mackerras (PowerMac) (paulus@cs.anu.edu.au)
 *  and Cort Dougan (PReP) (cort@cs.nmt.edu)
 *  Copyright (C) 1996 Paul Mackerras
 *
 *  Derived from "arch/i386/mm/init.c"
 *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
 *
 *  Dave Engebretsen <engebret@us.ibm.com>
 *      Rework for PPC64 port.
 */

#include <linux/kernel.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/hardirq.h>
#include <linux/hugetlb.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include <asm/hugetlb.h>
#include <asm/pte-walk.h>

#ifdef CONFIG_PPC64
#define PGD_ALIGN (sizeof(pgd_t) * MAX_PTRS_PER_PGD)
#else
#define PGD_ALIGN PAGE_SIZE
#endif

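/*
 * swapper_pg_dir is the top-level page table used by the kernel itself
 * (init_mm), kept in .bss and aligned as PGD_ALIGN requires.
 */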
pgd_t swapper_pg_dir[MAX_PTRS_PER_PGD] __section(".bss..page_aligned") __aligned(PGD_ALIGN);
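
/* An exec fault is reported as trap 0x400, the instruction storage interrupt. */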
static inline int is_exec_fault(void)
{
	return current->thread.regs && TRAP(current->thread.regs) == 0x400;
}
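
/*
 * Overview of the I$/D$ coherency scheme (from the "powerpc/mm: Rework
 * I$/D$ coherency" rework): cache synchronization is driven from two
 * call sites, set_pte_at() and ptep_set_access_flags(). CPUs without
 * per-page exec support always sync at set_pte_at(), guaranteeing the
 * cache is clean by the time the PTE is visible to other processors.
 * Embedded CPUs with per-page exec support flush at set_pte_at() time
 * when coming from an exec fault, and rely on ptep_set_access_flags()
 * as the catch-all for anything not cleaned there. 64-bit hash with
 * per-page exec support instead handles coherency at hashing time.
 */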

/* We only try to do i/d cache coherency on stuff that looks like
 * reasonably "normal" PTEs. We currently require a PTE to be present,
 * and we avoid _PAGE_SPECIAL and cache-inhibited PTEs. We also only do
 * that on userspace PTEs.
 */
static inline int pte_looks_normal(pte_t pte)
{
	if (pte_present(pte) && !pte_special(pte)) {
		if (pte_ci(pte))
			return 0;
		if (pte_user(pte))
			return 1;
	}
	return 0;
}
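
/*
 * Translate a PTE to its struct page; returns NULL when the pfn has no
 * usable struct page (invalid pfn or a reserved page).
 */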
static struct page *maybe_pte_to_page(pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);
	struct page *page;

	if (unlikely(!pfn_valid(pfn)))
		return NULL;
	page = pfn_to_page(pfn);
	if (PageReserved(page))
		return NULL;
	return page;
}

#ifdef CONFIG_PPC_BOOK3S

/* Server-style MMU handles coherency when hashing if HW exec permission
 * is supported per page (currently 64-bit only). If not, then we always
 * flush the cache for valid PTEs in set_pte. Embedded CPUs without HW exec
 * support fall into the same category.
 */

static pte_t set_pte_filter_hash(pte_t pte)
{
	if (radix_enabled())
		return pte;

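	/* Strip the hash-PTE tracking bits; this PTE has not been hashed
	 * into the hash page table yet, so they carry no meaning here.
	 */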
	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
		struct page *pg = maybe_pte_to_page(pte);
		if (!pg)
			return pte;
		if (!test_bit(PG_dcache_clean, &pg->flags)) {
			flush_dcache_icache_page(pg);
			set_bit(PG_dcache_clean, &pg->flags);
		}
	}
	return pte;
}

#else /* CONFIG_PPC_BOOK3S */

static pte_t set_pte_filter_hash(pte_t pte) { return pte; }

#endif /* CONFIG_PPC_BOOK3S */

/* Embedded type MMU with HW exec support. This is a bit more complicated
 * as we don't have two bits to spare for _PAGE_EXEC and _PAGE_HWEXEC, so
 * instead we "filter out" the exec permission for non-clean pages.
 */
static inline pte_t set_pte_filter(pte_t pte)
{
	struct page *pg;

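	/* Hash-table MMUs handle I$/D$ coherency at hash-in time, so
	 * defer to the hash-specific filter above.
	 */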
	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
		return set_pte_filter_hash(pte);

	/* No exec permission in the first place, move on */
	if (!pte_exec(pte) || !pte_looks_normal(pte))
		return pte;

	/* If you set _PAGE_EXEC on weird pages you're on your own */
	pg = maybe_pte_to_page(pte);
	if (unlikely(!pg))
		return pte;

	/* If the page is clean, we move on */
	if (test_bit(PG_dcache_clean, &pg->flags))
		return pte;

	/* If it's an exec fault, we flush the cache and make it clean */
	if (is_exec_fault()) {
		flush_dcache_icache_page(pg);
		set_bit(PG_dcache_clean, &pg->flags);
		return pte;
	}

	/* Otherwise, we filter out _PAGE_EXEC */
	return pte_exprotect(pte);
}

static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
				     int dirty)
{
	struct page *pg;

	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
		return pte;

	/* So here, we only care about exec faults, as we use them
	 * to recover lost _PAGE_EXEC and perform I$/D$ coherency
	 * if necessary. Also if _PAGE_EXEC is already set, same deal,
	 * we just bail out.
	 */
	if (dirty || pte_exec(pte) || !is_exec_fault())
		return pte;

#ifdef CONFIG_DEBUG_VM
	/* So this is an exec fault, _PAGE_EXEC is not set. If it was
	 * an error we would have bailed out earlier in do_page_fault(),
	 * but let's make sure of it.
	 */
	if (WARN_ON(!(vma->vm_flags & VM_EXEC)))
		return pte;
#endif /* CONFIG_DEBUG_VM */

	/* If you set _PAGE_EXEC on weird pages you're on your own */
	pg = maybe_pte_to_page(pte);
	if (unlikely(!pg))
		goto bail;

	/* If the page is already clean, we move on */
	if (test_bit(PG_dcache_clean, &pg->flags))
		goto bail;

	/* Clean the page and set PG_dcache_clean */
	flush_dcache_icache_page(pg);
	set_bit(PG_dcache_clean, &pg->flags);

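	/* The page is clean (or had no usable struct page); it is now
	 * safe to hand back the lost _PAGE_EXEC.
	 */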
bail:
	return pte_mkexec(pte);
}

/*
 * set_pte stores a linux PTE into the linux page table.
 */
void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
		pte_t pte)
{
	/*
	 * Make sure the hardware valid bit is not set. We don't do
	 * a TLB flush for this update.
	 */
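	/* A protnone PTE is tolerated here because it cannot have a
	 * translation cached in the TLB (see the THP NUMA migration fix
	 * that relaxed this warning).
	 */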
	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));

	/* Note: mm->context.id might not yet have been assigned as
	 * this context might not have been activated yet when this
	 * is called.
	 */
	pte = set_pte_filter(pte);

	/* Perform the setting of the PTE */
	__set_pte_at(mm, addr, ptep, pte, 0);
}

/*
 * This is called when relaxing access to a PTE. It's also called in the page
 * fault path when we don't hit any of the major fault cases, i.e. a minor
 * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc. The generic code will have
 * handled those two for us; we additionally deal with missing execute
 * permission here on some processors.
 */
int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
			  pte_t *ptep, pte_t entry, int dirty)
{
|
|
|
|
int changed;
|
2009-08-19 03:00:34 +08:00
|
|
|
entry = set_access_flags_filter(entry, vma, dirty);
|
powerpc/mm: Rework I$/D$ coherency (v3)
This patch reworks the way we do I and D cache coherency on PowerPC.
The "old" way was split in 3 different parts depending on the processor type:
- Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.
- Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all page going to user space in update_mmu_cache().
- Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.
That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow trying to execute. This is hard to hit but
I think it has bitten us in the past.
Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.
This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c
The base idea for Embedded with per-page exec support, is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).
If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
this guys also perform the I/D cache sync for exec faults now. This second path
is the catch all for things that weren't cleaned at set_pte_at() time.
For cpus without per-pag exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.
For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.
This is also a first step for adding _PAGE_EXEC support for embedded platforms
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 00:02:37 +08:00
|
|
|
changed = !pte_same(*(ptep), entry);
|
|
|
|
if (changed) {
|
2018-05-29 22:28:38 +08:00
|
|
|
assert_pte_locked(vma->vm_mm, address);
|
2018-05-29 22:28:40 +08:00
|
|
|
__ptep_set_access_flags(vma, ptep, entry,
|
|
|
|
address, mmu_virtual_psize);
|
	}
	return changed;
}

#ifdef CONFIG_HUGETLB_PAGE
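/*
 * Huge page variant of ptep_set_access_flags(): update the access bits of
 * a huge PTE and report whether it changed. When HUGETLB_NEED_PRELOAD is
 * defined it always returns 1 to force an update_mmu_cache() call.
 */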
int huge_ptep_set_access_flags(struct vm_area_struct *vma,
			       unsigned long addr, pte_t *ptep,
			       pte_t pte, int dirty)
{
#ifdef HUGETLB_NEED_PRELOAD
	/*
	 * The "return 1" forces a call of update_mmu_cache, which will write a
	 * TLB entry.  Without this, platforms that don't do a write of the TLB
	 * entry in the TLB miss handler asm will fault ad infinitum.
	 */
	ptep_set_access_flags(vma, addr, ptep, pte, dirty);
	return 1;
#else
	int changed, psize;

	pte = set_access_flags_filter(pte, vma, dirty);
	changed = !pte_same(*(ptep), pte);
	if (changed) {

#ifdef CONFIG_PPC_BOOK3S_64
		struct hstate *h = hstate_vma(vma);

		psize = hstate_get_psize(h);
#ifdef CONFIG_DEBUG_VM
		assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
#endif

#else
		/*
		 * Not used on non-book3s64 platforms.
		 * 8xx compares it with mmu_virtual_psize to
		 * know if it is a huge page or not.
		 */
		psize = MMU_PAGE_COUNT;
#endif
		__ptep_set_access_flags(vma, ptep, pte, addr, psize);
	}
	return changed;
#endif
}

#if defined(CONFIG_PPC_8xx)
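/*
 * On the 8xx, a huge page can be mirrored over several consecutive PTE
 * cells of a standard page table (e.g. 128 cells for a 512k page), so the
 * PTE value, advanced by 4k per cell, is replicated into every cell that
 * maps the huge page.
 */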
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
{
	pmd_t *pmd = pmd_off(mm, addr);
	pte_basic_t val;
	pte_basic_t *entry = &ptep->pte;
	int num, i;

	/*
	 * Make sure the hardware valid bit is not set. We don't do
	 * a TLB flush for this update.
	 */
	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));

	pte = set_pte_filter(pte);

	val = pte_val(pte);

	num = number_of_cells_per_pte(pmd, val, 1);

	for (i = 0; i < num; i++, entry++, val += SZ_4K)
		*entry = val;
}
#endif
#endif /* CONFIG_HUGETLB_PAGE */

#ifdef CONFIG_DEBUG_VM
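/*
 * Debug helper: assert that the PTE spinlock covering @addr in @mm is
 * held. A none pmd is tolerated because khugepaged clears the pmd before
 * calling pte_clear (see the comment in the body).
 */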
void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;

	if (mm == &init_mm)
		return;
	pgd = mm->pgd + pgd_index(addr);
	BUG_ON(pgd_none(*pgd));
	p4d = p4d_offset(pgd, addr);
	BUG_ON(p4d_none(*p4d));
	pud = pud_offset(p4d, addr);
	BUG_ON(pud_none(*pud));
	pmd = pmd_offset(pud, addr);
	/*
	 * For khugepaged to collapse normal pages into a hugepage, it first
	 * sets the pmd to none to force page fault/gup to take mmap_lock.
	 * Once the pmd is none, it does a pte_clear which runs this
	 * assertion, so if we find the pmd none, just return.
	 */
	if (pmd_none(*pmd))
		return;
	BUG_ON(!pmd_present(*pmd));
	assert_spin_locked(pte_lockptr(mm, pmd));
}
#endif /* CONFIG_DEBUG_VM */

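/*
 * Translate a vmalloc (or otherwise page-mapped) kernel address into a
 * physical address. The address must be backed by a page; BUG otherwise.
 */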
unsigned long vmalloc_to_phys(void *va)
{
	unsigned long pfn = vmalloc_to_pfn(va);

	BUG_ON(!pfn);
	return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va);
}
EXPORT_SYMBOL_GPL(vmalloc_to_phys);

/*
 * We have 4 cases for pgds and pmds:
 * (1) invalid (all zeroes)
 * (2) pointer to next table, as normal; bottom 6 bits == 0
 * (3) leaf pte for huge page, _PAGE_PTE set
 * (4) hugepd pointer, _PAGE_PTE = 0 and bits [2..6] indicate size of table
 *
 * So long as we atomically load page table pointers we are safe against
 * teardown, and we can follow the address down to the page and take a ref
 * on it. This function needs to be called with interrupts disabled. We use
 * this variant when we have MSR[EE] = 0 but paca->irq_soft_mask = IRQS_ENABLED.
 */
pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
			bool *is_thp, unsigned *hpage_shift)
{
	pgd_t *pgdp;
	p4d_t p4d, *p4dp;
	pud_t pud, *pudp;
	pmd_t pmd, *pmdp;
	pte_t *ret_pte;
	hugepd_t *hpdp = NULL;
	unsigned pdshift;

	if (hpage_shift)
		*hpage_shift = 0;

	if (is_thp)
		*is_thp = false;

	/*
	 * Always operate on the local stack value. This makes sure the
	 * value doesn't get updated by a parallel THP split/collapse,
	 * page fault or page unmap. The returned pte_t * is still not
	 * stable, so the caller must re-check it for the above conditions.
	 * The top level is an exception because it is folded into the p4d.
	 */
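	/*
	 * pdshift tracks the page size implied by the table level we are
	 * at; it is reported through *hpage_shift when the walk ends on a
	 * leaf, THP or hugepd mapping.
	 */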
	pgdp = pgdir + pgd_index(ea);
	p4dp = p4d_offset(pgdp, ea);
	p4d  = READ_ONCE(*p4dp);
	pdshift = P4D_SHIFT;

	if (p4d_none(p4d))
		return NULL;

	if (p4d_is_leaf(p4d)) {
		ret_pte = (pte_t *)p4dp;
		goto out;
	}

	if (is_hugepd(__hugepd(p4d_val(p4d)))) {
		hpdp = (hugepd_t *)&p4d;
		goto out_huge;
	}

	/*
	 * Even if we end up with an unmap, the pgtable will not
	 * be freed, because we do an rcu free and here we are
	 * irq disabled
	 */
	pdshift = PUD_SHIFT;
	pudp = pud_offset(&p4d, ea);
	pud  = READ_ONCE(*pudp);

	if (pud_none(pud))
		return NULL;

	if (pud_is_leaf(pud)) {
		ret_pte = (pte_t *)pudp;
		goto out;
	}

	if (is_hugepd(__hugepd(pud_val(pud)))) {
		hpdp = (hugepd_t *)&pud;
		goto out_huge;
	}

	pdshift = PMD_SHIFT;
	pmdp = pmd_offset(&pud, ea);
	pmd  = READ_ONCE(*pmdp);

	/*
	 * A hugepage collapse is captured by this condition, see
	 * pmdp_collapse_flush.
	 */
	if (pmd_none(pmd))
		return NULL;

#ifdef CONFIG_PPC_BOOK3S_64
	/*
	 * A hugepage split is captured by this condition, see
	 * pmdp_invalidate.
	 *
	 * Huge page modification can be caught here too.
	 */
	if (pmd_is_serializing(pmd))
		return NULL;
#endif

	if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
		if (is_thp)
			*is_thp = true;
		ret_pte = (pte_t *)pmdp;
		goto out;
	}

	if (pmd_is_leaf(pmd)) {
		ret_pte = (pte_t *)pmdp;
		goto out;
	}

	if (is_hugepd(__hugepd(pmd_val(pmd)))) {
		hpdp = (hugepd_t *)&pmd;
		goto out_huge;
	}

	return pte_offset_kernel(&pmd, ea);

out_huge:
	if (!hpdp)
		return NULL;

	ret_pte = hugepte_offset(*hpdp, ea, pdshift);
	pdshift = hugepd_shift(*hpdp);
out:
	if (hpage_shift)
		*hpage_shift = pdshift;
	return ret_pte;
}
EXPORT_SYMBOL_GPL(__find_linux_pte);
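
/*
 * Illustrative sketch (not part of this file): a typical caller disables
 * interrupts around the walk and treats the result as unstable until
 * re-checked. Variable names here are hypothetical.
 *
 *	unsigned long flags;
 *	unsigned int shift;
 *	bool is_thp;
 *	pte_t *ptep, pte;
 *
 *	local_irq_save(flags);
 *	ptep = __find_linux_pte(mm->pgd, ea, &is_thp, &shift);
 *	if (ptep)
 *		pte = READ_ONCE(*ptep);
 *	local_irq_restore(flags);
 */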