This introduce memblock.current_limit which is used to limit allocations
from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE).
The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still
be used with memblock_alloc_base() to allocate really anywhere.
It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT which disappears.
Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I
strongly recommend that you ensure that you set an appropriate limit
during boot in order to guarantee that an memblock_alloc() at any time
results in something that is accessible with a simple __va().
The reason is that a subsequent patch will introduce the ability for
the array to resize itself by reallocating itself. The MEMBLOCK core will
honor the current limit when performing those allocations.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Oooops... we missed these. We incorrectly converted strings
used when parsing the device-tree on pseries, thus breaking
access to drconf memory and hotplug memory.
While at it, also revert some variable names that represent
something the FW calls "lmb" and thus don't need to be converted
to "memblock".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
This adds some debug output to our MMU hash code to print out some
useful debug data if the hypervisor refuses the insertion (which
should normally never happen).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
There's a couple of nasty bugs lurking in our huge page hashing code.
First, we don't check the access permission atomically with setting
the _PAGE_BUSY bit, which means that the PTE value we end up using
for the hashing might be different than the one we have checked
the access permissions for.
We've seen cases where that leads us to try to use an invalidated
PTE for hashing, causing all sort of "interesting" issues.
Then, we also failed to set _PAGE_DIRTY on a write access.
Finally, a minor tweak but we should return 0 when we find the
PTE busy, in order to just re-execute the access, rather than 1
which means going to do_page_fault().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Instead of adding _PAGE_PRESENT to the access permission mask
in each low level routine independently, we add it once from
hash_page().
We also move the preliminary access check (the racy one before
the PTE is locked) up so it applies to the huge page case. This
duplicates code in __hash_page_huge() which we'll remove in a
subsequent patch to fix a race in there.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If the hypervisor gives us an error on a hugepage insert we panic. The
normal page code already handles this by returning an error instead and we end
calling low_hash_fault which will just kill the task if possible.
The patch below does a similar thing for the hugepage case.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
via following scripts
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/lmb/memblock/g' \
-e 's/LMB/MEMBLOCK/g' \
$FILES
for N in $(find . -name lmb.[ch]); do
M=$(echo $N | sed 's/lmb/memblock/g')
mv $N $M
done
and remove some wrong change like lmbench and dlmb etc.
also move memblock.c from lib/ to mm/
Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use the MMU config registers to scan for available direct and
indirect page sizes and print out the result. Will be needed
for future hugetlbfs implementation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We patch the TLB miss exception vectors to point to alternate
functions when using HW page table on BookE.
However, we were patching in a new branch in the first instruction
of the exception handler instead of the second one, thus overriding
the nop that is in the first instruction.
This cause problems when single stepping as we rely on that nop for
the single step to stop properly within the exception vector range
rather than on the target of the branch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CONFIG_SMP_750 doesn't exist in Kconfig, therefore removing all
references for it from the source code.
Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Form 1 affinity allows multiple entries in ibm,associativity-reference-points
which represent affinity domains in decreasing order of importance. The
Linux concept of a node is always the first entry, but using the other
values as an input to node_distance() allows the memory allocator to make
better decisions on which node to go first when local memory has been
exhausted.
We keep things simple and create an array indexed by NUMA node, capped at
4 entries. Each time we lookup an associativity property we initialise
the array which is overkill, but since we should only hit this path during
boot it didn't seem worth adding a per node valid bit.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove all rcu head inits. We don't care about the RCU head state before
passing it to call_rcu() anyway. Only leave the "on_stack" variants so
debugobjects can keep track of objects on stack.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are no BATS on BookE - we have the TLBCAM instead. Also correct
the page size information to included extended sizes. We don't actually allow
a 4G page size to be used, so comment on that as well.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CONFIG_HIGHPTE doesn't exist in Kconfig, therefore removing all
references for it from the source code.
Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
KVM: x86: Add missing locking to arch specific vcpu ioctls
KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
KVM: MMU: Segregate shadow pages with different cr0.wp
KVM: x86: Check LMA bit before set_efer
KVM: Don't allow lmsw to clear cr0.pe
KVM: Add cpuid.txt file
KVM: x86: Tell the guest we'll warn it about tsc stability
x86, paravirt: don't compute pvclock adjustments if we trust the tsc
x86: KVM guest: Try using new kvm clock msrs
KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
KVM: x86: add new KVMCLOCK cpuid feature
KVM: x86: change msr numbers for kvmclock
x86, paravirt: Add a global synchronization point for pvclock
x86, paravirt: Enable pvclock flags in vcpu_time_info structure
KVM: x86: Inject #GP with the right rip on efer writes
KVM: SVM: Don't allow nested guest to VMMCALL into host
KVM: x86: Fix exception reinjection forced to true
KVM: Fix wallclock version writing race
KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
...
I've been told that the architected way to determine we are in form 1
affinity mode is by reading the ibm,architecture-vec-5 property which
mirrors the layout of the fifth vector of the ibm,client-architecture
structure.
Eventually we may want to parse the ibm,architecture-vec-5 and create
FW_FEATURE_* bits.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we build with ftrace enabled its possible that loadcam_entry would
have used the stack pointer (even though the code doesn't need it). We
call loadcam_entry in __secondary_start before the stack is setup. To
ensure that loadcam_entry doesn't use the stack pointer the easiest
solution is to just have it in asm code.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We need to reserve a context from KVM to make sure we have our own
segment space. While we did that split for Book3S_64 already, 32 bit
is still outstanding.
So let's split it now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Convert NUMA code to new cpumask API. We shift the node to cpumask
setup code until after we complete bootmem allocation so we can
dynamically allocate the cpumasks.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As explained in commit 1c0fe6e3bd, we want to call the architecture independent
oom killer when getting an unexplained OOM from handle_mm_fault, rather than
simply killing current.
Cc: linuxppc-dev@ozlabs.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to keep track of the backing pages that get allocated by
vmemmap_populate() so that when we use kdump, the dump-capture kernel knows
where these pages are.
We use a simple linked list of structures that contain the physical address
of the backing page and corresponding virtual address to track the backing
pages.
To save space, we just use a pointer to the next struct vmemmap_backing. We
can also do this because we never remove nodes. We call the pointer "list"
to be compatible with changes made to the crash utility.
vmemmap_populate() is called either at boot-time or on a memory hotplug
operation. We don't have to worry about the boot-time calls because they
will be inherently single-threaded, and for a memory hotplug operation
vmemmap_populate() is called through:
sparse_add_one_section()
|
V
kmalloc_section_memmap()
|
V
sparse_mem_map_populate()
|
V
vmemmap_populate()
and in sparse_add_one_section() we're protected by pgdat_resize_lock().
So, we don't need a spinlock to protect the vmemmap_list.
We allocate space for the vmemmap_backing structs by allocating whole pages
in vmemmap_list_alloc() and then handing out chunks of this to
vmemmap_list_populate().
This means that we waste at most just under one page, but this keeps the code
is simple.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So we tried to speed things up a bit using flush_hash_pages() directly
but that falls over on 603 of course meaning we fail to flush the TLB
properly and we may even end up having it corrupt memory randomly by
accessing a hash table that doesn't exist.
This removes the "optimization" by always going through flush_tlb_page()
for now at least.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the base support for the 476 processor. The code was
primarily written by Ben Herrenschmidt and Torez Smith, but I've been
maintaining it for a while.
The goal is to have a single binary that will run on 44x and 47x, but
we still have some details to work out. The biggest is that the L1 cache
line size differs on the two platforms, but it's currently a compile-time
option.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The bypassing of this test is a leftover from 2.4 vintage
kernels, and is no longer appropriate, or even used by KGDB.
Currently KGDB uses probe_kernel_write() for all access to
memory via the KGDB core, so it can simply be deleted.
This fixes CVE-2010-1446.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wufei <fei.wu@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Firmware changed the way it represents memory and cpu affinity on POWER7.
Unfortunately the old method now caps the topology to work around issues
with legacy operating systems. For Linux to get the correct topology we
need to use the new form 1 affinity information.
We set the form 1 field in the client architecture, and if we see "1" in the
ibm,associativity-form property firmware supports form 1 affinity and
we should look at the first field in the ibm,associativity-reference-points
array. If not we use the second field as we always have.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The code was looking for this in cpu_features, not mmu_features. Fix this.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Data address breakpoint exceptions are currently handled along with page-faults
which require interrupts to remain in enabled state. Since exception handling
for data breakpoints aren't pre-empt safe, we handle them separately.
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We can't just clear the user read permission in book3e pte, because
that will also clear supervisor read permission. This surely isn't
desired. Fix the problem by adding the supervisor read back.
BenH: Slightly simplified the ifdef and applied to ppc64 too
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
powerpc initializes swiotlb before parsing the kernel boot options so
swiotlb options (e.g. specifying the swiotlb buffer size) are ignored.
Any time before freeing bootmem works for swiotlb so this patch moves
powerpc's swiotlb initialization after parsing the kernel boot
options, mem_init (as x86 does).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Becky Bruce <beckyb@kernel.crashing.org>
Tested-by: Albert Herranz <albert_herranz@yahoo.es>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'. Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies. We do this via make_coherent() by making the pages
uncacheable.
This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().
Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():
On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
to construct a pointer to the pte again. Passing a pte_t * is much
more elegant. Maybe we might even replace the pte argument with the
pte_t?
Ben Herrenschmidt would also like the pte pointer for PowerPC:
Passing the ptep in there is exactly what I want. I want that
-instead- of the PTE value, because I have issue on some ppc cases,
for I$/D$ coherency, where set_pte_at() may decide to mask out the
_PAGE_EXEC.
So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.
Includes a fix from Stephen Rothwell:
sparc: fix fallout from update_mmu_cache API change
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
tlbivax_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
native_tlbie_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
context_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now we have real bit locks use them instead of open coding it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for boards with more that 512MByte RAM. Currently
only 512MB of memory are enabled in the DCCR/ICCR real-mode cache
control registers. This patch now enables caching in real-mode for
2GByte.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Commit f71dc176aa 'Make
hpte_need_flush() correctly mask for multiple page sizes' introduced
bug, which is triggered when a kernel with a 64k base page size is run
on a system whose hardware does not 64k hash PTEs. In this case, we
emulate 64k pages with multiple 4k hash PTEs, however in
hpte_need_flush() we incorrectly only mask the hardware page size from
the address, instead of the logical page size. This causes things to
go wrong when we later attempt to iterate through the hardware
subpages of the logical page.
This patch corrects the error. It has been tested on pSeries bare
metal by Michael Neuling.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We can use the much more lightweight ida allocator since we don't
need the pointer storage idr provides.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make sure compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.
I.e. either use rlimit helpers added in
3e10e716ab
or ACCESS_ONCE if not applicable.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit d28513bc7f ("Fix bug in pagetable
cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for
breakage caused by an earlier clean up patch of mine, contains a
stupid bug. I changed the parameters of the subpage_protection()
function, but failed to update one of the callers.
This patch fixes it, and replaces a void * with a typed pointer so
that the compiler will warn on such an error in future.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The function name of cpumask_clear_cpu was not correct. Fortunately
nobody uses that code with hotplug yet :-)
Reported-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This time without the funny characters.
Fix following build errors generated with DEBUG=1
cc1: warnings being treated as errors
arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes':
arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize':
arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
... SNIP ...
Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Set need to call __set_pte_at() and not set_pte_at() from __change_page_attr()
since the later will perform checks with CONFIG_DEBUG_VM that aren't suitable
to the way we override an existing PTE. (More specifically, it doesn't let
you write over a present PTE).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'next' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
powerpc: fix up for mmu_mapin_ram api change
powerpc: wii: allow ioremap within the memory hole
powerpc: allow ioremap within reserved memory regions
wii: use both mem1 and mem2 as ram
wii: bootwrapper: add fixup to calc useable mem2
powerpc: gamecube/wii: early debugging using usbgecko
powerpc: reserve fixmap entries for early debug
powerpc: wii: default config
powerpc: wii: platform support
powerpc: wii: hollywood interrupt controller support
powerpc: broadway processor support
powerpc: wii: bootwrapper bits
powerpc: wii: device tree
powerpc: gamecube: default config
powerpc: gamecube: platform support
powerpc: gamecube/wii: flipper interrupt controller support
powerpc: gamecube/wii: udbg support for usbgecko
powerpc: gamecube/wii: do not include PCI support
powerpc: gamecube/wii: declare as non-coherent platforms
powerpc: gamecube/wii: introduce GAMECUBE_COMMON
...
Fix up conflicts in arch/powerpc/mm/fsl_booke_mmu.c.
Hopefully even close to correctly.
Today's linux-next build (powerpc ppc44x_defconfig) failed like this:
arch/powerpc/mm/pgtable_32.c: In function 'mapin_ram':
arch/powerpc/mm/pgtable_32.c:318: error: too many arguments to function 'mmu_mapin_ram'
Casued by commit de32400dd2 ("wii: use both
mem1 and mem2 as ram").
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Add a flag to let a platform ioremap memory regions marked as reserved.
This flag will be used later by the Nintendo Wii support code to allow
ioremapping the I/O region sitting between MEM1 and MEM2 and marked
as reserved RAM in the patch "wii: use both mem1 and mem2 as ram".
This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The Nintendo Wii video game console has two discontiguous RAM regions:
- MEM1: 24MB @ 0x00000000
- MEM2: 64MB @ 0x10000000
Unfortunately, the kernel currently does not support discontiguous RAM
memory regions on 32-bit PowerPC platforms.
This patch adds a series of workarounds to allow the use of the second
memory region (MEM2) as RAM by the kernel.
Basically, a single range of memory from the beginning of MEM1 to the
end of MEM2 is reported to the kernel, and a memory reservation is
created for the hole between MEM1 and MEM2.
With this patch the system is able to use all the available RAM and not
just ~27% of it.
This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
8xx sometimes need to load a invalid/non-present TLBs in
it DTLB asm handler.
These must be invalidated separaly as linux mm don't.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.
Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This reverts commit c045256d14.
It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will
commit a fixed version separately
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit a4fe3ce769 introduced a new
get_user_pages() path for hugepages on powerpc. Unfortunately, there
is a bug in it's loop logic, which can cause it to overrun the end of
the intended region. This came about by copying the logic from the
normal page path, which assumes the address and end parameters have
been pagesize aligned at the top-level. Since they're not *hugepage*
size aligned, the simplistic logic could step over the end of the gup
region without triggering the loop end condition.
This patch fixes the bug by using the technique that the normal page
path uses in levels above the lowest to truncate the ending address to
something we know we'll match with.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.
Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Re-write the code so its more standalone and fixed some issues:
* Bump'd # of CAM entries to 64 to support e500mc
* Make the code handle MAS7 properly
* Use pr_cont instead of creating a string as we go
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
For KVM we need to allocate a new context id, but don't really care about
all the mm context around it.
So let's split the alloc and destroy functions for the context id, so we can
grab one without allocating an mm context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We want to be able to build KVM as a module. To enable us doing so, we
need some more exports from core Linux parts.
This patch exports all functions and variables that are required for KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I inadvertently left that debug code enabled, causing the number of
contexts to be clamped to 31 which is going to slow things down on
4xx and just plain breaks 8xx
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The hugepage arch code provides a number of hook functions/macros
which mirror the functionality of various normal page pte access
functions. Various changes in the normal page accessors (in
particular BenH's recent changes to the handling of lazy icache
flushing and PAGE_EXEC) have caused the hugepage versions to get out
of sync with the originals. In some cases, this is a bug, at least on
some MMU types.
One of the reasons that some hooks were not identical to the normal
page versions, is that the fact we're dealing with a hugepage needed
to be passed down do use the correct dcache-icache flush function.
This patch makes the main flush_dcache_icache_page() function hugepage
aware (by checking for the PageCompound flag). That in turn means we
can make set_huge_pte_at() just a call to set_pte_at() bringing it
back into sync. As a bonus, this lets us remove the
hash_huge_page_do_lazy_icache() function, replacing it with a call to
the hash_page_do_lazy_icache() function it was based on.
Some other hugepage pte access hooks - huge_ptep_get_and_clear() and
huge_ptep_clear_flush() - are not so easily unified, but this patch at
least brings them back into sync with the current versions of the
corresponding normal page functions.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch separates the parts of hugetlbpage.c which are inherently
specific to the hash MMU into a new hugelbpage-hash64.c file.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch simplifies the logic used to initialize hugepages on
powerpc. The somewhat oddly named set_huge_psize() is renamed to
add_huge_page_size() and now does all necessary verification of
whether it's given a valid hugepage sizes (instead of just some) and
instantiates the generic hstate structure (but no more).
hugetlbpage_init() now steps through the available pagesizes, checks
if they're valid for hugepages by calling add_huge_page_size() and
initializes the kmem_caches for the hugepage pagetables. This means
we can now eliminate the mmu_huge_psizes array, since we no longer
need to pass the sizing information for the pagetable caches from
set_huge_psize() into hugetlbpage_init()
Determination of the default huge page size is also moved from the
hash code into the general hugepage code.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level. Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly. Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.
This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity. In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask. This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.
This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out. All other (hugepage)
pagetable walking paths can just interpret the structure as they go.
There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug. We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping). This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.
While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels. We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.
This patch cleans this all up by taking a different approach. Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size. Thus
sharing of caches between different types/levels of pagetables happens
naturally. The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
pagetable contains 2^n pointers.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, hpte_need_flush() only correctly flushes the given address
for normal pages. Callers for hugepages are required to mask the
address themselves.
But hpte_need_flush() already looks up the page sizes for its own
reasons, so this is a rather silly imposition on the callers. This
patch alters it to mask based on the pagesize it has looked up itself,
and removes the awkward masking code in the hugepage caller.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On pSeries, we always force the IO space to be mapped using 4K
pages even with a 64K base page size to cope with some limitations
in the HV interface to some devices.
However, the SLB miss handler code to discriminate between vmalloc
and ioremap space uses a CPU feature section such that the code
is nop'ed out when the processor support large pages non-cachable
mappings.
Thus, we end up always using the ioremap page size for vmalloc
segments on such processors, causing a discrepency between the
segment and the hash table, and thus a hang continously hashing
the page.
It works for the first segment of the vmalloc space since that
segment is "bolted" in by C code correctly, and thankfully we
almost never use the vmalloc space beyond the first segment,
but the new percpu code made the bug happen.
This fixes it by removing the feature section from the assembly,
we now always do the comparison between vmalloc and ioremap.
Signed-off-by; Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.
Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For /proc/kcore, each arch registers its memory range by kclist_add().
In usual,
- range of physical memory
- range of vmalloc area
- text, etc...
are registered but "range of physical memory" has some troubles. It
doesn't updated at memory hotplug and it tend to include unnecessary
memory holes. Now, /proc/iomem (kernel/resource.c) includes required
physical memory range information and it's properly updated at memory
hotplug. Then, it's good to avoid using its own code(duplicating
information) and to rebuild kclist for physical memory based on
/proc/iomem.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range. For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.
But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
check strict to find out busy "System RAM".
Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.
And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For /proc/kcore, vmalloc areas are registered per arch. But, all of them
registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies
them. By this. archs which have no kclist_add() hooks can see vmalloc
area correctly.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently, kclist_add() only eats start address and size as its arguments.
Considering to make kclist dynamically reconfigulable, it's necessary to
know which kclists are for System RAM and which are not.
This patch add kclist types as
KCORE_RAM
KCORE_VMALLOC
KCORE_TEXT
KCORE_OTHER
This "type" is used in a patch following this for detecting KCORE_RAM.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9617729941 ("Drop free_pages()")
modified nr_free_pages() to return 'unsigned long' instead of 'unsigned
int'. This made the casts to 'unsigned long' in most callers superfluous,
so remove them.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <zankel@tensilica.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...
Fix up conflicts in drivers/char/agp/uninorth-agp.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.
BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
breaking ppc32 build
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Support for TLB reservation (or TLB Write Conditional) and Paired MAS
registers are optional for a processor implementation so we handle
them via MMU feature sections.
We currently only used paired MAS registers to access the full RPN + perm
bits that are kept in MAS7||MAS3. We assume that if an implementation has
hardware page table at this time it also implements in TLB reservations.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is an attempt at cleaning up a bit the way we handle execute
permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only
defined by CPUs that can do something with it, and the myriad of
#ifdef's in the I$/D$ coherency code is reduced to 2 cases that
hopefully should cover everything.
The logic on BookE is a little bit different than what it was though
not by much. Since now, _PAGE_EXEC will be set by the generic code
for executable pages, we need to filter out if they are unclean and
recover it. However, I don't expect the code to be more bloated than
it already was in that area due to that change.
I could boast that this brings proper enforcing of per-page execute
permissions to all BookE and 40x but in fact, we've had that now for
some time as a side effect of my previous rework in that area (and
I didn't even know it :-) We would only enable execute permission if
the page was cache clean and we would only cache clean it if we took
and exec fault. Since we now enforce that the later only work if
VM_EXEC is part of the VMA flags, we de-fact already enforce per-page
execute permissions... Unless I missed something
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The MMUCSR is now defined as part of the Book-3E architecture so we
can move it into mmu-book3e.h and add some of the additional bits
defined by the architecture specs.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Since the pte_lockptr is a spinlock it gets optimized away on
uniprocessor builds so using spin_is_locked is not correct. We can use
assert_spin_locked instead and get the proper behavior between UP and
SMP builds.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam)
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Introduced a temporary variable into our iterating over the list cpus
that are threads on the same core. For some reason Ben forgot how for
loops work.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though
we did carve out some virtual space for it, the necessary support code
wasn't there. This implements it by using 16M pages for now, though the
page size could easily be changed at runtime if necessary.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the TLB miss handler assembly, the low level TLB flush routines
along with the necessary hook for dealing with our virtual page tables
or indirect TLB entries that need to be flushes when PTE pages are freed.
There is currently no support for hugetlbfs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>