Commit Graph

10577 Commits

Author SHA1 Message Date
Linus Torvalds 4be41343a2 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Marcelo Tosatti:
 "PPC and ARM KVM fixes"

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  ARM: KVM: fix L_PTE_S2_RDWR to actually be Read/Write
  ARM: KVM: fix KVM_CAP_ARM_SET_DEVICE_ADDR reporting
  kvm/ppc/e500: eliminate tlb_refs
  kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit
  kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid
  kvm/powerpc/e500mc: fix tlb invalidation on cpu migration
2013-04-16 19:46:14 -07:00
Kevin Hao d8b9229240 powerpc: add a missing label in resume_kernel
A label 0 was missed in the patch a9c4e541 (powerpc/kprobe: Complete
kprobe and migrate exception frame). This will cause the kernel
branch to an undetermined address if there really has a conflict when
updating the thread flags.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Acked-By: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-15 17:29:48 +10:00
Alistair Popple 05e38e5d5d powerpc: Fix audit crash due to save/restore PPR changes
The current mainline crashes when hitting userspace with the following:

kernel BUG at kernel/auditsc.c:1769!
cpu 0x1: Vector: 700 (Program Check) at [c000000023883a60]
    pc: c0000000001047a8: .__audit_syscall_entry+0x38/0x130
    lr: c00000000000ed64: .do_syscall_trace_enter+0xc4/0x270
    sp: c000000023883ce0
   msr: 8000000000029032
  current = 0xc000000023800000
  paca    = 0xc00000000f080380   softe: 0        irq_happened: 0x01
    pid   = 1629, comm = start_udev
kernel BUG at kernel/auditsc.c:1769!
enter ? for help
[c000000023883d80] c00000000000ed64 .do_syscall_trace_enter+0xc4/0x270
[c000000023883e30] c000000000009b08 syscall_dotrace+0xc/0x38
 --- Exception: c00 (System Call) at 0000008010ec50dc

Bisecting found the following patch caused it:

commit 44e9309f1f
Author: Haren Myneni <haren@linux.vnet.ibm.com>
powerpc: Implement PPR save/restore

It was found this patch corrupted r9 when calling
SET_DEFAULT_THREAD_PPR()

Using r10 as a scratch register instead of r9 solved the problem.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-15 17:29:45 +10:00
Scott Wood 4d2be6f7c7 kvm/ppc/e500: eliminate tlb_refs
Commit 523f0e5421 ("KVM: PPC: E500:
Explicitly mark shadow maps invalid") began using E500_TLB_VALID
for guest TLB1 entries, and skipping invalidations if it's not set.

However, when E500_TLB_VALID was set for such entries, it was on a
fake local ref, and so the invalidations never happen.  gtlb_privs
is documented as being only for guest TLB0, though we already violate
that with E500_TLB_BITMAP.

Now that we have MMU notifiers, and thus don't need to actually
retain a reference to the mapped pages, get rid of tlb_refs, and
use gtlb_privs for E500_TLB_VALID in TLB1.

Since we can have more than one host TLB entry for a given tlbe_ref,
be careful not to clear existing flags that are relevant to other
host TLB entries when preparing a new host TLB entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 15:53:43 +02:00
Scott Wood 66a5fecdcc kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit
It's possible that we're using the same host TLB1 slot to map (a
presumably different portion of) the same guest TLB1 entry.  Clear
the bit in the map before setting it, so that if the esels are the same
the bit will remain set.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 15:53:38 +02:00
Scott Wood 6b2ba1a912 kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid
Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be
distinguished from "esel 0".  Note that we're not saved by the fact
that host esel 0 is reserved for non-KVM use, because KVM host esel
numbering is not the raw host numbering (see to_htlb1_esel).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 15:53:34 +02:00
Scott Wood c5e6cb051c kvm/powerpc/e500mc: fix tlb invalidation on cpu migration
The existing check handles the case where we've migrated to a different
core than we last ran on, but it doesn't handle the case where we're
still on the same cpu we last ran on, but some other vcpu has run on
this cpu in the meantime.

Without this, guest segfaults (and other misbehavior) have been seen in
smp guests.

Cc: stable@vger.kernel.org # 3.8.x
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 00:06:39 +02:00
Michael Neuling f110c0c192 powerpc: fix compiling CONFIG_PPC_TRANSACTIONAL_MEM when CONFIG_ALTIVEC=n
We can't compile a kernel with CONFIG_ALTIVEC=n when
CONFIG_PPC_TRANSACTIONAL_MEM=y.  We currently get:

arch/powerpc/kernel/tm.S:320: Error: unsupported relocation against THREAD_VSCR
arch/powerpc/kernel/tm.S:323: Error: unsupported relocation against THREAD_VR0
arch/powerpc/kernel/tm.S:323: Error: unsupported relocation against THREAD_VR0
etc.

The below fixes this with a sprinkling of #ifdefs.

This was found by mpe with kisskb:
  http://kisskb.ellerman.id.au/kisskb/buildresult/8539442/

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-10 08:14:39 +10:00
Michael Wolf 9fb2640159 powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before the ANDCOND test
Some versions of pHyp will perform the adjunct partition test before the
ANDCOND test.  The result of this is that H_RESOURCE can be returned and
cause the BUG_ON condition to occur. The HPTE is not removed.  So add a
check for H_RESOURCE, it is ok if this HPTE is not removed as
pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a
specific HPTE to remove.  So it is ok to just move on to the next slot
and try again.

Cc: stable@vger.kernel.org
Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-08 15:19:09 +10:00
Stuart Yoder f9294e989f powerpc: define the conditions where the ePAPR idle hcall can be supported
For 32-bit, CONFIG_EPAPR_PARAVIRT pulls in both epapr_paravirt.c
and epapr_hcalls.c which contains the 32-bit paravirt idle loop.

For 64-bit, the paravirt idle loop is in idle_book3e.S and that
source file is included only if CONFIG_PPC_BOOK3E_64 defined.

This patch makes that dependency for 64-bit explicit.

Fixes these build errors:

arch/powerpc/kernel/built-in.o: In function `restore_pblist_ptr':
ftrace.c:(.toc+0xdc0): undefined reference to `epapr_ev_idle_start'
ftrace.c:(.toc+0xdd0): undefined reference to `epapr_ev_idle'

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-03-26 08:47:27 +11:00
Chen Gang 087aa036eb powerpc: make additional room in exception vector area
The FWNMI region is fixed at 0x7000 and the vector are now overflowing
that with allmodconfig. Fix that by moving slb_miss_realmode code out
of that region as it doesn't need to be that close to the call sites
(it is a _GLOBAL function)

Fixes this build error:

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1304: Error: attempt to move .org backwards

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-03-25 17:14:21 +11:00
Linus Torvalds cd82346934 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A fair chunk of the linecount comes from a fix for a tracing bug that
  corrupts latency tracing buffers when the overwrite mode is changed on
  the fly - the rest is mostly assorted fewliner fixlets."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
  kprobes/x86: Check Interrupt Flag modifier when registering probe
  kprobes: Make hash_64() as always inlined
  perf: Generate EXIT event only once per task context
  perf: Reset hwc->last_period on sw clock events
  tracing: Prevent buffer overwrite disabled for latency tracers
  tracing: Keep overwrite in sync between regular and snapshot buffers
  tracing: Protect tracer flags with trace_types_lock
  perf tools: Fix LIBNUMA build with glibc 2.12 and older.
  tracing: Fix free of probe entry by calling call_rcu_sched()
  perf/POWER7: Create a sysfs format entry for Power7 events
  perf probe: Fix segfault
  libtraceevent: Remove hard coded include to /usr/local/include in Makefile
  perf record: Fix -C option
  perf tools: check if -DFORTIFY_SOURCE=2 is allowed
  perf report: Fix build with NO_NEWT=1
  perf annotate: Fix build with NO_NEWT=1
  tracing: Fix race in snapshot swapping
2013-03-21 08:29:11 -07:00
Ben Collins 9997d08806 sgy-cts1000: Remove __dev* attributes
Somehow the driver snuck in with these still in it.

Signed-off-by: Ben Collins <ben.c@servergy.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-18 18:49:10 -07:00
Linus Torvalds a15cd063e1 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
 "Here's a few powerpc fixes for 3.9, mostly regressions (though not all
  from 3.9 merge window) that we've been hammering into shape over the
  last couple of weeks.  They fix booting on Cell and G5 among other
  things (yes, we've been a bit sloppy with older machines this time
  around)."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Rename USER_ESID_BITS* to ESID_BITS*
  powerpc: Update kernel VSID range
  powerpc: Make VSID_BITS* dependency explicit
  powerpc: Make sure that we alays include CONFIG_BINFMT_ELF
  powerpc/ptrace: Fix brk.len used uninitialised
  powerpc: Fix -mcmodel=medium breakage in prom_init.c
  powerpc: Remove last traces of POWER4_ONLY
  powerpc: Fix cputable entry for 970MP rev 1.0
  powerpc: Fix STAB initialization
2013-03-18 08:12:41 -07:00
Ingo Molnar a0bf225db7 perf/urgent fixes:
. perf probe: Fix segfault due to testing the wrong pointer for NULL,
   from Ananth N Mavinakayanahalli.
 
 . libtraceevent: Remove hard coded include to /usr/local/include in
   Makefile, which causes cross builds to include host header files,
   fix from Jack Mitchell.
 
 . perf record: Use the right target interface for synthesizing
   threads when --cpu/-C option is used, fix from Jiri Olsa.
 
 . Check if -DFORTIFY_SOURCE=2 is allowed, as gcc 4.7.2 defines
   it and then the build is broken when it is redefined in perf,
   fix from Marcin Slusarz.
 
 . Fix build with NO_NEWT=1, that can happen explicitely or when
   the newt-devel package is not installed, from Michael Ellerman.
 
 . perf/POWER7: Create a sysfs format entry for Power7 events, missing
   patch from a patchseries already merged, from Sukadev Bhattiprolu.
 
 . Fix LIBNUMA build with glibc 2.12 and older, from Vinson Lee.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJRQblOAAoJENZQFvNTUqpAop0P+wT/5oWqaMU1WgHpP4OTn8dU
 6FkOf9+9IvncOLvE32Kg+hcnLsf9pQWnOxpZgmIxq53d0GI51eNyGPrpedAb89ep
 aa1/C5TUoX18tPiVYB5jahM+f8cw1gZX0CGzdJSpMJF3a0qiXTPAawUnoVJOpPYr
 lPaAmR8sxvm+HwlyBTQM4FFNY36O7inPQxg/gvoW8348Ml9bl63Bh3GytHCOOGLf
 cNgceQu8NpvAG0kqWUFhlZVyYHW4ruD7q+3VUY8ZNltYuA+LAy1Z0FHC8SxJM3B2
 CSKNqRaO8NJ/9eqQ9hg1CHeZVqyNLx58Y0yByGf5LiMUd6DMN7KoQzZEtekEt+L5
 baijxKlFoIX/tneIt2droWIvxzyahhKwUTXbrcaDRPIPw4KqWvy/5+KN+S939yPs
 TnuDai6y+yxPXZMPxLCCharXVGwDWP9HiW7nq25ZLmFILWMUbL42Apb4ErMsEoik
 CRO6ALB9bFn1GkTWPCEMGGmnDGudQSRRwOlgoMEt6XdL5RALv/lgX4HMWfSQ1Wpz
 0GzXRDSokXwAu2QIp2QoqsNspvH/1f4WUjiHiwwDSoontujbJ7/hrqdXYjMuS8cU
 lh0I8p/JpLdvEnCF4vwQo2YozWZEvPtlrYNVx+PqPB1RsNow0WCepHeg/urfxHC3
 qVO/30jASFDpYLyl7h2N
 =aN1j
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

. perf probe: Fix segfault due to testing the wrong pointer for NULL,
  from Ananth N Mavinakayanahalli.

. libtraceevent: Remove hard coded include to /usr/local/include in
  Makefile, which causes cross builds to include host header files,
  fix from Jack Mitchell.

. perf record: Use the right target interface for synthesizing
  threads when --cpu/-C option is used, fix from Jiri Olsa.

. Check if -DFORTIFY_SOURCE=2 is allowed, as gcc 4.7.2 defines
  it and then the build is broken when it is redefined in perf,
  fix from Marcin Slusarz.

. Fix build with NO_NEWT=1, that can happen explicitely or when
  the newt-devel package is not installed, from Michael Ellerman.

. perf/POWER7: Create a sysfs format entry for Power7 events, missing
  patch from a patchseries already merged, from Sukadev Bhattiprolu.

. Fix LIBNUMA build with glibc 2.12 and older, from Vinson Lee.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-18 10:00:56 +01:00
Aneesh Kumar K.V af81d7878c powerpc: Rename USER_ESID_BITS* to ESID_BITS*
Now we use ESID_BITS of kernel address to build proto vsid. So rename
USER_ESIT_BITS to ESID_BITS

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.8]
2013-03-17 12:45:44 +11:00
Aneesh Kumar K.V c60ac5693c powerpc: Update kernel VSID range
This patch change the kernel VSID range so that we limit VSID_BITS to 37.
This enables us to support 64TB with 65 bit VA (37+28). Without this patch
we have boot hangs on platforms that only support 65 bit VA.

With this patch we now have proto vsid generated as below:

We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated
from mmu context id and effective segment id of the address.

For user processes max context id is limited to ((1ul << 19) - 5)
for kernel space, we use the top 4 context ids to map address as below
0x7fffc -  [ 0xc000000000000000 - 0xc0003fffffffffff ]
0x7fffd -  [ 0xd000000000000000 - 0xd0003fffffffffff ]
0x7fffe -  [ 0xe000000000000000 - 0xe0003fffffffffff ]
0x7ffff -  [ 0xf000000000000000 - 0xf0003fffffffffff ]

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.8]
2013-03-17 12:39:06 +11:00
Aneesh Kumar K.V e39d1a4714 powerpc: Make VSID_BITS* dependency explicit
VSID_BITS and VSID_BITS_1T depends on the context bits  and user esid
bits. Make the dependency explicit

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.8]
2013-03-17 12:38:15 +11:00
Stephen Rothwell d812c0e1f9 powerpc: Make sure that we alays include CONFIG_BINFMT_ELF
Our kernel is not much good without BINFMT_ELF and this fixes a build
warning on 64 bit allnoconfig builds:

warning: (COMPAT) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF)

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-17 12:35:09 +11:00
Michael Neuling 2bb78efab4 powerpc/ptrace: Fix brk.len used uninitialised
With some CONFIGS it's possible that in ppc_set_hwdebug, brk.len is
uninitialised before being used.  It has been reported that GCC 4.2 will
produce the following error in this case:

  arch/powerpc/kernel/ptrace.c:1479: warning: 'brk.len' is used uninitialized in this function
  arch/powerpc/kernel/ptrace.c:1381: note: 'brk.len' was declared here

This patch corrects this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-17 12:35:06 +11:00
Sukadev Bhattiprolu 3bf7b07ece perf/POWER7: Create a sysfs format entry for Power7 events
Create a sysfs entry, '/sys/bus/event_source/devices/cpu/format/event'
which describes the format of the POWER7 PMU events.

This code is based on corresponding code in x86.

Changelog[v4]:  [Michael Ellerman, Paul Mckerras] The event format is different
		for other POWER cpus. So move the code to POWER7-specific,
		power7-pmu.c Also, the POWER7 format uses bits 0-19 not 0-20.

Changelog[v2]: [Jiri Osla] Use PMU_FORMAT_ATTR rather than duplicating code.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130306054826.GA14627@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-03-13 17:01:04 -03:00
Anton Blanchard 1674400aae powerpc: Fix -mcmodel=medium breakage in prom_init.c
Commit 5ac47f7a6e (powerpc: Relocate prom_init.c on 64bit) made
prom_init.c position independent by manually relocating its entries
in the TOC.

We get the address of the TOC entries with the __prom_init_toc_start
linker symbol. If __prom_init_toc_start ends up as an entry in the
TOC then we need to add an offset to get the current address. This is
the case for older toolchains.

On the other hand, if we have a newer toolchain that supports
-mcmodel=medium then __prom_init_toc_start will be created by a
relative offset from r2 (the TOC pointer). Since r2 has already been
relocated, nothing more needs to be done.  Adding an offset in this
case is wrong and Aaro Koskinen and Alexander Graf have noticed noticed
G5 and OpenBIOS breakage.

Alan Modra suggested we just use r2 to get at the TOC which is simpler
and works with both old and new toolchains.

Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-13 10:06:52 +11:00
Paul Bolle ff2d7587c7 powerpc: Remove last traces of POWER4_ONLY
The Kconfig symbol POWER4_ONLY got removed in commit
694caf0255 ("powerpc: Remove
CONFIG_POWER4_ONLY"). Remove its last traces.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-13 10:06:49 +11:00
Benjamin Herrenschmidt d63ac5f6cf powerpc: Fix cputable entry for 970MP rev 1.0
Commit 44ae3ab335 forgot to update
the entry for the 970MP rev 1.0 processor when moving some CPU
features bits to the MMU feature bit mask. This breaks booting
on some rare G5 models using that chip revision.

Reported-by: Phileas Fogg <phileas-fogg@mail.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.0+]
2013-03-13 10:06:46 +11:00
Benjamin Herrenschmidt 13938117a5 powerpc: Fix STAB initialization
Commit f5339277eb accidentally removed
more than just iSeries bits and took out the call to stab_initialize()
thus breaking support for POWER3 processors.

Put it back. (Yes, nobody noticed until now ...)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.4+]
2013-03-13 10:06:22 +11:00
Stephen Rothwell 4febd95a8a Select VIRT_TO_BUS directly where needed
In commit 887cbce0ad ("arch Kconfig: centralise ARCH_NO_VIRT_TO_BUS")
I introduced the config sybmol HAVE_VIRT_TO_BUS and selected that where
needed.  I am not sure what I was thinking.  Instead, just directly
select VIRT_TO_BUS where it is needed.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12 11:16:40 -07:00
Linus Torvalds 72932611b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
 "This is three simple fixes against 3.9-rc1.  I have tested each of
  these fixes and verified they work correctly.

  The userns oops in key_change_session_keyring and the BUG_ON triggered
  by proc_ns_follow_link were found by Dave Jones.

  I am including the enhancement for mount to only trigger requests of
  filesystem modules here instead of delaying this for the 3.10 merge
  window because it is both trivial and the kind of change that tends to
  bit-rot if left untouched for two months."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use nd_jump_link in proc_ns_follow_link
  fs: Limit sys_mount to only request filesystem modules (Part 2).
  fs: Limit sys_mount to only request filesystem modules.
  userns: Stop oopsing in key_change_session_keyring
2013-03-09 16:51:13 -08:00
Michael Neuling 54c9b2253d powerpc: Set DSCR bit in FSCR setup
We support DSCR (Data Stream Control Register) so we should make sure we set it
in the FSCR (Facility Status & Control Register) incase some firmwares don't
set it.  If we don't set this, we'll take a facility unavailable exception when
using the DSCR.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:30 +11:00
Michael Neuling fa759e9b09 powerpc: Add DSCR FSCR register bit definition
This sets the DSCR (Data Stream Control Register) in the FSCR (Facility Status
& Control Register).

Also harmonise TAR (Target Address Register) FSCR bit definition too.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:30 +11:00
Michael Neuling 57d231678a powerpc: Fix setting FSCR for HV=0 and on secondary CPUs
Currently we only set the FSCR (Facility Status and Control Register) when HV=1
but this feature is available when HV=0 also.  This patch sets FSCR when HV=0.

Also, we currently only set the FSCR on the master CPU.  This patch also sets
the FSCR on secondary CPUs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:29 +11:00
Tony Breeds 8170a83f15 powerpc: Wireup the kcmp syscall to sys_ni
Since kmp takes 2 unsigned long args there should be a compat wrapper.
Since one isn't provided I think it's safer just to hook this up to not
implemented.  If we need it later we can do it properly then.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:29 +11:00
Akinobu Mita a74f350b5c powerpc: Remove unused BITOP_LE_SWIZZLE macro
The BITOP_LE_SWIZZLE macro was used in the little-endian bitops functions
for powerpc.  But these functions were converted to generic bitops and
the BITOP_LE_SWIZZLE is not used anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:28 +11:00
Michael Neuling 6a404806df powerpc: Avoid link stack corruption in MMU on syscall entry path
Currently we use the link register to branch up high in the early MMU on
syscall entry path.  Unfortunately, this trashes the link stack as the
address we are going to is not associated with the earlier mflr.

This patch simply converts us to used the count register (volatile over
syscalls anyway) instead.  This is much better at predicting in this
scenario and doesn't trash link stack causing a bunch of additional
branch mispredicts later.  Benchmarking this on POWER8 saves a bunch of
cycles on Anton's null syscall benchmark here:
   http://ozlabs.org/~anton/junkcode/null_syscall.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:28 +11:00
Chen Gang 6b6680c4ea powerpc/pseries/hvcserver: Fix strncpy buffer limit in location code
the dest buf len is 80 (HVCS_CLC_LENGTH + 1).
  the src buf len is PAGE_SIZE.
  if src buf string len is more than 80, it will cause issue.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:27 +11:00
Tony Breeds 27777890d0 powerpc: Fix compile of sha1-powerpc-asm.S on 32-bit
When building with CRYPTO_SHA1_PPC enabled we fail with:

powerpc/crypto/sha1-powerpc-asm.S: Assembler messages:
powerpc/crypto/sha1-powerpc-asm.S:116: Error: can't resolve `0' {*ABS* section} - `STACKFRAMESIZE' {*UND* section}
powerpc/crypto/sha1-powerpc-asm.S:116: Error: expression too complex
powerpc/crypto/sha1-powerpc-asm.S:178: Error: unsupported relocation against STACKFRAMESIZE

Use INT_FRAME_SIZE instead of STACKFRAMESIZE.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:26 +11:00
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Linus Torvalds 14cc0b55b7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal/compat fixes from Al Viro:
 "Fixes for several regressions introduced in the last signal.git pile,
  along with fixing bugs in truncate and ftruncate compat (on just about
  anything biarch at least one of those two had been done wrong)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  compat: restore timerfd settime and gettime compat syscalls
  [regression] braino in "sparc: convert to ksignal"
  fix compat truncate/ftruncate
  switch lseek to COMPAT_SYSCALL_DEFINE
  lseek() and truncate() on sparc really need sign extension
2013-03-02 08:34:06 -08:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Stephen Rothwell 887cbce0ad arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS
Change it to CONFIG_HAVE_VIRT_TO_BUS and set it in all architecures
that already provide virt_to_bus().

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: H Hartley Sweeten <hartleys@visionengravers.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:23 -08:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro e72837e3e7 default SET_PERSONALITY() in linux/elf.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:08 -05:00
Linus Torvalds 9043a2650c The sweeping change is to make add_taint() explicitly indicate whether to disable
lockdep, but it's a mechanical change.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf
 4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf
 3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6
 eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee
 ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn
 4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC
 Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1
 AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO
 32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv
 lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO
 rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK
 +GszxsFkCjlW0mK0egTb
 =tiSY
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module update from Rusty Russell:
 "The sweeping change is to make add_taint() explicitly indicate whether
  to disable lockdep, but it's a mechanical change."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MODSIGN: Add option to not sign modules during modules_install
  MODSIGN: Add -s <signature> option to sign-file
  MODSIGN: Specify the hash algorithm on sign-file command line
  MODSIGN: Simplify Makefile with a Kconfig helper
  module: clean up load_module a little more.
  modpost: Ignore ARC specific non-alloc sections
  module: constify within_module_*
  taint: add explicit flag to show whether lock dep is still OK.
  module: printk message when module signature fail taints kernel.
2013-02-25 15:41:43 -08:00
Al Viro 3f6d078d4a fix compat truncate/ftruncate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25 09:24:55 -05:00
Linus Torvalds 89f883372f Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "KVM updates for the 3.9 merge window, including x86 real mode
  emulation fixes, stronger memory slot interface restrictions, mmu_lock
  spinlock hold time reduction, improved handling of large page faults
  on shadow, initial APICv HW acceleration support, s390 channel IO
  based virtio, amongst others"

* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  Revert "KVM: MMU: lazily drop large spte"
  x86: pvclock kvm: align allocation size to page size
  KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
  x86 emulator: fix parity calculation for AAD instruction
  KVM: PPC: BookE: Handle alignment interrupts
  booke: Added DBCR4 SPR number
  KVM: PPC: booke: Allow multiple exception types
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: Remove user_alloc from struct kvm_memory_slot
  KVM: VMX: disable apicv by default
  KVM: s390: Fix handling of iscs.
  KVM: MMU: cleanup __direct_map
  KVM: MMU: remove pt_access in mmu_set_spte
  KVM: MMU: cleanup mapping-level
  KVM: MMU: lazily drop large spte
  KVM: VMX: cleanup vmx_set_cr0().
  KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
  KVM: VMX: disable SMEP feature when guest is in non-paging mode
  KVM: Remove duplicate text in api.txt
  Revert "KVM: MMU: split kvm_mmu_free_page"
  ...
2013-02-24 13:07:18 -08:00
Al Viro 561c673197 switch lseek to COMPAT_SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-24 10:52:26 -05:00
Linus Torvalds 9e2d59ad58 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "This is the first pile; another one will come a bit later and will
  contain SYSCALL_DEFINE-related patches.

   - a bunch of signal-related syscalls (both native and compat)
     unified.

   - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
     (fixing several potential problems with missing argument
     validation, while we are at it)

   - a lot of now-pointless wrappers killed

   - a couple of architectures (cris and hexagon) forgot to save
     altstack settings into sigframe, even though they used the
     (uninitialized) values in sigreturn; fixed.

   - microblaze fixes for delivery of multiple signals arriving at once

   - saner set of helpers for signal delivery introduced, several
     architectures switched to using those."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
  x86: convert to ksignal
  sparc: convert to ksignal
  arm: switch to struct ksignal * passing
  alpha: pass k_sigaction and siginfo_t using ksignal pointer
  burying unused conditionals
  make do_sigaltstack() static
  arm64: switch to generic old sigaction() (compat-only)
  arm64: switch to generic compat rt_sigaction()
  arm64: switch compat to generic old sigsuspend
  arm64: switch to generic compat rt_sigqueueinfo()
  arm64: switch to generic compat rt_sigpending()
  arm64: switch to generic compat rt_sigprocmask()
  arm64: switch to generic sigaltstack
  sparc: switch to generic old sigsuspend
  sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
  sparc: kill sign-extending wrappers for native syscalls
  kill sparc32_open()
  sparc: switch to use of generic old sigaction
  sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
  mips: switch to generic sys_fork() and sys_clone()
  ...
2013-02-23 18:50:11 -08:00
Linus Torvalds 5ce1a70e2f Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
2013-02-23 17:50:35 -08:00
Tang Chen 0197518cd3 memory-hotplug: remove memmap of sparse-vmemmap
Introduce a new API vmemmap_free() to free and remove vmemmap
pagetables.  Since pagetable implements are different, each architecture
has to provide its own version of vmemmap_free(), just like
vmemmap_populate().

Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.

[mhocko@suse.cz: fix implicit declaration of remove_pagetable]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu 46723bfa54 memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by
get_page_bootmem().  So the patch searches pages of virtual mapping and
registers the pages by get_page_bootmem().

NOTE: register_page_bootmem_memmap() is not implemented for ia64,
      ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
      and revert register_page_bootmem_info_node() when platform doesn't
      support it.

      It's implemented by adding a new Kconfig option named
      CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
      by memory-hotplug feature fully supported archs(currently only on
      x86_64).

      Since we have 2 config options called MEMORY_HOTPLUG and
      MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
      and codes in function register_page_bootmem_info_node() are only
      used for collecting infomation for hot-remove, so reside it under
      MEMORY_HOTREMOVE.

      Besides page_isolation.c selected by MEMORY_ISOLATION under
      MEMORY_HOTPLUG is also such case, move it too.

[mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
[linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
[mhocko@suse.cz: remove the arch specific functions without any implementation]
[linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
[rientjes@google.com: fix defined but not used warning]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wu Jianguo <wujianguo@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Wen Congyang 24d335ca36 memory-hotplug: introduce new arch_remove_memory() for removing page table
For removing memory, we need to remove page tables.  But it depends on
architecture.  So the patch introduce arch_remove_memory() for removing
page table.  Now it only calls __remove_pages().

Note: __remove_pages() for some archtecuture is not implemented
      (I don't know how to implement it for s390).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00