Commit Graph

227 Commits

Author SHA1 Message Date
Heiko Carstens 7e0d48574e [S390] Enable flexible mmap layout for 64 bit processes
Historically 64 bit processes use the legacy address layout. However
there is no reason why 64 bit processes shouldn't benefit from the
flexible mmap layout advantages.
Therefore just enable it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:24 +01:00
Heiko Carstens 9e78a13bfb [S390] reduce miminum gap between stack and mmap_base
Reduce minimum gap between stack and mmap_base to 32MB. That way there
is a bit more space for heap and mmap for tight 31 bit address spaces.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:24 +01:00
Heiko Carstens 9046e401e7 [S390] mmap: consider stack address randomization
Consider stack address randomization when calulating mmap_base for
flexible mmap layout . Because of address randomization the stack
address can be up to 8MB lower than STACK_TOP.
When calculating mmap_base this isn't taken into account, which could
lead to the case that the gap between the real stack top and mmap_base
is lower than what ulimit specifies for the maximum stack size.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:24 +01:00
Martin Schwidefsky 5e9a26928f [S390] ptrace cleanup
Overhaul program event recording and the code dealing with the ptrace
user space interface.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:31 +01:00
Heiko Carstens fb0a9d7e86 [S390] pfault: delay register of pfault interrupt
Use an early init call to initialize pfault. That way it is possible to
use the register_external_interrupt() instead of the early variant.
No need to enable pfault any earlier since it has only effect if user
space processes are running.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:26 +01:00
Heiko Carstens 052ff461c8 [S390] irq: have detailed statistics for interrupt types
Up to now /proc/interrupts only has statistics for external and i/o
interrupts but doesn't split up them any further.
This patch adds a line for every single interrupt source so that it
is possible to easier tell what the machine is/was doing.
Part of the output now looks like this;

           CPU0       CPU2       CPU4
EXT:       3898       4232       2305
I/O:        782        315        245
CLK:       1029       1964        727   [EXT] Clock Comparator
IPI:       2868       2267       1577   [EXT] Signal Processor
TMR:          0          0          0   [EXT] CPU Timer
TAL:          0          0          0   [EXT] Timing Alert
PFL:          0          0          0   [EXT] Pseudo Page Fault
[...]
NMI:          0          1          1   [NMI] Machine Checks

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:25 +01:00
Martin Schwidefsky 25591b0703 [S390] fix get_user_pages_fast
The check for the _PAGE_RO bit in get_user_pages_fast for write==1 is
the wrong way around. It must not be set for the fast path.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-11-10 10:05:53 +01:00
Martin Schwidefsky 14375bc4eb [S390] cleanup facility list handling
Store the facility list once at system startup with stfl/stfle and
reuse the result for all facility tests.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:21 +02:00
Christian Borntraeger e05ef9bdb8 [S390] kvm: Fix badness at include/asm/mmu_context.h:83
commit 050eef364a
    [S390] fix tlb flushing vs. concurrent /proc accesses
broke KVM on s390x. On every schedule a
Badness at include/asm/mmu_context.h:83 appears. s390_enable_sie
replaces the mm on the __running__ task, therefore, we have to
increase the attach count of the new mm.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:20 +02:00
Martin Schwidefsky f6649a7e5a [S390] cleanup lowcore access from external interrupts
Read external interrupts parameters from the lowcore in the first
level interrupt handler in entry[64].S.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:19 +02:00
Martin Schwidefsky 1e54622e04 [S390] cleanup lowcore access from program checks
Read all required fields for program checks from the lowcore in the
first level interrupt handler in entry[64].S. If the context that
caused the fault was enabled for interrupts we can now re-enable the
irqs in entry[64].S.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:19 +02:00
Martin Schwidefsky 36bf96801e [S390] fix SIGBUS handling
Raise SIGBUS with a siginfo structure. Deliver BUS_ADRERR as si_code and
the address of the fault in the si_addr field.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:19 +02:00
Heiko Carstens a20852d2b7 [S390] cmm: fix crash on case conversion
When the cmm module is compiled into the kernel it will crash when
writing to the R/O data section.
Reason is the lower to upper case conversion of the "sender" module
parameter which ignored the fact that the pointer is preinitialized.

Introduced with 41b42876 "cmm, smsgiucv_app: convert sender to
uppercase"

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:17 +02:00
Martin Schwidefsky 92f842eac7 [S390] store indication fault optimization
Use the store indication bit in the translation exception code on
page faults to avoid the protection faults that immediatly follow
the page fault if the access has been a write.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:15 +02:00
Martin Schwidefsky 80217147a3 [S390] lockless get_user_pages_fast()
Implement get_user_pages_fast without locking in the fastpath on s390.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:15 +02:00
Martin Schwidefsky 238ec4efee [S390] zero page cache synonyms
If the zero page is mapped to virtual user space addresses that differ
only in bit 2^12 or 2^13 we get L1 cache synonyms which can affect
performance. Follow the mips model and use multiple zero pages to avoid
the synonyms.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:14 +02:00
David Howells df9ee29270 Fix IRQ flag handling naming
Fix the IRQ flag handling naming.  In linux/irqflags.h under one configuration,
it maps:

	local_irq_enable() -> raw_local_irq_enable()
	local_irq_disable() -> raw_local_irq_disable()
	local_irq_save() -> raw_local_irq_save()
	...

and under the other configuration, it maps:

	raw_local_irq_enable() -> local_irq_enable()
	raw_local_irq_disable() -> local_irq_disable()
	raw_local_irq_save() -> local_irq_save()
	...

This is quite confusing.  There should be one set of names expected of the
arch, and this should be wrapped to give another set of names that are expected
by users of this facility.

Change this to have the arch provide:

	flags = arch_local_save_flags()
	flags = arch_local_irq_save()
	arch_local_irq_restore(flags)
	arch_local_irq_disable()
	arch_local_irq_enable()
	arch_irqs_disabled_flags(flags)
	arch_irqs_disabled()
	arch_safe_halt()

Then linux/irqflags.h wraps these to provide:

	raw_local_save_flags(flags)
	raw_local_irq_save(flags)
	raw_local_irq_restore(flags)
	raw_local_irq_disable()
	raw_local_irq_enable()
	raw_irqs_disabled_flags(flags)
	raw_irqs_disabled()
	raw_safe_halt()

with type checking on the flags 'arguments', and then wraps those to provide:

	local_save_flags(flags)
	local_irq_save(flags)
	local_irq_restore(flags)
	local_irq_disable()
	local_irq_enable()
	irqs_disabled_flags(flags)
	irqs_disabled()
	safe_halt()

with tracing included if enabled.

The arch functions can now all be inline functions rather than some of them
having to be macros.

Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile]
Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze]
Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR]
Acked-by: Tony Luck <tony.luck@intel.com> [IA-64]
Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R]
Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC]
Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390]
Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score]
Acked-by: Matt Fleming <matt@console-pimps.org> [SH]
Acked-by: David S. Miller <davem@davemloft.net> [Sparc]
Acked-by: Chris Zankel <chris@zankel.net> [Xtensa]
Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha]
Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300]
Cc: starvik@axis.com [CRIS]
Cc: jesper.nilsson@axis.com [CRIS]
Cc: linux-cris-kernel@axis.com
2010-10-07 14:08:55 +01:00
Martin Schwidefsky 050eef364a [S390] fix tlb flushing vs. concurrent /proc accesses
The tlb flushing code uses the mm_users field of the mm_struct to
decide if each page table entry needs to be flushed individually with
IPTE or if a global flush for the mm_struct is sufficient after all page
table updates have been done. The comment for mm_users says "How many
users with user space?" but the /proc code increases mm_users after it
found the process structure by pid without creating a new user process.
Which makes mm_users useless for the decision between the two tlb
flusing methods. The current code can be confused to not flush tlb
entries by a concurrent access to /proc files if e.g. a fork is in
progres. The solution for this problem is to make the tlb flushing
logic independent from the mm_users field.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-24 09:26:34 +02:00
Linus Torvalds 0d6ffdb8f1 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] dasd: tunable missing interrupt handler
  [S390] dasd: allocate fallback cqr for reserve/release
  [S390] topology: use default MC domain initializer
  [S390] initrd: change default load address
  [S390] cmm, smsgiucv_app: convert sender to uppercase
  [S390] cmm: add missing __init/__exit annotations
  [S390] cio: use all available paths for some internal I/O
  [S390] ccwreq: add ability to use all paths
  [S390] cio: ccw_device_online_store return -EINVAL in case of missing driver
  [S390] cio: Log the response from the unit check handler
  [S390] cio: CHSC SIOSL Support
2010-08-10 14:01:26 -07:00
Heiko Carstens a1b200e27c mm: provide init_mm mm_context initializer
Provide an INIT_MM_CONTEXT intializer macro which can be used to
statically initialize mm_struct:mm_context of init_mm.  This way we can
get rid of code which will do the initialization at run time (on s390).

In addition the current code can be found at a place where it is not
expected.  So let's have a common initializer which architectures
can use if needed.

This is based on a patch from Suzuki Poulose.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Suzuki Poulose <suzuki@in.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:54 -07:00
Hendrik Brueckner 41b4287677 [S390] cmm, smsgiucv_app: convert sender to uppercase
The sender kernel parameter contains a z/VM user ID where
alphabetic characters must be specified in uppercase.

Allow users to specify lowercase characters and convert the
sender string to uppercase at module initialization.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-09 18:12:54 +02:00
Heiko Carstens 2e85ba510e [S390] cmm: add missing __init/__exit annotations
Add missing __init and __exit annoations for module init and exit
functions. This will save us huge amounts of memory... sort of.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-09 18:12:54 +02:00
Heiko Carstens c2f0e8c803 [S390] appldata/extmem/kvm: add missing GFP_KERNEL flag
Add missing GFP flag to memory allocations. The part in cio only
changes a comment.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-06-08 18:58:23 +02:00
Heiko Carstens cf9daf4a73 [S390] cmm: get rid of CMM_PROC config option
All distros have this option switched on, so lets get rid of at least
one of the tons of config options that are available.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-26 23:27:10 +02:00
Heiko Carstens db705e831a [S390] cmm: remove superfluous EXPORT_SYMBOLs plus cleanups
Remove superfluous EXPORT_SYMBOLS and do coding style cleanup while
being at it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-26 23:27:10 +02:00
Heiko Carstens 1ef6acf597 [S390] cmm: fix crash on module unload
There might be a scheduled cmm_timer if the cmm module gets unloaded.
That timer was not deleted during module unload and thus could lead
to system crash later on.
Besides that reorder function calls in module init and exit code to
avoid a couple of other races which could lead to accesses to
uninitialized data.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-26 23:26:29 +02:00
Heiko Carstens ab3c68ee5f [S390] debug: enable exception-trace debug facility
The exception-trace facility on x86 and other architectures prints
traces to dmesg whenever a user space application crashes.
s390 has such a feature since ages however it is called
userprocess_debug and is enabled differently.
This patch makes sure that whenever one of the two procfs files

/proc/sys/kernel/userprocess_debug
/proc/sys/debug/exception-trace

is modified the contents of the second one changes as well.
That way we keep backwards compatibilty but also support the same
interface like other architectures do.
Besides that the output of the traces is improved since it will now
also contain the corresponding filename of the vma (when available)
where the process caused a fault or trap.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-17 10:00:17 +02:00
Christian Borntraeger 6af7eea2ae [S390] s390: disable change bit override
commit 6a985c6194
([S390] s390: use change recording override for kernel mapping)
deactivated the change bit recording for the kernel mapping to
improve the performance. This works most of the time, but there
are cases (e.g. kernel runs in home space, futex atomic compare xcmg)
where we modify user memory with the kernel mapping instead of the
user mapping.
Instead of fixing these cases, this patch just deactivates change bit
override to avoid future problems with other kernel code that might
use the kernel mapping for user memory.

CC: stable@kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-04-09 13:43:02 +02:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Michael Holzheu 92fe31329c [S390] zcore: CPU registers are not saved under LPAR
To save the registers for all CPUs a sigp "store status" is done that
stores the registers to address absolute zero. To access storage at
absolute zero, normally the address of the prefix register of the
accessing CPU has to be used. This does not work when large pages are
active (currently only under LPAR). In order to fix that problem,
instead of memcpy memcpy_real is used, which switches to real mode
where prefixing works.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-03-24 11:49:53 +01:00
Hendrik Brueckner 09003ed90a [S390] smsgiucv: declare char pointers as "const"
Declare the smsgiucv prefix char pointer as "const" and use
use const char pointers in callback functions.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-03-08 12:26:28 +01:00
Heiko Carstens 22e0a04672 [S390] use kprobes_built_in() in mm/fault code
Use kprobes_built_in() to avoid ifdefs like most other architectures do.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:32 +01:00
Heiko Carstens cbb870c822 [S390] Cleanup struct _lowcore usage and defines.
Use asm offsets to make sure the offset defines to struct _lowcore and
its layout don't get out of sync.
Also add a BUILD_BUG_ON() which checks that the size of the structure
is sane.
And while being at it change those sites which use odd casts to access
the current lowcore. These should use S390_lowcore instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:31 +01:00
Heiko Carstens d96221ab1e [S390] free_initmem: reduce code duplication
free_initmem() and free_initrd_mem() are nearly identical. So make them
call a common function.
Also fixes a bug: if the initrd wouldn't start on a page boundary also
memory after the initrd would be initialized with the poison value.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:31 +01:00
Heiko Carstens b8e660b83d [S390] Replace ENOTSUPP usage with EOPNOTSUPP
ENOTSUPP is not supposed to leak to userspace so lets just use
EOPNOTSUPP everywhere.
Doesn't fix a bug, but makes future reviews easier.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:31 +01:00
Jiri Slaby a58c26bba9 [S390] use helpers for rlimits
Make sure compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.

I.e. either use rlimit helpers added in
3e10e716ab
or ACCESS_ONCE if not applicable.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-01-13 20:44:45 +01:00
Linus Torvalds 67dd2f5a66 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (72 commits)
  [S390] 3215/3270 console: remove wrong comment
  [S390] dasd: remove BKL from extended error reporting code
  [S390] vmlogrdr: remove BKL
  [S390] vmur: remove BKL
  [S390] zcrypt: remove BKL
  [S390] 3270: remove BKL
  [S390] vmwatchdog: remove lock_kernel() from open() function
  [S390] monwriter: remove lock_kernel() from open() function
  [S390] monreader: remove lock_kernel() from open() function
  [S390] s390: remove unused nfsd #includes
  [S390] ftrace: build ftrace.o when CONFIG_FTRACE_SYSCALLS is set for s390
  [S390] etr/stp: put correct per cpu variable
  [S390] tty3270: move keyboard compat ioctls
  [S390] sclp: improve servicability setting
  [S390] s390: use change recording override for kernel mapping
  [S390] MAINTAINERS: Add s390 drivers block
  [S390] use generic sockios.h header file
  [S390] use generic termbits.h header file
  [S390] smp: remove unused typedef and defines
  [S390] cmm: free pages on hibernate.
  ...
2009-12-09 19:01:47 -08:00
Christian Borntraeger 6a985c6194 [S390] s390: use change recording override for kernel mapping
We dont need the dirty bit if a write access is done via the kernel
mapping. In that case SetPageDirty and friends are used anyway, no
need to do that a second time. We can use the change-recording
overide function for the kernel mapping, if available.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:37 +01:00
Martin Schwidefsky 52b169c864 [S390] cmm: free pages on hibernate.
The pages allocated by the cmm memory balloon should be freed before
the hibernation image is created. Otherwise the memory reserved by the
balloon gets written to the swap device but there is no content in
these pages that need to be preserved.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:37 +01:00
Gerald Schaefer 6c1e3e7943 [S390] Use do_exception() in pagetable walk usercopy functions.
The pagetable walk usercopy functions have used a modified copy of the
do_exception() function for fault handling. This lead to inconsistencies
with recent changes to do_exception(), e.g. performance counters. This
patch changes the pagetable walk usercopy code to call do_exception()
directly, eliminating the redundancy. A new parameter is added to
do_exception() to specify the fault address.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:34 +01:00
Martin Schwidefsky 1ab947de29 [S390] fault handler access flags check.
Simplify the check of the vma->flags in do_exception for the
different fault types.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:34 +01:00
Martin Schwidefsky 50d7280d43 [S390] fault handler performance optimization.
Slim down the do_exception function to handle only the fast path of a
fault and move the exceptional cases into a new function. That slightly
increases the performance of the fault handling.

Build fix for !CONFIG_COMPAT by
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00
Martin Schwidefsky 7ecb344ae8 [S390] Improve notify_page_fault implementation.
notify_page_fault does a preempt_disable/preempt_enable for each
fault generated by a kernel access to user space. If kprobes
is not active that is unnecessary since the interrupts are not
reenabled yet. To play safe repeat the kprobe_running check after
preempt_disable().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00
Martin Schwidefsky b11b533427 [S390] Improve address space mode selection.
Introduce user_mode to replace the two variables switch_amode and
s390_noexec. There are three valid combinations of the old values:
  1) switch_amode == 0 && s390_noexec == 0
  2) switch_amode == 1 && s390_noexec == 0
  3) switch_amode == 1 && s390_noexec == 1
They get replaced by
  1) user_mode == HOME_SPACE_MODE
  2) user_mode == PRIMARY_SPACE_MODE
  3) user_mode == SECONDARY_SPACE_MODE
The new kernel parameter user_mode=[primary,secondary,home] lets
you choose the address space mode the user space processes should
use. In addition the CONFIG_S390_SWITCH_AMODE config option
is removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00
Martin Schwidefsky 61365e132e [S390] Improve address space check.
A data access in access-register mode always is a user mode access,
the code to inspect the access-registers can be removed. The second
change is to use a different test to check for no-execute fault.
The third change is to pass the translation exception identification
as parameter, in theory the trans_exc_code in the lowcore could have
been overwritten by the time the call to check_space from do_no_context
is done.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00
Eric W. Biederman 6d4561110a sysctl: Drop & in front of every proc_handler.
For consistency drop & in front of every proc_handler.  Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-18 08:37:40 -08:00
Eric W. Biederman b05fd35d91 sysctl s390: Remove dead sysctl binary support
Now that sys_sysctl is a generic wrapper around /proc/sys  .ctl_name
and .strategy members of sysctl tables are dead code.  Remove them.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12 02:05:01 -08:00
Martin Schwidefsky 52a21f2cee [S390] fix build breakage with CONFIG_AIO=n
next-20090925 randconfig build breaks on s390x, with CONFIG_AIO=n.

arch/s390/mm/pgtable.c: In function 's390_enable_sie':
arch/s390/mm/pgtable.c:282: error: 'struct mm_struct' has no member named 'ioctx_list'
arch/s390/mm/pgtable.c:298: error: 'struct mm_struct' has no member named 'ioctx_list'
make[1]: *** [arch/s390/mm/pgtable.o] Error 1

Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-10-06 10:35:05 +02:00
Alexey Dobriyan 8d65af789f sysctl: remove "struct file *" argument of ->proc_handler
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Linus Torvalds 9fd815b55f Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (22 commits)
  [S390] Update default configuration.
  [S390] hibernate: Do real CPU swap at resume time
  [S390] dasd: tolerate devices that have no feature codes
  [S390] zcrypt: Do not add/remove devices in s/r callbacks
  [S390] hibernate: make sure pfn_is_nosave handles lowcore pages
  [S390] smp: introduce LC_ORDER and simplify lowcore handling
  [S390] ptrace: use common code for simple peek/poke operations
  [S390] fix disabled_wait inline assembly clobber list
  [S390] Change kernel_page_present coding style.
  [S390] hibernation: reset system after resume
  [S390] hibernation: fix guest page hinting related crash
  [S390] Get rid of init_module/delete_module compat functions.
  [S390] Convert sys_execve to function with parameters.
  [S390] Convert sys_clone to function with parameters.
  [S390] qdio: change state of all primed input buffers
  [S390] qdio: reduce per device debug messages
  [S390] cio: introduce consistent subchannel scanning
  [S390] cio: idset use actual number of ssids
  [S390] cio: dont kfree vmalloced memory
  [S390] cio: introduce css_settle
  ...
2009-09-23 10:02:14 -07:00
Heiko Carstens 87458ff458 [S390] Change kernel_page_present coding style.
Make the inline assembly look like all others.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-22 22:58:44 +02:00
Heiko Carstens 846955c8af [S390] hibernation: fix guest page hinting related crash
On resume the system that loads the to be resumed image might have
unstable pages.
When the resume image is copied back and a write access happen to an
unstable page this causes an exception and the system crashes.

To fix this set all free pages to stable before copying the resumed
image data. Also after everything has been restored set all free
pages of the resumed system to unstable again.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-22 22:58:44 +02:00
Geert Uytterhoeven cc013a8890 arches: drop superfluous casts in nr_free_pages() callers
Commit 9617729941 ("Drop free_pages()")
modified nr_free_pages() to return 'unsigned long' instead of 'unsigned
int'.  This made the casts to 'unsigned long' in most callers superfluous,
so remove them.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <zankel@tensilica.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:34 -07:00
Ingo Molnar cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Heiko Carstens bde69af2ab [S390] Wire up page fault events for software perf counters.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11 10:29:56 +02:00
Heiko Carstens 2ddddf3e0a [S390] Enable guest page hinting by default.
Get rid of the PAGE_STATES config option and enable guest page hinting
by default.
It can be disabled by specifying "cmma=off" at the command line.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11 10:29:54 +02:00
Martin Schwidefsky 50aa98bad0 [S390] fix recursive locking on page_table_lock
Suzuki Poulose reported the following recursive locking bug on s390:

Here is the stack trace : (see Appendix I for more info)

  [<0000000000406ed6>] _spin_lock+0x52/0x94
  [<0000000000103bde>] crst_table_free+0x14e/0x1a4
  [<00000000001ba684>] __pmd_alloc+0x114/0x1ec
  [<00000000001be8d0>] handle_mm_fault+0x2cc/0xb80
  [<0000000000407d62>] do_dat_exception+0x2b6/0x3a0
  [<0000000000114f8c>] sysc_return+0x0/0x8
  [<00000200001642b2>] 0x200001642b2

The page_table_lock is already acquired in __pmd_alloc (mm/memory.c) and
it tries to populate the pud/pgd with a new pmd allocated. If another
thread populates it before we get a chance, we free the pmd using
pmd_free().

On s390x, pmd_free(even pud_free ) is #defined to crst_table_free(),
which acquires the page_table_lock to protect the crst_table index updates.

Hence this ends up in a recursive locking of the page_table_lock.

The solution suggested by Dave Hansen is to use a new spin lock in the mmu
context to protect the access to the crst_list and the pgtable_list.

Reported-by: Suzuki Poulose <suzuki@in.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11 10:29:53 +02:00
Alexey Dobriyan 405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
Linus Torvalds d06063cc22 Move FAULT_FLAG_xyz into handle_mm_fault() callers
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault().  All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 13:08:22 -07:00
Hans-Joachim Picht 7db11a363f [S390] pm: add kernel_page_present
Fix the following build failure caused by make allyesconfig using
CONFIG_HIBERNATION and CONFIG_DEBUG_PAGEALLOC

kernel/built-in.o: In function `saveable_page':
kernel/power/snapshot.c:897: undefined reference to `kernel_page_present'
kernel/built-in.o: In function `safe_copy_page':
kernel/power/snapshot.c:948: undefined reference to `kernel_page_present'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Hans-Joachim Picht <hans@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-16 10:31:11 +02:00
Heiko Carstens 88df125fd6 [S390] maccess: arch specific probe_kernel_write() implementation
Add an s390 specific probe_kernel_write() function which allows to
write to the kernel text segment even if write protection is enabled.
This is implemented using the lra (load real address) and stura (store
using real address) instructions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-12 10:27:37 +02:00
Heiko Carstens 239a64255f [S390] vmalloc: add vmalloc kernel parameter support
With the kernel parameter 'vmalloc=<size>' the size of the vmalloc area
can be specified. This can be used to increase or decrease the size of
the area. Works in the same way as on some other architectures.
This can be useful for features which make excessive use of vmalloc and
wouldn't work otherwise.
The default sizes remain unchanged: 96MB for 31 bit kernels and 1GB for
64 bit kernels.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-12 10:27:33 +02:00
Heiko Carstens 7757591ab4 [S390] implement is_compat_task
Implement is_compat_task and use it all over the place.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-12 10:27:30 +02:00
Rusty Russell 005f8eee6f [S390] cpumask: use mm_cpumask() wrapper
Makes code futureproof against the impending change to mm->cpu_vm_mask.

It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-03-26 15:24:34 +01:00
Heiko Carstens 1485c5c884 [S390] move EXPORT_SYMBOLs to definitions
Move all EXPORT_SYMBOLs to their corresponding definitions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-03-26 15:24:11 +01:00
Carsten Otte 702d9e584f [S390] check addressing mode in s390_enable_sie
The sie instruction requires address spaces to be switched
to run proper. This patch verifies that this is the case
in s390_enable_sie, otherwise the kernel would crash badly
as soon as the process runs into sie.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-03-26 15:24:09 +01:00
Heiko Carstens 59fa4392dd [S390] page fault: invoke oom-killer
s390 arch backend for 1c0fe6e3bd
"mm: invoke oom-killer from page fault".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-03-26 15:24:03 +01:00
Martin Schwidefsky 0fb1d9bcbc [S390] make page table upgrade work again
After TASK_SIZE now gives the current size of the address space the
upgrade of a 64 bit process from 3 to 4 levels of page table  needs
to use the arch_mmap_check hook to catch large mmap lengths. The
get_unmapped_area* functions need to check for -ENOMEM from the
arch_get_unmapped_area*, upgrade the page table and retry.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-03-18 13:28:13 +01:00
Martin Schwidefsky f481bfafd3 [S390] make page table walking more robust
Make page table walking on s390 more robust. The current code requires
that the pgd/pud/pmd/pte loop is only done for address ranges that are
below the end address of the last vma of the address space. But this
is not always true, e.g. the generic page table walker does not guarantee
this. Change TASK_SIZE/TASK_SIZE_OF to reflect the current size of the
address space. This makes the generic page table walker happy but it
breaks the upgrade of a 3 level page table to a 4 level page table.
To make the upgrade work again another fix is required.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-03-18 13:28:13 +01:00
Gary Hade c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Jens Axboe abf137dd77 aio: make the lookup_ioctx() lockless
The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader biased, turn this into
an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update side exclusion.

There's usually only 1 entry on this list, so it doesn't make sense
to look into fancier data structures.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:50 +01:00
Hongjie Yang 93098bf015 [S390] convert dcssblk and extmem printks messages to pr_xxx macros.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-25 13:39:23 +01:00
Christian Borntraeger 250cf776f7 [S390] pgtables: Fix race in enable_sie vs. page table ops
The current enable_sie code sets the mm->context.pgstes bit to tell
dup_mm that the new mm should have extended page tables. This bit is also
used by the s390 specific page table primitives to decide about the page
table layout - which means context.pgstes has two meanings. This can cause
any kind of bugs. For example  - e.g. shrink_zone can call
ptep_clear_flush_young while enable_sie is running. ptep_clear_flush_young
will test for context.pgstes. Since enable_sie changed that value of the old
struct mm without changing the page table layout ptep_clear_flush_young will
do the wrong thing.
The solution is to split pgstes into two bits
- one for the allocation
- one for the current state

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-28 11:12:03 +01:00
Badari Pulavarty 71088785c6 mm: cleanup to make remove_memory() arch-neutral
There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which support
hotplug memory remove.  Instead of duplicating it in every architecture,
collapse them into arch neutral function.

[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:50:25 -07:00
Hongjie Yang b2300b9efe [S390] dcssblk: add >2G DCSSs support and stacked contiguous DCSSs support.
The DCSS block device driver is modified to add >2G DCSSs support and
allow a DCSS block device to map to a set of contiguous DCSSs.  The
extmem code is also modified to use new Diagnose x'64' subcodes for
>2G DCSSs.

Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-10 21:33:57 +02:00
Gerald Schaefer 7e9238fbc1 [S390] Add support for memory hot-remove.
This patch enables memory hot-remove on s390.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-08-01 16:39:33 +02:00
Johannes Weiner c55281dee0 s390: use generic show_mem()
Remove arch-specific show_mem() in favor of the generic version.

This also removes the following redundant information display:

	- pages in swapcache, printed by show_swap_cache_info()

where show_mem() calls show_free_areas(), which calls
show_swap_cache_info().

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:11 -07:00
Andi Kleen ceb8687961 hugetlb: introduce pud_huge
Straight forward extensions for huge pages located in the PUD instead of
PMDs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen a551643895 hugetlb: modular state for hugetlb page size
The goal of this patchset is to support multiple hugetlb page sizes.  This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg.  huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Heiko Carstens 421c175c4d [S390] Add support for memory hot-add.
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-07-14 10:02:16 +02:00
Linus Torvalds a4df1ac12d Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: MMU: Fix is_empty_shadow_page() check
  KVM: MMU: Fix printk() format string
  KVM: IOAPIC: only set remote_irr if interrupt was injected
  KVM: MMU: reschedule during shadow teardown
  KVM: VMX: Clear CR4.VMXE in hardware_disable
  KVM: migrate PIT timer
  KVM: ppc: Report bad GFNs
  KVM: ppc: Use a read lock around MMU operations, and release it on error
  KVM: ppc: Remove unmatched kunmap() call
  KVM: ppc: add lwzx/stwz emulation
  KVM: ppc: Remove duplicate function
  KVM: s390: Fix race condition in kvm_s390_handle_wait
  KVM: s390: Send program check on access error
  KVM: s390: fix interrupt delivery
  KVM: s390: handle machine checks when guest is running
  KVM: s390: fix locking order problem in enable_sie
  KVM: s390: use yield instead of schedule to implement diag 0x44
  KVM: x86 emulator: fix hypercall return value on AMD
  KVM: ia64: fix zero extending for mmio ld1/2/4 emulation in KVM
2008-06-11 10:35:44 -07:00
Heiko Carstens ee0ddadd08 [S390] vmemmap: fix off-by-one bug.
If a memory range is supposed to be added to the 1:1 mapping and it
ends just below the maximum supported physical address it won't
succeed. This is because a test doesn't consider that the end address
is 1 smaller than start + size.
Fix the comparison.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-06-10 10:03:27 +02:00
Christian Borntraeger 74b6b522ec KVM: s390: fix locking order problem in enable_sie
There are potential locking problem in enable_sie. We take the task_lock
and the mmap_sem. As exit_mm uses the same locks vice versa, this triggers
a lockdep warning.
The second problem is that dup_mm and mmput might sleep, so we must not
hold the task_lock at that moment.

The solution is to dup the mm unconditional and use the task_lock before and
afterwards to check  if we can use the new mm. dup_mm and mmput are called
outside the task_lock, but we run update_mm while holding the task_lock,
protection us against ptrace.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:08:26 +03:00
Heiko Carstens c1bb7f31ea [S390] showmem: Only walk spanned pages.
Convert show_mem() so its nearly the same as on x86/powerpc.
Gives us proper locking and we get also rid of the only use of max_mapnr.
Also the number of pages was contained in an int which might not be
sufficient not too far in the future.

Cc: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-05-30 10:03:34 +02:00
Heiko Carstens 67060d9c1f [S390] Fix section mismatch warnings.
This fixes the last remaining section mismatch warnings in s390
architecture code. It reveals also a real bug introduced by... me
with git commit 2069e978d5
("[S390] sparsemem vmemmap: initialize memmap.")

Calling the generic vmemmap_alloc_block() function to get initialized
memory is a nice idea, however that function is __meminit annotated
and therefore the function might be gone if we try to call it later.
This can happen if a DCSS segment gets added.

So basically revert the patch and clear the memmap explicitly to fix
the original bug.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-05-30 10:03:34 +02:00
Heiko Carstens 2069e978d5 [S390] sparsemem vmemmap: initialize memmap.
Let's just use the generic vmmemmap_alloc_block() function which
always returns initialized memory.

Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-05-15 16:52:38 +02:00
Martin Schwidefsky 45e576b1c3 [S390] guest page hinting light
Use the existing arch_alloc_page/arch_free_page callbacks to do
the guest page state transitions between stable and unused.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-05-07 09:23:02 +02:00
Heiko Carstens 17f3458085 [S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP
Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select
of SPARSEMEM_VMEMMAP since it is configurable. This is because
SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken
include dependencies that I don't want to fix.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:48 +02:00
Gerald Schaefer 53492b1de4 [S390] System z large page support.
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:47 +02:00
Heiko Carstens 8fc6365868 [S390] vmemmap: use clear_table to initialise page tables.
Always use clear_table to initialise page tables. The overlapping
memcpy is just a leftover of a previous version that wasn't fully
converted to clear_table.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:46 +02:00
Carsten Otte 402b08622d s390: KVM preparation: provide hook to enable pgstes in user pagetable
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.

Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:40 +03:00
Martin Schwidefsky ca68305bf3 [S390] Remove code duplication from monreader / dcssblk.
Move the function that prints the segment warning messages found in the
monreader driver and the dcssblk driver to the extmem base code.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-04-17 07:47:07 +02:00
Heiko Carstens a806170e29 [S390] Fix a lot of sparse warnings.
Most noteable part of this commit is the new local header file entry.h
which contains all the function declarations of functions that get only
called from asm code or are arch internal. That way we can avoid extern
declarations in C files.
This is more or less the same that was done for sparc64.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-04-17 07:47:06 +02:00
Johannes Weiner 1e42f32785 [S390] remove redundant display of free swap space in show_mem()
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-04-17 07:47:04 +02:00
Heiko Carstens c2e8b8531b [S390] exec_protect: Fix incorrect extern declarations.
sys_sigreturn and sys_rt_sigreturn don't take any arguments. So luckily
this resulted only in unneeded instead of incorrect code.
But still this clearly shows why one should not put extern declarations
in C files (will be fixed with a larger sparse patch).

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-04-17 07:46:59 +02:00
Martin Schwidefsky 6252d702c5 [S390] dynamic page tables.
Add support for different number of page table levels dependent
on the highest address used for a process. This will cause a 31 bit
process to use a two level page table instead of the four level page
table that is the default after the pud has been introduced. Likewise
a normal 64 bit process will use three levels instead of four. Only
if a process runs out of the 4 tera bytes which can be addressed with
a three level page table the fourth level is dynamically added. Then
the process can use up to 8 peta byte.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-09 18:24:41 +01:00
Martin Schwidefsky 5a216a2083 [S390] Add four level page tables for CONFIG_64BIT=y.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-09 18:24:40 +01:00
Martin Schwidefsky 146e4b3c8b [S390] 1K/2K page table pages.
This patch implements 1K/2K page table pages for s390.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-09 18:24:40 +01:00
Martin Schwidefsky 2f569afd9c CONFIG_HIGHPTE vs. sub-page page tables.
Background: I've implemented 1K/2K page tables for s390.  These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM.  The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste).  The pgstes are only required if the process is using the SIE
instruction.  The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.

Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).

Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch.  For everybody else it will be a (struct page *).  The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor.  The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
 To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:42 -08:00
Heiko Carstens 0189103c69 [S390] Remove BUILD_BUG_ON() in vmem code.
Remove BUILD_BUG_ON() in vmem code since it causes build failures if
the size of struct page increases. Instead calculate at compile time
the address of the highest physical address that can be added to the
1:1 mapping.
This supposed to fix a build failure with the page owner tracking leak
detector patches as reported by akpm.

page-owner-tracking-leak-detector-broken-on-s390.patch can be removed
from -mm again when this is merged.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-05 16:51:01 +01:00