Commit Graph

853 Commits

Author SHA1 Message Date
Paul E. McKenney 9b1d82fa16 rcu: "Tiny RCU", The Bloatwatch Edition
This patch is a version of RCU designed for !SMP provided for a
small-footprint RCU implementation.  In particular, the
implementation of synchronize_rcu() is extremely lightweight and
high performance. It passes rcutorture testing in each of the
four relevant configurations (combinations of NO_HZ and PREEMPT)
on x86.  This saves about 1K bytes compared to old Classic RCU
(which is no longer in mainline), and more than three kilobytes
compared to Hierarchical RCU (updated to 2.6.30):

	CONFIG_TREE_RCU:

	   text	   data	    bss	    dec	    filename
	    183       4       0     187     kernel/rcupdate.o
	   2783     520      36    3339     kernel/rcutree.o
				   3526 Total (vs 4565 for v7)

	CONFIG_TREE_PREEMPT_RCU:

	   text	   data	    bss	    dec	    filename
	    263       4       0     267     kernel/rcupdate.o
	   4594     776      52    5422     kernel/rcutree.o
	   			   5689 Total (6155 for v7)

	CONFIG_TINY_RCU:

	   text	   data	    bss	    dec	    filename
	     96       4       0     100     kernel/rcupdate.o
	    734      24       0     758     kernel/rcutiny.o
	    			    858 Total (vs 848 for v7)

The above is for x86.  Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.

Changes from v7 (http://lkml.org/lkml/2009/10/9/388)

o	Apply Lai Jiangshan's review comments (aside from
might_sleep() 	in synchronize_sched(), which is covered by SMP builds).

o	Fix up expedited primitives.

Changes from v6 (http://lkml.org/lkml/2009/9/23/293).

o	Forward ported to put it into the 2.6.33 stream.

o	Added lockdep support.

o	Make lightweight rcu_barrier.

Changes from v5 (http://lkml.org/lkml/2009/6/23/12).

o	Ported to latest pre-2.6.32 merge window kernel.

	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
	- Provided trivial rcu_cpu_notify().
	- Provided trivial exit_rcu().
	- Provided trivial rcu_needs_cpu().
	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.

o	Removed the dependence on EMBEDDED, with a view to making
	TINY_RCU default for !SMP at some time in the future.

o	Added (trivial) support for expedited grace periods.

Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:

o	Squeeze the size down a bit further by removing the
	->completed field from struct rcu_ctrlblk.

o	This permits synchronize_rcu() to become the empty function.
	Previous concerns about rcutorture were unfounded, as
	rcutorture correctly handles a constant value from
	rcu_batches_completed() and rcu_batches_completed_bh().

Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:

o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
	rcu_nmi_exit(), to be static inlines, as suggested by David
	Howells.  Doing this saves about 100 bytes from rcutiny.o.
	(The numbers between v3 and this v4 of the patch are not directly
	comparable, since they are against different versions of Linux.)

Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:

o	Fix whitespace issues.

o	Change short-circuit "||" operator to instead be "+" in order
to 	fix performance bug noted by "kraai" on LWN.

		(http://lwn.net/Articles/324348/)

Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:

o	This version depends on EMBEDDED as well as !SMP, as suggested
	by Ingo.

o	Updated rcu_needs_cpu() to unconditionally return zero,
	permitting the CPU to enter dynticks-idle mode at any time.
	This works because callbacks can be invoked upon entry to
	dynticks-idle mode.

o	Paul is now OK with this being included, based on a poll at
the 	Kernel Miniconf at linux.conf.au, where about ten people said
	that they cared about saving 900 bytes on single-CPU systems.

o	Applies to both mainline and tip/core/rcu.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-26 09:40:29 +01:00
Linus Torvalds f579bbcd9b Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: fix requeue_pi key imbalance
  futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
  rcu: Place root rcu_node structure in separate lockdep class
  rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
  rcu: Move rcu_barrier() to rcutree
  futex: Move exit_pi_state() call to release_mm()
  futex: Nullify robust lists after cleanup
  futex: Fix locking imbalance
  panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
  rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
  rcu: Clean up code based on review feedback from Josh Triplett, part 4
  rcu: Clean up code based on review feedback from Josh Triplett, part 3
  rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
  rcu: Clean up code to address Ingo's checkpatch feedback
  rcu: Clean up code based on review feedback from Josh Triplett, part 2
  rcu: Clean up code based on review feedback from Josh Triplett
2009-10-08 12:16:35 -07:00
Peter Zijlstra 906010b213 perf_event: Provide vmalloc() based mmap() backing
Some architectures such as Sparc, ARM and MIPS (basically
everything with flush_dcache_page()) need to deal with dcache
aliases by carefully placing pages in both kernel and user maps.

These architectures typically have to use vmalloc_user() for this.

However, on other architectures, vmalloc() is not needed and has
the downsides of being more restricted and slower than regular
allocations.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1254830228.21044.272.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 14:21:50 +02:00
Linus Torvalds 94a8d5caba Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
  cpumask: Move deprecated functions to end of header.
  cpumask: remove unused deprecated functions, avoid accusations of insanity
  cpumask: use new-style cpumask ops in mm/quicklist.
  cpumask: use mm_cpumask() wrapper: x86
  cpumask: use mm_cpumask() wrapper: um
  cpumask: use mm_cpumask() wrapper: mips
  cpumask: use mm_cpumask() wrapper: mn10300
  cpumask: use mm_cpumask() wrapper: m32r
  cpumask: use mm_cpumask() wrapper: arm
  cpumask: Use accessors for cpu_*_mask: um
  cpumask: Use accessors for cpu_*_mask: powerpc
  cpumask: Use accessors for cpu_*_mask: mips
  cpumask: Use accessors for cpu_*_mask: m32r
  cpumask: remove arch_send_call_function_ipi
  cpumask: arch_send_call_function_ipi_mask: s390
  cpumask: arch_send_call_function_ipi_mask: powerpc
  cpumask: arch_send_call_function_ipi_mask: mips
  cpumask: arch_send_call_function_ipi_mask: m32r
  cpumask: arch_send_call_function_ipi_mask: alpha
  cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
  ...
2009-09-23 18:14:11 -07:00
Alexey Dobriyan 2bcd57ab61 headers: utsname.h redux
* remove asm/atomic.h inclusion from linux/utsname.h --
   not needed after kref conversion
 * remove linux/utsname.h inclusion from files which do not need it

NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 18:13:10 -07:00
Rusty Russell 72d78d05cb cpumask: remove unused cpu_mask_all
It's only defined for NR_CPUS > BITS_PER_LONG; cpu_all_mask is always
defined (and const).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:34:36 +09:30
Linus Torvalds c37efa9325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
  Use macros for .data.page_aligned section.
  Use macros for .bss.page_aligned section.
  Use new __init_task_data macro in arch init_task.c files.
  kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
  arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
  kbuild: add static to prototypes
  kbuild: fail build if recordmcount.pl fails
  kbuild: set -fconserve-stack option for gcc 4.5
  kbuild: echo the record_mcount command
  gconfig: disable "typeahead find" search in treeviews
  kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
  checkincludes.pl: add option to remove duplicates in place
  markup_oops: use modinfo to avoid confusion with underscored module names
  checkincludes.pl: provide usage helper
  checkincludes.pl: close file as soon as we're done with it
  ctags: usability fix
  kernel hacking: move STRIP_ASM_SYMS from General
  gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
  kbuild: Check if linker supports the -X option
  kbuild: introduce ld-option
  ...

Fix trivial conflict in scripts/basic/fixdep.c
2009-09-23 15:37:02 -07:00
Paul E. McKenney 1eba8f8438 rcu: Clean up code based on review feedback from Josh Triplett, part 2
These issues identified during an old-fashioned face-to-face code
review extending over many hours.

o	Add comments for tricky parts of code, and correct comments
	that have passed their sell-by date.

o	Get rid of the vestiges of rcu_init_sched(), which is no
	longer needed now that PREEMPT_RCU is gone.

o	Move the #include of rcutree_plugin.h to the end of
	rcutree.c, which means that, rather than having a random
	collection of forward declarations, the new set of forward
	declarations document the set of plugins.  The new home for
	this #include also allows __rcu_init_preempt() to move into
	rcutree_plugin.h.

o	Fix rcu_preempt_check_callbacks() to be static.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12537246443924-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra <peterz@infradead.org>
2009-09-23 19:46:29 +02:00
Linus Torvalds 746942d06a Merge branch 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6
* 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6:
  SFI: remove unneeded includes
  sfi: Remove unused code
  SFI: Hook PCI MMCONFIG
  x86: add arch-specific SFI support
  SFI: add capability to parse ACPI tables
  SFI: add platform-independent core support
  SFI: create linux/sfi.h
  SFI: Simple Firmware Interface - MAINTAINERS, Kconfig
2009-09-23 09:34:07 -07:00
Jan Beulich 4481374ce8 mm: replace various uses of num_physpages by totalram_pages
Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages.  The amount of what actually is usable as storage
should instead be used as a basis here.

Some of the calculations (i.e.  those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:38 -07:00
Linus Torvalds 43c1266ce4 Merge branch 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Tidy up after the big rename
  perf: Do the big rename: Performance Counters -> Performance Events
  perf_counter: Rename 'event' to event_id/hw_event
  perf_counter: Rename list_entry -> group_entry, counter_list -> group_list

Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
2009-09-21 09:15:07 -07:00
Linus Torvalds b8c7f1dc5c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: Fix whitespace inconsistencies
  rcu: Fix thinko, actually initialize full tree
  rcu: Apply results of code inspection of kernel/rcutree_plugin.h
  rcu: Add WARN_ON_ONCE() consistency checks covering state transitions
  rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU
  rcu: Simplify rcu_read_unlock_special() quiescent-state accounting
  rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods
  rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
  rcutorture: Occasionally delay readers enough to make RCU force_quiescent_state
  rcu: Initialize multi-level RCU grace periods holding locks
  rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable
2009-09-21 09:06:52 -07:00
Ingo Molnar 57c0c15b52 perf: Tidy up after the big rename
- provide compatibility Kconfig entry for existing PERF_COUNTERS .config's

 - provide courtesy copy of old perf_counter.h, for user-space projects

 - small indentation fixups

 - fix up MAINTAINERS

 - fix small x86 printout fallout

 - fix up small PowerPC comment fallout (use 'counter' as in register)

Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:34:11 +02:00
Ingo Molnar cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Randy Dunlap 99657c7857 kernel hacking: move STRIP_ASM_SYMS from General
Sam suggested moving STRIP_ASM_SYMS into the Kernel hacking menu
from the General Setup menu.  It makes more sense there.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-09-20 12:27:42 +02:00
Len Brown c602c65b2f Merge branch 'linus' into sfi-release
Conflicts:
	arch/x86/kernel/setup.c
	drivers/acpi/power.c
	init/main.c

Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-19 00:11:26 -04:00
Christoph Hellwig fc5377668c tracing: Remove markers
Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18 21:22:08 +02:00
Paul E. McKenney bbe3eae8bb rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
To quote Valdis:

    This leaves somebody who has a laptop wondering which
    choice is best for a system with only one or two cores that
    has CONFIG_PREEMPT defined. One choice says it scales down
    nicely, the other explicitly has a 'depends on PREEMPT'
    attached to it...

So add "scales down nicely" to TREE_PREEMPT_RCU to match that of
TREE_RCU.

Suggested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12528585112362-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18 00:05:53 +02:00
Linus Torvalds ab86e5765d Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
  debugfs: Modify default debugfs directory for debugging pktcdvd.
  debugfs: Modified default dir of debugfs for debugging UHCI.
  debugfs: Change debugfs directory of IWMC3200
  debugfs: Change debuhgfs directory of trace-events-sample.h
  debugfs: Fix mount directory of debugfs by default in events.txt
  hpilo: add poll f_op
  hpilo: add interrupt handler
  hpilo: staging for interrupt handling
  driver core: platform_device_add_data(): use kmemdup()
  Driver core: Add support for compatibility classes
  uio: add generic driver for PCI 2.3 devices
  driver-core: move dma-coherent.c from kernel to driver/base
  mem_class: fix bug
  mem_class: use minor as index instead of searching the array
  driver model: constify attribute groups
  UIO: remove 'default n' from Kconfig
  Driver core: Add accessor for device platform data
  Driver core: move dev_get/set_drvdata to drivers/base/dd.c
  Driver core: add new device to bus's list before probing
2009-09-16 08:27:10 -07:00
Kay Sievers 2b2af54a5b Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
very early at kernel initialization, before any driver-core device
is registered. Every device with a major/minor will provide a
device node in devtmpfs.

Devtmpfs can be changed and altered by userspace at any time,
and in any way needed - just like today's udev-mounted tmpfs.
Unmodified udev versions will run just fine on top of it, and will
recognize an already existing kernel-created device node and use it.
The default node permissions are root:root 0600. Proper permissions
and user/group ownership, meaningful symlinks, all other policy still
needs to be applied by userspace.

If a node is created by devtmps, devtmpfs will remove the device node
when the device goes away. If the device node was created by
userspace, or the devtmpfs created node was replaced by userspace, it
will no longer be removed by devtmpfs.

If it is requested to auto-mount it, it makes init=/bin/sh work
without any further userspace support. /dev will be fully populated
and dynamic, and always reflect the current device state of the kernel.
With the commonly used dynamic device numbers, it solves the problem
where static devices nodes may point to the wrong devices.

It is intended to make the initial bootup logic simpler and more robust,
by de-coupling the creation of the inital environment, to reliably run
userspace processes, from a complex userspace bootstrap logic to provide
a working /dev.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Tested-By: Harald Hoyer <harald@redhat.com>
Tested-By: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15 09:50:49 -07:00
Linus Torvalds ada3fa1505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15 09:39:44 -07:00
Linus Torvalds 774a694f8c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
  sched: Fix sched::sched_stat_wait tracepoint field
  sched: Disable NEW_FAIR_SLEEPERS for now
  sched: Keep kthreads at default priority
  sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  sched: Turn off child_runs_first
  sched: Ensure that a child can't gain time over it's parent after fork()
  sched: enable SD_WAKE_IDLE
  sched: Deal with low-load in wake_affine()
  sched: Remove short cut from select_task_rq_fair()
  sched: Turn on SD_BALANCE_NEWIDLE
  sched: Clean up topology.h
  sched: Fix dynamic power-balancing crash
  sched: Remove reciprocal for cpu_power
  sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
  sched: Try to deal with low capacity
  sched: Scale down cpu_power due to RT tasks
  sched: Implement dynamic cpu_power
  sched: Add smt_gain
  sched: Update the cpu_power sum during load-balance
  sched: Add SD_PREFER_SIBLING
  ...
2009-09-11 13:23:18 -07:00
Paul E. McKenney 7db905e636 rcu: Move end of special early-boot RCU operation earlier
Ingo was getting warnings from rcu_scheduler_starting()
indicating that context switches had occurred before RCU ended
its special early-boot handling of grace periods.

This is a dangerous condition, as it indicates that RCU might
have prematurely ended grace periods.  This exploratory fix
moves rcu_scheduler_starting() earlier in boot.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:29:34 +02:00
Ingo Molnar 29e2035bdd Merge branch 'linus' into core/rcu
Merge reason: Avoid fuzz in init/main.c and update from rc6 to rc8.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:29:05 +02:00
Feng Tang 6ae6996a46 SFI: add platform-independent core support
drivers/sfi/sfi_core.c contains the generic SFI implementation.
It has a private header, sfi_core.h, for its own use and the
private use of future files in drivers/sfi/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-28 19:57:33 -04:00
Thomas Gleixner fa84e9eecf init: Move sched_clock_init after late_time_init
Some architectures initialize clocks and timers in late_time_init and
x86 wants to do the same to avoid FIXMAP hackery for calibrating the
TSC. That would result in undefined sched_clock readout and wreckaged
printk timestamps again. We probably have those already on archs which
do all their time/clock setup in late_time_init.

There is no harm to move that after late_time_init except that a few
more boot timestamps are stale. The scheduler is not active at that
point so no real wreckage is expected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: linux-arch@vger.kernel.org
2009-08-27 16:38:18 +02:00
Linus Torvalds 7d63e6359a Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix too large stack usage in do_one_initcall()
  tracing: handle broken names in ftrace filter
  ftrace: Unify effect of writing to trace_options and option/*
2009-08-25 11:23:43 -07:00
Paul E. McKenney 6b3ef48adf rcu: Remove CONFIG_PREEMPT_RCU
Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no
further need for CONFIG_PREEMPT_RCU.  Remove it, along with
whatever subtle bugs it may (or may not) contain.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <125097461396-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 10:32:40 +02:00
Paul E. McKenney f41d911f8c rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.

This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU.  Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.

The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 10:32:40 +02:00
Ingo Molnar 4a683bf94b tracing: Fix too large stack usage in do_one_initcall()
One of my testboxes triggered this nasty stack overflow crash
during SCSI probing:

[    5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.875004] device: 'sda': device_add
[    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
[    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
[    5.878004] *pde = 00000000
[    5.878004] Thread overran stack, or stack corrupted
[    5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[    5.878004] last sysfs file:
[    5.878004]
[    5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
[    5.878004] EIP: 0060:[<b1008321>] EFLAGS: 00010083 CPU: 0
[    5.878004] EIP is at print_context_stack+0x81/0x110
[    5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
[    5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
[    5.878004]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
[    5.878004] Stack:
[    5.878004]  b1004867 fffff000 cf8a3ffc
[    5.878004] Call Trace:
[    5.878004]  [<b1004867>] ? kernel_thread_helper+0x7/0x10
[    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
[    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
[    5.878004] *pde = 00000000
[    5.878004] Thread overran stack, or stack corrupted
[    5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC

The oops did not reveal any more details about the real stack
that we have and the system got into an infinite loop of
recursive pagefaults.

So i booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
parameter. The box did not crash (timings/conditions probably
changed a tiny bit to trigger the catastrophic crash), but the
/debug/tracing/stack_trace file was rather revealing:

        Depth    Size   Location    (72 entries)
        -----    ----   --------
  0)     3704      52   __change_page_attr+0xb8/0x290
  1)     3652      24   __change_page_attr_set_clr+0x43/0x90
  2)     3628      60   kernel_map_pages+0x108/0x120
  3)     3568      40   prep_new_page+0x7d/0x130
  4)     3528      84   get_page_from_freelist+0x106/0x420
  5)     3444     116   __alloc_pages_nodemask+0xd7/0x550
  6)     3328      36   allocate_slab+0xb1/0x100
  7)     3292      36   new_slab+0x1c/0x160
  8)     3256      36   __slab_alloc+0x133/0x2b0
  9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
 10)     3216     108   create_object+0x28/0x250
 11)     3108      40   kmemleak_alloc+0x81/0xc0
 12)     3068      24   kmem_cache_alloc+0x162/0x1d0
 13)     3044      52   scsi_pool_alloc_command+0x29/0x70
 14)     2992      20   scsi_host_alloc_command+0x22/0x70
 15)     2972      24   __scsi_get_command+0x1b/0x90
 16)     2948      28   scsi_get_command+0x35/0x90
 17)     2920      24   scsi_setup_blk_pc_cmnd+0xd4/0x100
 18)     2896     128   sd_prep_fn+0x332/0xa70
 19)     2768      36   blk_peek_request+0xe7/0x1d0
 20)     2732      56   scsi_request_fn+0x54/0x520
 21)     2676      12   __generic_unplug_device+0x2b/0x40
 22)     2664      24   blk_execute_rq_nowait+0x59/0x80
 23)     2640     172   blk_execute_rq+0x6b/0xb0
 24)     2468      32   scsi_execute+0xe0/0x140
 25)     2436      64   scsi_execute_req+0x152/0x160
 26)     2372      60   scsi_vpd_inquiry+0x6c/0x90
 27)     2312      44   scsi_get_vpd_page+0x112/0x160
 28)     2268      52   sd_revalidate_disk+0x1df/0x320
 29)     2216      92   rescan_partitions+0x98/0x330
 30)     2124      52   __blkdev_get+0x309/0x350
 31)     2072       8   blkdev_get+0xf/0x20
 32)     2064      44   register_disk+0xff/0x120
 33)     2020      36   add_disk+0x6e/0xb0
 34)     1984      44   sd_probe_async+0xfb/0x1d0
 35)     1940      44   __async_schedule+0xf4/0x1b0
 36)     1896       8   async_schedule+0x12/0x20
 37)     1888      60   sd_probe+0x305/0x360
 38)     1828      44   really_probe+0x63/0x170
 39)     1784      36   driver_probe_device+0x5d/0x60
 40)     1748      16   __device_attach+0x49/0x50
 41)     1732      32   bus_for_each_drv+0x5b/0x80
 42)     1700      24   device_attach+0x6b/0x70
 43)     1676      16   bus_attach_device+0x47/0x60
 44)     1660      76   device_add+0x33d/0x400
 45)     1584      52   scsi_sysfs_add_sdev+0x6a/0x2c0
 46)     1532     108   scsi_add_lun+0x44b/0x460
 47)     1424     116   scsi_probe_and_add_lun+0x182/0x4e0
 48)     1308      36   __scsi_add_device+0xd9/0xe0
 49)     1272      44   ata_scsi_scan_host+0x10b/0x190
 50)     1228      24   async_port_probe+0x96/0xd0
 51)     1204      44   __async_schedule+0xf4/0x1b0
 52)     1160       8   async_schedule+0x12/0x20
 53)     1152      48   ata_host_register+0x171/0x1d0
 54)     1104      60   ata_pci_sff_activate_host+0xf3/0x230
 55)     1044      44   ata_pci_sff_init_one+0xea/0x100
 56)     1000      48   amd_init_one+0xb2/0x190
 57)      952       8   local_pci_probe+0x13/0x20
 58)      944      32   pci_device_probe+0x68/0x90
 59)      912      44   really_probe+0x63/0x170
 60)      868      36   driver_probe_device+0x5d/0x60
 61)      832      20   __driver_attach+0x89/0xa0
 62)      812      32   bus_for_each_dev+0x5b/0x80
 63)      780      12   driver_attach+0x1e/0x20
 64)      768      72   bus_add_driver+0x14b/0x2d0
 65)      696      36   driver_register+0x6e/0x150
 66)      660      20   __pci_register_driver+0x53/0xc0
 67)      640       8   amd_init+0x14/0x16
 68)      632     572   do_one_initcall+0x2b/0x1d0
 69)       60      12   do_basic_setup+0x56/0x6a
 70)       48      20   kernel_init+0x84/0xce
 71)       28      28   kernel_thread_helper+0x7/0x10

There's a lot of fat functions on that stack trace, but
the largest of all is do_one_initcall(). This is due to
the boot trace entry variables being on the stack.

Fixing this is relatively easy, initcalls are fundamentally
serialized, so we can move the local variables to file scope.

Note that this large stack footprint was present for a
couple of months already - what pushed my system over
the edge was the addition of kmemleak to the call-chain:

  6)     3328      36   allocate_slab+0xb1/0x100
  7)     3292      36   new_slab+0x1c/0x160
  8)     3256      36   __slab_alloc+0x133/0x2b0
  9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
 10)     3216     108   create_object+0x28/0x250
 11)     3108      40   kmemleak_alloc+0x81/0xc0
 12)     3068      24   kmem_cache_alloc+0x162/0x1d0
 13)     3044      52   scsi_pool_alloc_command+0x29/0x70

This pushes the total to ~3800 bytes, only a tiny bit
more was needed to corrupt the on-kernel-stack thread_info.

The fix reduces the stack footprint from 572 bytes
to 28 bytes.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 13:03:22 +02:00
Ingo Molnar fa08661af8 Merge commit 'v2.6.31-rc6' into core/rcu
Merge reason: the branch was on pre-rc1 .30, update to latest.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15 18:56:13 +02:00
Tejun Heo 384be2b18a Merge branch 'percpu-for-linus' into percpu-for-next
Conflicts:
	arch/sparc/kernel/smp_64.c
	arch/x86/kernel/cpu/perf_counter.c
	arch/x86/kernel/setup_percpu.c
	drivers/cpufreq/cpufreq_ondemand.c
	mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids.  As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 14:45:31 +09:00
Tejun Heo d6647bdf98 init: set nr_cpu_ids before setup_per_cpu_areas()
nr_cpu_ids is dependent only on cpu_possible_map and
setup_per_cpu_areas() already depends on cpu_possible_map and will use
nr_cpu_ids.  Initialize nr_cpu_ids before setting up percpu areas.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 12:56:54 +09:00
Ingo Molnar f26542600e perf_counter: Set the CONFIG_PERF_COUNTERS default to y if CONFIG_PROFILING=y
If user has already enabled profiling support in the kernel
(for oprofile, old-style profiling of ftrace) then offer up
perfcounters with a y default in interactive kconfig sessions.

Still keep it off by default otherwise.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-04 18:19:30 +02:00
Peter Zijlstra 470a1396c2 tracing, perf_counter: Add help text to CONFIG_EVENT_PROFILE
Explain what tracepoint profiling sources are about.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jeff Garzik <jeff@garzik.org>
LKML-Reference: <1248856508.6987.3041.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 13:47:53 +02:00
Chris Wilson d4d7d0b954 perf_counter: Fix the tracepoint channel to perfcounters
Fix a missed rename in EVENT_PROFILE support so that it gets
built and allows tracepoint tracing from the 'perf' tool.

Fix a typo in the (never before built & enabled) portion in
perf_counter.c as well, and update that code to the
attr.config changes as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Gamari <bgamari.foss@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-13 09:23:10 +02:00
Tejun Heo c43768cbb7 Merge branch 'master' into for-next
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes.  As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.

Conflicts:
	arch/alpha/include/asm/percpu.h
	arch/mn10300/kernel/vmlinux.lds.S
	include/linux/percpu-defs.h
2009-07-04 07:13:18 +09:00
Paul E. McKenney c17ef45342 rcu: Remove Classic RCU
Remove Classic RCU, given that the combination of Tree RCU and
the proposed Bloatwatch RCU do everything that Classic RCU can
with fewer bugs.

Tree RCU has been default in x86 builds for almost six months,
and seems to be quite reliable, so there does not seem to be
much justification for keeping the Classic RCU code and config
complexity around anymore.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: akpm@linux-foundation.org
Cc: niv@us.ibm.com
Cc: dvhltc@us.ibm.com
Cc: dipankar@in.ibm.com
Cc: dhowells@redhat.com
Cc: lethal@linux-sh.org
Cc: kernel@wantstofly.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 15:05:13 +02:00
Tejun Heo e74e396204 percpu: use dynamic percpu allocator as the default percpu allocator
This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
dynamic percpu allocator.  The first chunk is allocated using
embedding helper and 8k is reserved for modules.  This ensures that
the new allocator behaves almost identically to the original allocator
as long as static percpu variables are concerned, so it shouldn't
introduce much breakage.

s390 and alpha use custom SHIFT_PERCPU_PTR() to work around addressing
range limit the addressing model imposes.  Unfortunately, this breaks
if the address is specified using a variable, so for now, the two
archs aren't converted.

The following architectures are affected by this change.

* sh
* arm
* cris
* mips
* sparc(32)
* blackfin
* avr32
* parisc (broken, under investigation)
* m32r
* powerpc(32)

As this change makes the dynamic allocator the default one,
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its invert -
CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be converted
archs.  These archs implement their own setup_per_cpu_areas() and the
conversion is not trivial.

* powerpc(64)
* sparc(64)
* ia64
* alpha
* s390

Boot and batch alloc/free tests on x86_32 with debug code (x86_32
doesn't use default first chunk initialization).  Compile tested on
sparc(32), powerpc(32), arm and alpha.

Kyle McMartin reported that this change breaks parisc.  The problem is
still under investigation and he is okay with pushing this patch
forward and fixing parisc later.

[ Impact: use dynamic allocator for most archs w/o custom percpu setup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-24 15:13:35 +09:00
Len Brown fbe8cddd2d Merge branches 'acerhdf', 'acpi-pci-bind', 'bjorn-pci-root', 'bugzilla-12904', 'bugzilla-13121', 'bugzilla-13396', 'bugzilla-13533', 'bugzilla-13612', 'c3_lock', 'hid-cleanups', 'misc-2.6.31', 'pdc-leak-fix', 'pnpacpi', 'power_nocheck', 'thinkpad_acpi', 'video' and 'wmi' into release 2009-06-24 01:19:50 -04:00
Linus Torvalds 31950eb66f mm/init: cpu_hotplug_init() must be initialized before SLAB
SLAB uses get/put_online_cpus() which use a mutex which is itself only
initialized when cpu_hotplug_init() is called.  Currently we hang suring
boot in SLAB due to doing that too late.

Reported by James Bottomley and Sachin Sant (and possibly others).
Debugged by Benjamin Herrenschmidt.

This just removes the dynamic initialization of the data structures, and
replaces it with a static one, avoiding this dependency entirely, and
removing one unnecessary special initcall.

Tested-by: Sachin Sant <sachinp@in.ibm.com>
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:18:12 -07:00
Benjamin Herrenschmidt dcce284a25 mm: Extend gfp masking to the page allocator
The page allocator also needs the masking of gfp flags during boot,
so this moves it out of slab/slub and uses it with the page allocator
as well.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:12:57 -07:00
Peter Oberparleiter b99b87f70c kernel: constructor support
Call constructors (gcc-generated initcall-like functions) during kernel
start and module load.  Constructors are e.g.  used for gcov data
initialization.

Disable constructor support for usermode Linux to prevent conflicts with
host glibc.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:57 -07:00
Benjamin Herrenschmidt c868d55011 mm: Move pgtable_cache_init() earlier
Some architectures need to initialize SLAB caches to be able
to allocate page tables. They do that from pgtable_cache_init()
so the later should be called earlier now, best is before
vmalloc_init().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 21:14:42 -07:00
Linus Torvalds 517d08699b Merge branch 'akpm'
* akpm: (182 commits)
  fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
  fbdev: *bfin*: fix __dev{init,exit} markings
  fbdev: *bfin*: drop unnecessary calls to memset
  fbdev: bfin-t350mcqb-fb: drop unused local variables
  fbdev: blackfin has __raw I/O accessors, so use them in fb.h
  fbdev: s1d13xxxfb: add accelerated bitblt functions
  tcx: use standard fields for framebuffer physical address and length
  fbdev: add support for handoff from firmware to hw framebuffers
  intelfb: fix a bug when changing video timing
  fbdev: use framebuffer_release() for freeing fb_info structures
  radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
  s3c-fb: CPUFREQ frequency scaling support
  s3c-fb: fix resource releasing on error during probing
  carminefb: fix possible access beyond end of carmine_modedb[]
  acornfb: remove fb_mmap function
  mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
  mb862xxfb: restrict compliation of platform driver to PPC
  Samsung SoC Framebuffer driver: add Alpha Channel support
  atmel-lcdc: fix pixclock upper bound detection
  offb: use framebuffer_alloc() to allocate fb_info struct
  ...

Manually fix up conflicts due to kmemcheck in mm/slab.c
2009-06-16 19:50:13 -07:00
Miao Xie 58568d2a82 cpuset,mm: update tasks' mems_allowed in time
Fix allocating page cache/slab object on the unallowed node when memory
spread is set by updating tasks' mems_allowed after its cpuset's mems is
changed.

In order to update tasks' mems_allowed in time, we must modify the code of
memory policy.  Because the memory policy is applied in the process's
context originally.  After applying this patch, one task directly
manipulates anothers mems_allowed, and we use alloc_lock in the
task_struct to protect mems_allowed and memory policy of the task.

But in the fast path, we didn't use lock to protect them, because adding a
lock may lead to performance regression.  But if we don't add a lock,the
task might see no nodes when changing cpuset's mems_allowed to some
non-overlapping set.  In order to avoid it, we set all new allowed nodes,
then clear newly disallowed ones.

[lee.schermerhorn@hp.com:
  The rework of mpol_new() to extract the adjusting of the node mask to
  apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
  with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
  allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
  node mask to mpol_new_mpolicy().

  Remove the now unneeded 'nodes = NULL' from mpol_new().

  Note that mpol_new_mempolicy() is always called with a non-NULL
  'nodes' parameter now that it has been removed from mpol_new().
  Therefore, we don't need to test nodes for NULL before testing it for
  'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
  verify this assumption.]
[lee.schermerhorn@hp.com:

  I don't think the function name 'mpol_new_mempolicy' is descriptive
  enough to differentiate it from mpol_new().

  This function applies cpuset set context, usually constraining nodes
  to those allowed by the cpuset.  However, when the 'RELATIVE_NODES flag
  is set, it also translates the nodes.  So I settled on
  'mpol_set_nodemask()', because the comment block for mpol_new() mentions
  that we need to call this function to "set nodes".

  Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:31 -07:00
Linus Torvalds b3fec0fe35 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
2009-06-16 13:09:51 -07:00
Kay Sievers f6ee649f4b driver core: set default SYSFS_DEPRECATED=n
All recent distros depend on the non-deprecated sysfs layout, so
change the default value of the option to reflect that.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:23 -07:00
Vegard Nossum 722f2a6c87 Merge commit 'linus/master' into HEAD
Conflicts:
	MAINTAINERS

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:50:49 +02:00
Vegard Nossum 3b5c760efc fs: fix do_mount_root() false positive kmemcheck warning
This false positive is due to the fact that do_mount_root() fakes a
mount option (which is normally read from userspace), and the kernel
unconditionally reads a whole page for the mount option.

Hide the false positive by using the new __getname_gfp() with the
__GFP_NOTRACK_FALSE_POSITIVE flag.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:42 +02:00
Linus Torvalds 45e3e1935e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits)
  .gitignore: ignore *.lzma files
  kbuild: add generic --set-str option to scripts/config
  kbuild: simplify argument loop in scripts/config
  kbuild: handle non-existing options in scripts/config
  kallsyms: generalize text region handling
  kallsyms: support kernel symbols in Blackfin on-chip memory
  documentation: make version fix
  kbuild: fix a compile warning
  gitignore: Add GNU GLOBAL files to top .gitignore
  kbuild: fix delay in setlocalversion on readonly source
  README: fix misleading pointer to the defconf directory
  vmlinux.lds.h update
  kernel-doc: cleanup perl script
  Improve vmlinux.lds.h support for arch specific linker scripts
  kbuild: fix headers_exports with boolean expression
  kbuild/headers_check: refine extern check
  kbuild: fix "Argument list too long" error for "make headers_check",
  ignore *.patch files
  Remove bashisms from scripts
  menu: fix embedded menu presentation
  ...
2009-06-14 14:12:18 -07:00
Vegard Nossum dfec072ecd kmemcheck: add the kmemcheck core
General description: kmemcheck is a patch to the linux kernel that
detects use of uninitialized memory. It does this by trapping every
read and write to memory that was allocated dynamically (e.g. using
kmalloc()). If a memory address is read that has not previously been
written to, a message is printed to the kernel log.

Thanks to Andi Kleen for the set_memory_4k() solution.

Andrew Morton suggested documenting the shadow member of struct page.

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

[export kmemcheck_mark_initialized]
[build fix for setup_max_cpus]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
2009-06-13 15:37:30 +02:00
Len Brown 4a7a16dc06 ACPI: move declaration acpi_early_init() to acpi.h
Signed-off-by: Len Brown <len.brown@intel.com>
2009-06-12 20:50:38 -04:00
Linus Torvalds 4ddbac9898 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
  perf_counter: Add forward/backward attribute ABI compatibility
  perf record: Explicity program a default counter
  perf_counter: Remove PERF_TYPE_RAW special casing
  perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
  powerpc, perf_counter: Fix performance counter event types
  perf_counter/x86: Add a quirk for Atom processors
  perf_counter tools: Remove one L1-data alias
2009-06-12 13:16:52 -07:00
Mike Frysinger 018df72dd0 perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
Help out arch porters who want to support perf counters by listing some
basic requirements.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1244827063-24046-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 19:37:30 +02:00
Pekka Enberg 7e85ee0c1d slab,slub: don't enable interrupts during early boot
As explained by Benjamin Herrenschmidt:

  Oh and btw, your patch alone doesn't fix powerpc, because it's missing
  a whole bunch of GFP_KERNEL's in the arch code... You would have to
  grep the entire kernel for things that check slab_is_available() and
  even then you'll be missing some.

  For example, slab_is_available() didn't always exist, and so in the
  early days on powerpc, we used a mem_init_done global that is set form
  mem_init() (not perfect but works in practice). And we still have code
  using that to do the test.

Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
in early boot code to avoid enabling interrupts.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12 18:53:33 +03:00
KAMEZAWA Hiroyuki ca371c0d7e memcg: fix page_cgroup fatal error in FLATMEM
Now, SLAB is configured in very early stage and it can be used in
init routine now.

But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's page_cgroup()
initialization breaks the allocation, now.
(Works well in SPARSEMEM case...it supports MEMORY_HOTPLUG and
 size of page_cgroup is in reasonable size (< 1 << MAX_ORDER.)

This patch revive FLATMEM+memory cgroup by using alloc_bootmem.

In future,
We stop to support FLATMEM (if no users) or rewrite codes for flatmem
completely.But this will adds more messy codes and overheads.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12 11:00:54 +03:00
Linus Torvalds 3bb66d7f8c Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  fsnotify: allow groups to set freeing_mark to null
  inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD
  dnotify: do not bother to lock entry->lock when reading mask
  dnotify: do not use ?true:false when assigning to a bool
  fsnotify: move events should indicate the event was on a child
  inotify: reimplement inotify using fsnotify
  fsnotify: handle filesystem unmounts with fsnotify marks
  fsnotify: fsnotify marks on inodes pin them in core
  fsnotify: allow groups to add private data to events
  fsnotify: add correlations between events
  fsnotify: include pathnames with entries when possible
  fsnotify: generic notification queue and waitq
  dnotify: reimplement dnotify using fsnotify
  fsnotify: parent event notification
  fsnotify: add marks to inodes so groups can interpret how to handle those inodes
  fsnotify: unified filesystem notification backend
2009-06-11 14:22:55 -07:00
Linus Torvalds 512626a04e Merge branch 'for-linus' of git://linux-arm.org/linux-2.6
* 'for-linus' of git://linux-arm.org/linux-2.6:
  kmemleak: Add the corresponding MAINTAINERS entry
  kmemleak: Simple testing module for kmemleak
  kmemleak: Enable the building of the memory leak detector
  kmemleak: Remove some of the kmemleak false positives
  kmemleak: Add modules support
  kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
  kmemleak: Add the vmalloc memory allocation/freeing hooks
  kmemleak: Add the slub memory allocation/freeing hooks
  kmemleak: Add the slob memory allocation/freeing hooks
  kmemleak: Add the slab memory allocation/freeing hooks
  kmemleak: Add documentation on the memory leak detector
  kmemleak: Add the base support

Manual conflict resolution (with the slab/earlyboot changes) in:
	drivers/char/vt.c
	init/main.c
	mm/slab.c
2009-06-11 14:15:57 -07:00
Linus Torvalds 8a1ca8cedd Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
  perf_counter: Turn off by default
  perf_counter: Add counter->id to the throttle event
  perf_counter: Better align code
  perf_counter: Rename L2 to LL cache
  perf_counter: Standardize event names
  perf_counter: Rename enums
  perf_counter tools: Clean up u64 usage
  perf_counter: Rename perf_counter_limit sysctl
  perf_counter: More paranoia settings
  perf_counter: powerpc: Implement generalized cache events for POWER processors
  perf_counters: powerpc: Add support for POWER7 processors
  perf_counter: Accurate period data
  perf_counter: Introduce struct for sample data
  perf_counter tools: Normalize data using per sample period data
  perf_counter: Annotate exit ctx recursion
  perf_counter tools: Propagate signals properly
  perf_counter tools: Small frequency related fixes
  perf_counter: More aggressive frequency adjustment
  perf_counter/x86: Fix the model number of Intel Core2 processors
  perf_counter, x86: Correct some event and umask values for Intel processors
  ...
2009-06-11 14:01:07 -07:00
Eric Paris 63c882a054 inotify: reimplement inotify using fsnotify
Reimplement inotify_user using fsnotify.  This should be feature for feature
exactly the same as the original inotify_user.  This does not make any changes
to the in kernel inotify feature used by audit.  Those patches (and the eventual
removal of in kernel inotify) will come after the new inotify_user proves to be
working correctly.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
2009-06-11 14:57:54 -04:00
Pekka Enberg 444f478f65 init: introduce mm_init()
As suggested by Christoph Lameter, introduce mm_init() now that we initialize
all the kernel memory allocations together.

Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:27:03 +03:00
Pekka Enberg 43ebdac42f vmalloc: use kzalloc() instead of alloc_bootmem()
We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
the bootmem allocator when initializing vmalloc data structures.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:17:05 +03:00
Pekka Enberg 83b519e8b9 slab: setup allocators earlier in the boot sequence
This patch makes kmalloc() available earlier in the boot sequence so we can get
rid of some bootmem allocations. The bulk of the changes are due to
kmem_cache_init() being called with interrupts disabled which requires some
changes to allocator boostrap code.

Note: 32-bit x86 does WP protect test in mem_init() so we must setup traps
before we call mem_init() during boot as reported by Ingo Molnar:

  We have a hard crash in the WP-protect code:

  [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
  [    0.000000]      EDI 00000188  ESI 00000ac7  EBP c17eaf9c  ESP c17eaf8c
  [    0.000000]      EBX 000014e0  EDX 0000000e  ECX 01856067  EAX 00000001
  [    0.000000]      err 00000003  EIP c10135b1   CS 00000060  flg 00010002
  [    0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
  [    0.000000]        00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
  [    0.000000]        c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
  [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
  [    0.000000] Call Trace:
  [    0.000000]  [<c15357c2>] ? printk+0x14/0x16
  [    0.000000]  [<c10135b1>] ? do_test_wp_bit+0x19/0x23
  [    0.000000]  [<c17fd410>] ? test_wp_bit+0x26/0x64
  [    0.000000]  [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
  [    0.000000]  [<c17f1668>] ? start_kernel+0x164/0x2f7
  [    0.000000]  [<c17f1322>] ? unknown_bootoption+0x0/0x19c
  [    0.000000]  [<c17f106a>] ? __init_begin+0x6a/0x6f

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:15:56 +03:00
Catalin Marinas 3c7b4e6b8b kmemleak: Add the base support
This patch adds the base support for the kernel memory leak
detector. It traces the memory allocation/freeing in a way similar to
the Boehm's conservative garbage collector, the difference being that
the unreferenced objects are not freed but only shown in
/sys/kernel/debug/kmemleak. Enabling this feature introduces an
overhead to memory allocations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2009-06-11 17:03:28 +01:00
Ingo Molnar 940010c5a3 Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/kernel/irqinit.c
	arch/x86/kernel/irqinit_64.c
	arch/x86/kernel/traps.c
	arch/x86/mm/fault.c
	include/linux/sched.h
	kernel/exit.c
2009-06-11 17:55:42 +02:00
Ingo Molnar 8dc8e5e8bc perf_counter: Turn off by default
Perfcounters were enabled by default to help testing - but now that we
are submitting it upstream, make it default-disabled.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:46 +02:00
Linus Torvalds 8623661180 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
  Revert "x86, bts: reenable ptrace branch trace support"
  tracing: do not translate event helper macros in print format
  ftrace/documentation: fix typo in function grapher name
  tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
  tracing: add protection around module events unload
  tracing: add trace_seq_vprint interface
  tracing: fix the block trace points print size
  tracing/events: convert block trace points to TRACE_EVENT()
  ring-buffer: fix ret in rb_add_time_stamp
  ring-buffer: pass in lockdep class key for reader_lock
  tracing: add annotation to what type of stack trace is recorded
  tracing: fix multiple use of __print_flags and __print_symbolic
  tracing/events: fix output format of user stack
  tracing/events: fix output format of kernel stack
  tracing/trace_stack: fix the number of entries in the header
  ring-buffer: discard timestamps that are at the start of the buffer
  ring-buffer: try to discard unneeded timestamps
  ring-buffer: fix bug in ring_buffer_discard_commit
  ftrace: do not profile functions when disabled
  tracing: make trace pipe recognize latency format flag
  ...
2009-06-10 19:53:40 -07:00
Linus Torvalds 20f3f3ca49 Merge branch 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: rcu_sched_grace_period(): kill the bogus flush_signals()
  rculist: use list_entry_rcu in places where it's appropriate
  rculist.h: introduce list_entry_rcu() and list_first_entry_rcu()
  rcu: Update RCU tracing documentation for __rcu_pending
  rcu: Add __rcu_pending tracing to hierarchical RCU
  RCU: make treercu be default
2009-06-10 19:50:03 -07:00
Randy Dunlap a9eb522375 menu: fix embedded menu presentation
The STRIP_ASM_SYMS kconfig symbol mucks up the embedded menu because
STRIP_ASM_SYMS is in the middle of the embedded menu items but it does not
depend on EMBEDDED.  Move it to beyond the end of the embedded menu so
that the menu is presented correctly.

Or if STRIP_ASM_SYMS should depend on EMBEDDED, that can also be fixed.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-06-09 22:37:54 +02:00
Ingo Molnar 23db9f430b Merge branch 'linus' into perfcounters/core
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
              based - to pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 10:01:39 +02:00
Alex Riesen 657cafa6b0 Use a format for linux_banner
There is no format specifiers left in the linux_banner, and gcc-4.3
complains seeing the printk.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-24 11:19:11 -07:00
Ingo Molnar 1079cac0f4 Merge commit 'v2.6.30-rc6' into tracing/core
Merge reason: we were on an -rc4 base, sync up to -rc6

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 10:15:35 +02:00
Ingo Molnar dc3f81b129 Merge commit 'v2.6.30-rc6' into perfcounters/core
Merge reason: this branch was on an -rc4 base, merge it up to -rc6
              to get the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 07:37:49 +02:00
Ingo Molnar 44347d947f Merge branch 'linus' into tracing/core
Merge reason: tracing/core was on a .30-rc1 base and was missing out on
              on a handful of tracing fixes present in .30-rc5-almost.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-07 11:17:34 +02:00
Eric Piel a1e6b6c1a6 initramfs: clean up messages related to initramfs unpacking
With the removal of duplicate unpack_to_rootfs() (commit
df52092f3c) the messages displayed do not
actually correspond to what the kernel is doing.  In addition, depending
if ramdisks are supported or not, the messages are not at all the same.

So keep the messages more in sync with what is really doing the kernel,
and only display a second message in case of failure.  This also ensure
that the printk message cannot be split by other printk's.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-06 16:36:10 -07:00
Ingo Molnar e7fd5d4b3d Merge branch 'linus' into perfcounters/core
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
              the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:47:05 +02:00
Magnus Damm 13977091a9 Driver Core: early platform driver
V3 of the early platform driver implementation.

Platform drivers are great for embedded platforms because we can separate
driver configuration from the actual driver.  So base addresses,
interrupts and other configuration can be kept with the processor or board
code, and the platform driver can be reused by many different platforms.

For early devices we have nothing today.  For instance, to configure early
timers and early serial ports we cannot use platform devices.  This
because the setup order during boot.  Timers are needed before the
platform driver core code is available.  The same goes for early printk
support.  Early in this case means before initcalls.

These early drivers today have their configuration either hard coded or
they receive it using some special configuration method.  This is working
quite well, but if we want to support both regular kernel modules and
early devices then we need to have two ways of configuring the same
driver.  A single way would be better.

The early platform driver patch is basically a set of functions that allow
drivers to register themselves and architecture code to locate them and
probe.  Registration happens through early_param().  The time for the
probe is decided by the architecture code.

See Documentation/driver-model/platform.txt for more details.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-04-16 16:17:10 -07:00
Ingo Molnar 05cfbd66d0 Merge branch 'core/urgent' into core/rcu
Merge reason: new patches to be queued up depend on:

   ef631b0: rcu: Make hierarchical RCU less IPI-happy

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 11:32:30 +02:00
Randy Robertson d20d5a7456 initramfs: fix initramfs to work with hardlinked init
Change cb6ff20807 ("NOMMU: Support XIP on
initramfs") seems to have broken booting from initramfs with /sbin/init
being a hardlink.

It seems like the logic required for XIP on nommu, i.e.  ftruncate to
reported cpio header file size (body_len) is broken for hardlinks, which
have a reported size of 0, and the truncate thus nukes the contents of the
file (in my case busybox), making boot impossible and ending with runaway
loop modprobe binfmt-0000 - and of course 0000 is not a valid binary
format.

My fix is to only call ftruncate if size is non-zero which fixes things
for me, but I'm not certain whether this will break XIP for those files on
nommu systems, although I would guess not.

Signed-off-by: Randy Robertson <rmrobert@vmware.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:31 -07:00
Nikanth Karthikesan b52bb3712a init/initramfs: fix warning with CONFIG_BLK_DEV_RAM=n
init/initramfs.c:520: warning: 'clean_rootfs' defined but not used

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:28 -07:00
Ingo Molnar b5c851a88a Merge branch 'linus' into tracing/core
Merge reason: merge latest tracing fixes to avoid conflicts in
              kernel/trace/trace_events_filter.c with upcoming change

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:02:22 +02:00
Zhaolei 02af61bb50 tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part
Impact: refactor code for future changes

Current kmemtrace.h is used both as header file of kmemtrace and kmem's
tracepoints definition.

Tracepoints' definition file may be used by other code, and should only have
definition of tracepoint.

We can separate include/trace/kmemtrace.h into 2 files:

  include/linux/kmemtrace.h: header file for kmemtrace
  include/trace/kmem.h:      definition of kmem tracepoints

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 15:22:55 +02:00
David Howells 5d7d18f5bc kbuild: make it possible for the linker to discard local symbols from vmlinux
Make it possible for the linker to discard local symbols from vmlinux as
they cause vmlinux to balloon when CONFIG_KALLSYMS=y and they cause
dump_stack() and get_wchan() to produce useless information under some
circumstances.

With this we add a config option (CONFIG_STRIP_ASM_SYMS) that will cause
the build to supply -X to the linker to tell it to strip temporary local
symbols.

This doesn't seem to cause gdb any problems.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-04-11 08:18:10 +02:00
Ingo Molnar 5ea472a77f Merge commit 'v2.6.30-rc1' into perfcounters/core
Conflicts:
	arch/powerpc/include/asm/systbl.h
	arch/powerpc/include/asm/unistd.h
	include/linux/init_task.h

Merge reason: the conflicts are non-trivial: PowerPC placement
              of sys_perf_counter_open has to be mixed with the
	      new preadv/pwrite syscalls.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 10:35:30 +02:00
Serge E. Hallyn bdc8e5f85f namespaces: mqueue namespace: adapt sysctl
Largely inspired from ipc/ipc_sysctl.c.  This patch isolates the mqueue
sysctl stuff in its own file.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:09 -07:00
Serge E. Hallyn 614b84cf4e namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace
Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
posix message queue namespaces and the SYSV ipc namespaces.

The sysctl code will be fixed separately in patch 3.  After just this
patch, making a change to posix mqueue tunables always changes the values
in the initial ipc namespace.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:09 -07:00
Ingo Molnar 6c009ecef8 Merge branch 'linus' into perfcounters/core
Merge reason: need the upstream facility added by:

  7f1e2ca: hrtimer: fix rq->lock inversion (again)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 12:05:25 +02:00
David Howells 1c2d008c9e Make CONFIG_SLOW_WORK an automatic rather than manual config option
Make CONFIG_SLOW_WORK an automatic rather than manual config option so that
people configuring their kernels don't have to make the choice.  It can be
selected automatically by those things that require it (such as FS-Cache).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 14:31:28 -07:00
Peter Zijlstra e077df4f43 perf_counter: hook up the tracepoint events
Impact: new perfcounters feature

Enable usage of tracepoints as perf counter events.

tracepoint event ids can be found in /debug/tracing/event/*/*/id
and (for now) are represented as -65536+id in the type field.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.744044174@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:16 +02:00
Ingo Molnar f541ae326f Merge branch 'linus' into perfcounters/core-v2
Merge reason: we have gathered quite a few conflicts, need to merge upstream

Conflicts:
	arch/powerpc/kernel/Makefile
	arch/x86/ia32/ia32entry.S
	arch/x86/include/asm/hardirq.h
	arch/x86/include/asm/unistd_32.h
	arch/x86/include/asm/unistd_64.h
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/irq.c
	arch/x86/kernel/syscall_table_32.S
	arch/x86/mm/iomap_32.c
	include/linux/sched.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:02:57 +02:00
Linus Torvalds 714f83d5d9 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
  tracing, net: fix net tree and tracing tree merge interaction
  tracing, powerpc: fix powerpc tree and tracing tree interaction
  ring-buffer: do not remove reader page from list on ring buffer free
  function-graph: allow unregistering twice
  trace: make argument 'mem' of trace_seq_putmem() const
  tracing: add missing 'extern' keywords to trace_output.h
  tracing: provide trace_seq_reserve()
  blktrace: print out BLK_TN_MESSAGE properly
  blktrace: extract duplidate code
  blktrace: fix memory leak when freeing struct blk_io_trace
  blktrace: fix blk_probes_ref chaos
  blktrace: make classic output more classic
  blktrace: fix off-by-one bug
  blktrace: fix the original blktrace
  blktrace: fix a race when creating blk_tree_root in debugfs
  blktrace: fix timestamp in binary output
  tracing, Text Edit Lock: cleanup
  tracing: filter fix for TRACE_EVENT_FORMAT events
  ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
  x86: kretprobe-booster interrupt emulation code fix
  ...

Fix up trivial conflicts in
 arch/parisc/include/asm/ftrace.h
 include/linux/memory.h
 kernel/extable.c
 kernel/module.c
2009-04-05 11:04:19 -07:00
Linus Torvalds 811158b147 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
  trivial: Update my email address
  trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
  trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
  trivial: Fix misspelling of "Celsius".
  trivial: remove unused variable 'path' in alloc_file()
  trivial: fix a pdlfush -> pdflush typo in comment
  trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
  trivial: wusb: Storage class should be before const qualifier
  trivial: drivers/char/bsr.c: Storage class should be before const qualifier
  trivial: h8300: Storage class should be before const qualifier
  trivial: fix where cgroup documentation is not correctly referred to
  trivial: Give the right path in Documentation example
  trivial: MTD: remove EOL from MODULE_DESCRIPTION
  trivial: Fix typo in bio_split()'s documentation
  trivial: PWM: fix of #endif comment
  trivial: fix typos/grammar errors in Kconfig texts
  trivial: Fix misspelling of firmware
  trivial: cgroups: documentation typo and spelling corrections
  trivial: Update contact info for Jochen Hein
  trivial: fix typo "resgister" -> "register"
  ...
2009-04-03 15:24:35 -07:00
Linus Torvalds 3cc50ac0db Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
  NFS: Add mount options to enable local caching on NFS
  NFS: Display local caching state
  NFS: Store pages from an NFS inode into a local cache
  NFS: Read pages from FS-Cache into an NFS inode
  NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
  NFS: Add read context retention for FS-Cache to call back with
  NFS: FS-Cache page management
  NFS: Add some new I/O counters for FS-Cache doing things for NFS
  NFS: Invalidate FsCache page flags when cache removed
  NFS: Use local disk inode cache
  NFS: Define and create inode-level cache objects
  NFS: Define and create superblock-level objects
  NFS: Define and create server-level objects
  NFS: Register NFS for caching and retrieve the top-level index
  NFS: Permit local filesystem caching to be enabled for NFS
  NFS: Add FS-Cache option bit and debug bit
  NFS: Add comment banners to some NFS functions
  FS-Cache: Make kAFS use FS-Cache
  CacheFiles: A cache that backs onto a mounted filesystem
  CacheFiles: Export things for CacheFiles
  ...
2009-04-03 10:07:43 -07:00
Paul E. McKenney 31c9a24ec8 RCU: make treercu be default
Impact: switch default config from CLASSIC_RCU to TREE_RCU

Given that I have not gotten any complaints or bug reports on treercu
recently, this patch makes it be the default.  There are a number of
other defconfig files that explicitly call out CLASSIC_RCU, but which
have comment headers saying not to edit them.  Probably holdovers from
one of the flavors of "make config", but...

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: akpm@linux-foundation.org
Cc: dipankar@in.ibm.com
Cc: niv@us.ibm.com
Cc: manfred@colorfullife.com
Cc: peterz@infradead.org
LKML-Reference: <20090403040625.GA9473@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-03 18:27:37 +02:00
Linus Torvalds 223cdea4c4 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (53 commits)
  md/raid5 revise rules for when to update metadata during reshape
  md/raid5: minor code cleanups in make_request.
  md: remove CONFIG_MD_RAID_RESHAPE config option.
  md/raid5: be more careful about write ordering when reshaping.
  md: don't display meaningless values in sysfs files resync_start and sync_speed
  md/raid5: allow layout and chunksize to be changed on active array.
  md/raid5: reshape using largest of old and new chunk size
  md/raid5: prepare for allowing reshape to change layout
  md/raid5: prepare for allowing reshape to change chunksize.
  md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.
  Documentation/md.txt update
  md: allow number of drives in raid5 to be reduced
  md/raid5: change reshape-progress measurement to cope with reshaping backwards.
  md: add explicit method to signal the end of a reshape.
  md/raid5: enhance raid5_size to work correctly with negative delta_disks
  md/raid5: drop qd_idx from r6_state
  md/raid6: move raid6 data processing to raid6_pq.ko
  md: raid5 run(): Fix max_degraded for raid level 4.
  md: 'array_size' sysfs attribute
  md: centralize ->array_sectors modifications
  ...
2009-04-03 09:08:19 -07:00
David Howells 07fe7cb7c7 Create a dynamically sized pool of threads for doing very slow work items
Create a dynamically sized pool of threads for doing very slow work items, such
as invoking mkdir() or rmdir() - things that may take a long time and may
sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable
for workqueues.

The number of threads is always at least a settable minimum, but more are
started when there's more work to do, up to a limit.  Because of the nature of
the load, it's not suitable for a 1-thread-per-CPU type pool.  A system with
one CPU may well want several threads.

This is used by FS-Cache to do slow caching operations in the background, such
as looking up, creating or deleting cache objects.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:35 +01:00
Linus Torvalds 8fe74cf053 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
2009-04-02 21:09:10 -07:00
Paul Menage db7f47cf48 cpusets: allow cpusets to be configured/built on non-SMP systems
Allow cpusets to be configured/built on non-SMP systems

Currently it's impossible to build cpusets under UML on x86-64, since
cpusets depends on SMP and x86-64 UML doesn't support SMP.

There's code in cpusets that doesn't depend on SMP.  This patch surrounds
the minimum amount of cpusets code with #ifdef CONFIG_SMP in order to
allow cpusets to build/run on UP systems (for testing purposes under UML).

Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:57 -07:00
KAMEZAWA Hiroyuki 627991a20b memcg: remove redundant message at swapon
It's pointed out that swap_cgroup's message at swapon() is nonsense.
Because

  * It can be calculated very easily if all necessary information is
    written in Kconfig.

  * It's not necessary to annoying people at every swapon().

In other view, now, memory usage per swp_entry is reduced to 2bytes from
8bytes(64bit) and I think it's reasonably small.

Reported-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:56 -07:00
Simon Kitching c1c490e017 initramfs: prevent initramfs printk message being split by messages from other code.
initramfs uses printk without a linefeed, then does some work, then uses
printk to finish the message off.  However if some other code does a
printk in between, then the messages get mixed together.  Better for each
message to be an independent line...

Example of problem that this fixes:

    checking if image is initramfs...<7>Switched to high resolution mode on CPU 1
    Switched to high resolution mode on CPU 0
    it is

Signed-off-by: Simon Kitching <skitching@apache.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:51 -07:00
Ingo Molnar 8302294f43 Merge branch 'tracing/core-v2' into tracing-for-linus
Conflicts:
	include/linux/slub_def.h
	lib/Kconfig.debug
	mm/slob.c
	mm/slub.c
2009-04-02 00:49:02 +02:00
Hannes Eder acdd052a28 init/main.c: fix sparse warnings: context imbalance
Impact: Attribute function 'init_post' with __releases(...).

Fix these sparse warnings:
  init/main.c:805:21: warning: context imbalance in 'init_post' - unexpected unlock
  init/main.c:899:9: warning: context imbalance in 'kernel_init' - wrong count at exit

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01 08:59:18 -07:00
Al Viro 5ad4e53bd5 Get rid of indirect include of fs_struct.h
Don't pull it in sched.h; very few files actually need it and those
can include directly.  sched.h itself only needs forward declaration
of struct fs_struct;

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-31 23:00:27 -04:00
NeilBrown bff61975b3 md: move lots of #include lines out of .h files and into .c
This makes the includes more explicit, and is preparation for moving
md_k.h to drivers/md/md.h

Remove include/raid/md.h as its only remaining use was to #include
other files.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-03-31 14:33:13 +11:00
NeilBrown 8b2b5c217c md: move LEVEL_* definition from md_k.h to md_u.h
.. as they are part of the user-space interface.
Also move MdpMinorShift into there so we can remove duplication.

Lastly move mdp_major in.  It is less obviously part of the user-space
interface, but do_mounts_md.c uses it, and it is acting a bit like
user-space.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-03-31 14:27:03 +11:00
Thadeu Lima de Souza Cascardo 21acb9caa2 trivial: fix where cgroup documentation is not correctly referred to
cgroup documentation was moved to Documentation/cgroups/. There are some
places that still refer to Documentation/controllers/,
Documentation/cgroups.txt and Documentation/cpusets.txt. Fix those.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-03-30 15:22:02 +02:00
Matt LaPlante 692105b8ac trivial: fix typos/grammar errors in Kconfig texts
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-03-30 15:22:01 +02:00
Rusty Russell 2b17fa506c cpumask: use set_cpu_active in init/main.c
cpu_active_map is deprecated in favor of cpu_active_mask, which is
const for safety: we use accessors now (set_cpu_active) is we really
want to make a change.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30 22:05:12 +10:30
Rusty Russell 1a2142afa5 cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL
Impact: cleanup

(Thanks to Al Viro for reminding me of this, via Ingo)

CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

	#define CPU_MASK_ALL (cpumask_t) { { ... } }

Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:

	#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

Which formalizes this practice.  One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).

So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
with the modern "cpu_all_mask" (a real const struct cpumask *).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
2009-03-30 22:05:11 +10:30
Linus Torvalds eedf2c5296 Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-for-30
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-for-30:
  fastboot: remove duplicate unpack_to_rootfs()
  ide/net: flip the order of SATA and network init
  async: remove the temporary (2.6.29) "async is off by default" code

Fix up conflicts in init/initramfs.c manually
2009-03-28 14:00:33 -07:00
Li, Shaohua df52092f3c fastboot: remove duplicate unpack_to_rootfs()
we check if initrd is initramfs first and then do the real unpack. The check
isn't required, we can directly do unpack.  If the initrd isn't an
initramfs, we can remove the garbage.  In my laptop, this saves 0.1s boot
time.

This patch penalizes non-initramfs initrd case, but nowadays, initramfs is
the most widely used method for initrds.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-28 13:06:22 -07:00
Ingo Molnar 6e15cf0486 Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2
Conflicts:
	arch/parisc/kernel/irq.c
	arch/x86/include/asm/fixmap_64.h
	arch/x86/include/asm/setup.h
	kernel/irq/handle.c

Semantic merge:
        arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-27 17:28:43 +01:00
Lai Jiangshan 759ee0915d init,cpuset: fix initialize order
Impact: cpuset_wq should be initialized after init_workqueues()

When I read /debugfs/tracing/trace_stat/workqueues,
I got this:

 # CPU  INSERTED  EXECUTED   NAME
 # |      |         |          |

   0      0          0       cpuset
   0    285        285       events/0
   0      2          2       work_on_cpu/0
   0   1115       1115       khelper
   0    325        325       kblockd/0
   0      0          0       kacpid
   0      0          0       kacpi_notify
   0      0          0       ata/0
   0      0          0       ata_aux
   0      0          0       ksuspend_usbd
   0      0          0       aio/0
   0      0          0       nfsiod
   0      0          0       kpsmoused
   0      0          0       kstriped
   0      0          0       kondemand/0
   0      1          1       hid_compat
   0      0          0       rpciod/0

   1     64         64       events/1
   1      2          2       work_on_cpu/1
   1      5          5       kblockd/1
   1      0          0       ata/1
   1      0          0       aio/1
   1      0          0       kondemand/1
   1      0          0       rpciod/1

I found "cpuset" is at the earliest.

I found a create_singlethread_workqueue() is earlier than
init_workqueues():

kernel_init()
->cpuset_init_smp()
  ->create_singlethread_workqueue()
->do_basic_setup()
  ->init_workqueues()

I think it's better that create_singlethread_workqueue() is called
after workqueue subsystem has been initialized.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: miaoxie <miaox@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49C9F416.1050707@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 18:32:35 +01:00
Ingo Molnar f6411fe7e0 Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core 2009-03-13 04:50:44 +01:00
Ingo Molnar 480c93df5b Merge branch 'core/locking' into tracing/ftrace 2009-03-13 01:33:21 +01:00
Randy Dunlap b943c460ff menu: fix embedded menu snafu
The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED menu.
Fix by moving it to just after all of the EMBEDDED menu symbols.  Also,
ANON_INODES has a similar problem, so move it to just above the EMBEDDED
menu items since it is used in the EMBEDDED menu.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-10 15:55:10 -07:00
Ingo Molnar f0ef039851 Merge branch 'x86/core' into tracing/textedit
Conflicts:
	arch/x86/Kconfig
	block/blktrace.c
	kernel/irq/handle.c

Semantic conflict:
	kernel/trace/blktrace.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:45:01 +01:00
Ingo Molnar a1413c89ae Merge branch 'x86/urgent' into x86/core
Conflicts:
	arch/x86/include/asm/fixmap_64.h
Semantic merge:
	arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 21:48:50 +01:00
Ingo Molnar 49d2d266ad Merge commit 'v2.6.29-rc7' into sched/core 2009-03-05 11:59:10 +01:00
Ingo Molnar 8163d88c79 Merge commit 'v2.6.29-rc7' into perfcounters/core
Conflicts:
	arch/x86/mm/iomap_32.c
2009-03-04 11:42:31 +01:00
Ingo Molnar a1be621dfa Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core 2009-03-04 11:14:47 +01:00
Ingo Molnar 8b0e5860cb Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core 2009-03-04 02:22:31 +01:00
Linus Torvalds 2450cf51a1 Revert "menu: fix embedded menu snafu"
This reverts commit 155b25bcc2, which was
totally wrong - the "embedded" options still exists (very much so) even
on non-embedded platforms.

It's just that we don't bother with actually asking about them when
we're not embedded, we just take their default values (which is usually
'y' - the options add features that may not be worth it in a constrained
environment).

Noticed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 16:23:33 -08:00
Randy Dunlap 155b25bcc2 menu: fix embedded menu snafu
The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED
menu.  Fix by moving it to just after all of the EMBEDDED menu symbols.

Also, surround all of the EMBEDDED symbols with "if EMBEDDED"/"endif" so
that this EMBEDDED block is clearer.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:49:16 -08:00
Ingo Molnar 4434e51564 Merge branches 'sched/cleanups', 'sched/urgent' and 'linus' into sched/core 2009-02-26 13:22:13 +01:00
Ingo Molnar 8e818179eb Merge branch 'x86/core' into perfcounters/core
Conflicts:
	arch/x86/kernel/apic/apic.c
	arch/x86/kernel/irqinit_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 13:02:23 +01:00
Paul E. McKenney a682604838 rcu: Teach RCU that idle task is not quiscent state at boot
This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
and function-name language.  ;-)

The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state.  This in turn causes
RCU to prematurely end grace periods during this time.

This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callback() to
determine if it should believe idle_cpu().

Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1.  This allows boot-time code that
calls synchronize_rcu() to proceed normally.  Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence.  Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preeempted.

In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.

Changes since v4:

o	Changes the name of the introduced function and variable to
	be less emotional.  ;-)

Changes since v3:

o	WARN_ON(nr_context_switches() > 0) to verify that RCU
	switches out of boot-time mode before the first context
	switch, as suggested by Nick Piggin.

Changes since v2:

o	Created rcu_blocking_is_gp() internal-to-RCU API that
	determines whether a call to synchronize_rcu() is itself
	a grace period.

o	The definition of rcu_blocking_is_gp() for rcuclassic and
	rcutree checks to see if but a single CPU is online.

o	The definition of rcu_blocking_is_gp() for rcupreempt
	checks to see both if but a single CPU is online and if
	the system is still in early boot.

	This allows rcupreempt to again work correctly if running
	on a single CPU after booting is complete.

o	Added check to rcupreempt's synchronize_sched() for there
	being but one online CPU.

Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.

Located-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 04:08:14 +01:00
Ingo Molnar a7f4463e03 Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc6' into tracing/core 2009-02-24 18:22:39 +01:00
Ingo Molnar fc6fc7f1b1 Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic conflict resolution:
	arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:05:19 +01:00
Arjan van de Ven 216773a787 Consolidate driver_probe_done() loops into one place
there's a few places that currently loop over driver_probe_done(), and
I'm about to add another one. This patch abstracts it into a helper
to reduce duplication.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21 14:17:17 -08:00
Frederic Weisbecker 91f73f90d9 tracing/markers: make markers select tracepoints
Sometimes it happens that KConfig dependencies are not handled
like in the following scenario:

- config A
   bool

- config B
   bool
   depends on A

- config C
   bool
   select B

If one selects C, then it will select B without checking its
dependency to A, if A hasn't been selected elsewhere, it will
result in a build failure.

This is what happens on the following build error:

 kernel/built-in.o: In function `marker_update_probe_range':
 (.text+0x52f64): undefined reference to `tracepoint_probe_register_noupdate'
 kernel/built-in.o: In function `marker_update_probe_range':
 (.text+0x52f74): undefined reference to `tracepoint_probe_unregister_noupdate'
 kernel/built-in.o: In function `marker_update_probe_range':
 (.text+0x52fb9): undefined reference to `tracepoint_probe_unregister_noupdate'
 kernel/built-in.o: In function `marker_update_probes':
 marker.c:(.text+0x530ba): undefined reference to `tracepoint_probe_update_all'

CONFIG_KVM_TRACE will select CONFIG_MARKER, but the latter
depends on CONFIG_TRACEPOINTS which will not be selected.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 19:30:04 +01:00
Ingo Molnar b1864e9a1a Merge branch 'x86/core' into perfcounters/core
Conflicts:
	arch/x86/Kconfig
	arch/x86/kernel/apic.c
	arch/x86/kernel/setup_percpu.c
2009-02-13 09:49:38 +01:00
Ingo Molnar 95fd4845ed Merge commit 'v2.6.29-rc4' into perfcounters/core
Conflicts:
	arch/x86/kernel/setup_percpu.c
	arch/x86/mm/fault.c
	drivers/acpi/processor_idle.c
	kernel/irq/handle.c
2009-02-11 09:22:04 +01:00
Ingo Molnar 140573d33b Merge branches 'sched/rt' and 'sched/urgent' into sched/core 2009-02-08 20:12:46 +01:00
Ingo Molnar 65a4e574d2 smp, generic: introduce arch_disable_smp_support() instead of disable_ioapic_setup()
Impact: cleanup

disable_ioapic_setup() in init/main.c is ugly as the function is
x86-specific. The #ifdef inline prototype there is ugly too.

Replace it with a generic arch_disable_smp_support() function - which
has a weak alias for non-x86 architectures and for non-ioapic x86 builds.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05 22:27:54 +01:00
Ingo Molnar dc573f9b20 Merge branches 'tracing/ftrace', 'tracing/kmemtrace' and 'linus' into tracing/core 2009-02-03 06:25:38 +01:00
Ingo Molnar 3ddeb51d9c Merge branch 'linus' into core/percpu
Conflicts:
	arch/x86/kernel/setup_percpu.c
2009-01-27 12:01:51 +01:00
Linus Torvalds 2034563ca3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: fix kbuild.txt typos
  kbuild: print usage with no arguments in scripts/config
  Revert "kbuild: strip generated symbols from *.ko"
2009-01-26 15:10:37 -08:00
Linus Torvalds 924d26df6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (92 commits)
  gianfar: Revive VLAN support
  vlan: Export symbols as non GPL symbols.
  bnx2x: tx_has_work should not wait for FW
  netxen: reduce memory footprint
  netxen: fix vlan tso/checksum offload
  net: Fix linux/if_frad.h's suitability for userspace.
  net: Move config NET_NS to from net/Kconfig to init/Kconfig
  isdn: Fix missing ifdef in isdn_ppp
  networking: document "nc" in addition to "netcat" in netconsole.txt
  e1000e: workaround hw errata
  af_key: initialize xfrm encap_oa
  virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs
  lcs: fix compilation for !CONFIG_IP_MULTICAST
  rtl8187: Add termination packet to prevent stall
  iwlwifi: fix rs_get_rate WARN_ON()
  p54usb: fix packet loss with first generation devices
  sctp: Fix another socket race during accept/peeloff
  sctp: Properly timestamp outgoing data chunks for rtx purposes
  sctp: Correctly start rtx timer on new packet transmissions.
  sctp: Fix crc32c calculations on big-endian arhes.
  ...
2009-01-26 15:09:20 -08:00
Matt Helsley d6eb633fe6 net: Move config NET_NS to from net/Kconfig to init/Kconfig
Make NET_NS available underneath the generic Namespaces config option
since all of the other namespace options are there.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-26 12:25:55 -08:00
Ingo Molnar bfe2a3c3b5 Merge branch 'core/percpu' into perfcounters/core
Conflicts:
	arch/x86/include/asm/hardirq_32.h
	arch/x86/include/asm/hardirq_64.h

Semantic merge:
	arch/x86/include/asm/hardirq.h
	[ added apic_perf_irqs field. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 10:20:15 +01:00
Thomas Gleixner 6552ebae25 Merge branch 'core/debugobjects' into core/urgent 2009-01-22 10:03:02 +01:00
Ingo Molnar 77835492ed Merge commit 'v2.6.29-rc2' into perfcounters/core
Conflicts:
	include/linux/syscalls.h
2009-01-21 16:37:27 +01:00
Ingo Molnar 198030782c Merge branch 'x86/mm' into core/percpu
Conflicts:
	arch/x86/mm/fault.c
2009-01-21 10:39:51 +01:00
Ingo Molnar 4092762aeb Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc2' into tracing/core 2009-01-18 20:15:05 +01:00
Ingo Molnar b2b062b816 Merge branch 'core/percpu' into stackprotector
Conflicts:
	arch/x86/include/asm/pda.h
	arch/x86/include/asm/system.h

Also, moved include/asm-x86/stackprotector.h to arch/x86/include/asm.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 18:37:14 +01:00
Peter Zijlstra ceacc2c1c8 sched: make plist a library facility
Ingo Molnar wrote:

> here's a new build failure with tip/sched/rt:
>
>   LD      .tmp_vmlinux1
> kernel/built-in.o: In function `set_curr_task_rt':
> sched.c:(.text+0x3675): undefined reference to `plist_del'
> kernel/built-in.o: In function `pick_next_task_rt':
> sched.c:(.text+0x37ce): undefined reference to `plist_del'
> kernel/built-in.o: In function `enqueue_pushable_task':
> sched.c:(.text+0x381c): undefined reference to `plist_del'

Eliminate the plist library kconfig and make it available
unconditionally.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-16 15:01:31 +01:00
Li Zefan 45ce80fb6b cgroups: consolidate cgroup documents
Move Documentation/cpusets.txt and Documentation/controllers/* to
Documentation/cgroups/

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:37 -08:00
Li Zefan 23964d2d02 cgroups: clean up Kconfig
- move CONFIG_PROC_PID_CPUSET into cgroup menu
- move MM_OWNER to the bottom for better menu indent
- fix typos
- use tabs not spaces

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:37 -08:00
Mike Travis c903ff8379 rcu: move Kconfig menu
Move RCU Kconfig options from top-level menu to an "RCU Subsystem"
menu under the "General Setup" menu.

Signed-off-by: Mike Travis <travis@sgi.com>
Tested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15 23:12:40 +01:00
Sam Ravnborg 2ea038917b Revert "kbuild: strip generated symbols from *.ko"
This reverts commit ad7a953c52.

And commit: ("allow stripping of generated symbols under CONFIG_KALLSYMS_ALL")
            9bb482476c

These stripping patches has caused a set of issues:

1) People have reported compatibility issues with binutils due to
   lack of support for `--strip-unneeded-symbols' with objcopy 2.15.92.0.2
   Reported by: Wenji
2) ccache and distcc no longer works as expeced
   Reported by: Ted, Roland, + others
3) The installed modules increased a lot in size
   Reported by: Ted, Davej + others

Reported-by: Wenji Huang <wenji.huang@oracle.com>
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-01-14 21:38:20 +01:00
H. Peter Anvin 73310a169a init: make initrd/initramfs decompression failure a KERN_EMERG event
Impact: More consistent behaviour, avoid policy in the kernel

Upgrade/downgrade initrd/initramfs decompression failure from
inconsistently a panic or a KERN_ALERT message to a KERN_EMERG event.
It is, however, possible do design a system which can recover from
this (using the kernel builtin code and/or the internal initramfs),
which means this is policy, not a technical necessity.

A good way to handle this would be to have a panic-level=X option, to
force a panic on a printk above a certain level.  That is a separate
patch, however.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-14 11:28:35 -08:00
H. Peter Anvin 23a22d57a8 bzip2/lzma: comprehensible error messages for missing decompressor
Instead of failing to identify a compressed image with a decompressor
that we don't have compiled in, identify it and fail with a
comprehensible panic message.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-12 14:34:31 -08:00
Ingo Molnar 99cd707489 Merge commit 'v2.6.29-rc1' into tracing/urgent 2009-01-11 03:43:52 +01:00
Ingo Molnar 506c10f26c Merge commit 'v2.6.29-rc1' into perfcounters/core
Conflicts:
	include/linux/kernel_stat.h
2009-01-11 02:42:53 +01:00
Ingo Molnar 736f93236c bzip2/lzma: make flush_buffer() unconditional
Impact: build fix

flush_buffer() is used unconditionally:

  init/initramfs.c:456: error: 'flush_buffer' undeclared (first use in this function)
  init/initramfs.c:456: error: (Each undeclared identifier is reported only once
  init/initramfs.c:456: error: for each function it appears in.)

So remove the decompressor #ifdefs from around it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-10 12:06:19 +01:00
Ingo Molnar b17304245f Merge branch 'linus' into x86/setup-lzma
Conflicts:
	init/do_mounts_rd.c
2009-01-10 12:04:41 +01:00
Linus Torvalds 31aeb6c815 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  MAINTAINERS: squashfs entry
  Squashfs: documentation
  Squashfs: initrd support
  Squashfs: Kconfig entry
  Squashfs: Makefiles
  Squashfs: header files
  Squashfs: block operations
  Squashfs: cache operations
  Squashfs: uid/gid lookup operations
  Squashfs: fragment block operations
  Squashfs: export operations
  Squashfs: super block operations
  Squashfs: symlink operations
  Squashfs: regular file operations
  Squashfs: directory readdir operations
  Squashfs: directory lookup operations
  Squashfs: inode operations
2009-01-09 15:18:49 -08:00
Linus Torvalds c40f6f8bbc Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
  NOMMU: Support XIP on initramfs
  NOMMU: Teach kobjsize() about VMA regions.
  FLAT: Don't attempt to expand the userspace stack to fill the space allocated
  FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
  NOMMU: Improve procfs output using per-MM VMAs
  NOMMU: Make mmap allocation page trimming behaviour configurable.
  NOMMU: Make VMAs per MM as for MMU-mode linux
  NOMMU: Delete askedalloc and realalloc variables
  NOMMU: Rename ARM's struct vm_region
  NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
2009-01-09 14:00:58 -08:00
H. Peter Anvin 889c92d21d bzip2/lzma: centralize format detection
Centralize the compression format detection to a common routine in the
lib directory, and use it for both initramfs and initrd.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-08 15:14:17 -08:00
KAMEZAWA Hiroyuki c077719be8 memcg: mem+swap controller Kconfig
Config and control variable for mem+swap controller.

This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP
(memory resource controller swap extension.)

For accounting swap, it's obvious that we have to use additional memory to
remember "who uses swap".  This adds more overhead.  So, it's better to
offer "choice" to users.  This patch adds 2 choices.

This patch adds 2 parameters to enable swap extension or not.
  - CONFIG
  - boot option

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
Li Zefan c9d5409f8d memcg: fix a typo in Kconfig
s/contoller/controller/

Signed-of-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:02 -08:00
KAMEZAWA Hiroyuki 5cdc38f985 cgroups: make cgroup config a submenu
Making CGROUP related configs be a sub-menu.

This patch make CGROUP related configs be a sub-menu and makes 1st level
configs of "General Setup" shorter.

 including following additional changes
  - add help comment about CGROUPS and GROUP_SCHED.
  - moved MM_OWNER config to the bottom.
    (for good indent in menuconfig)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:01 -08:00
David Howells cb6ff20807 NOMMU: Support XIP on initramfs
Support XIP on files unpacked from the initramfs image on NOMMU systems.  This
simply requires the length of the file to be preset so that the ramfs fs can
attempt to garner sufficient contiguous storage to store the file (NOMMU mmap
can only map contiguous RAM).

All the other bits to do XIP on initramfs files are present:

 (1) ramfs's truncate attempts to allocate a contiguous run of pages when a
     file is truncated upwards from nothing.

 (2) ramfs sets BDI on its files to indicate direct mapping is possible, and
     that its files can be mapped for read, write and exec.

 (3) NOMMU mmap() will use the above bits to determine that it can do XIP.
     Possibly this needs better controls, because it will _always_ try and do
     XIP.

One disadvantage of this very simplistic approach is that sufficient space
will be allocated to store the whole file, and not just the bit that would be
XIP'd.  To deal with this, though, the initramfs unpacker would have to be
able to parse the file contents.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
2009-01-08 12:04:48 +00:00
Linus Torvalds 67acd8b4b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async:
  async: don't do the initcall stuff post boot
  bootchart: improve output based on Dave Jones' feedback
  async: make the final inode deletion an asynchronous event
  fastboot: Make libata initialization even more async
  fastboot: make the libata port scan asynchronous
  fastboot: make scsi probes asynchronous
  async: Asynchronous function calls to speed up kernel boot
2009-01-07 15:35:47 -08:00
Linus Torvalds 57c44c5f6f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  trivial: chack -> check typo fix in main Makefile
  trivial: Add a space (and a comma) to a printk in 8250 driver
  trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
  trivial: Fix misspelling of "firmware" in powerpc Makefile
  trivial: Fix misspelling of "firmware" in usb.c
  trivial: Fix misspelling of "firmware" in qla1280.c
  trivial: Fix misspelling of "firmware" in a100u2w.c
  trivial: Fix misspelling of "firmware" in megaraid.c
  trivial: Fix misspelling of "firmware" in ql4_mbx.c
  trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
  trivial: Fix misspelling of "firmware" in ipw2100.c
  trivial: Fix misspelling of "firmware" in atmel.c
  trivial: Fix misspelled firmware in Kconfig
  trivial: fix an -> a typos in documentation and comments
  trivial: fix then -> than typos in comments and documentation
  trivial: update Jesper Juhl CREDITS entry with new email
  trivial: fix singal -> signal typo
  trivial: Fix incorrect use of "loose" in event.c
  trivial: printk: fix indentation of new_text_line declaration
  trivial: rtc-stk17ta8: fix sparse warning
  ...
2009-01-07 11:31:52 -08:00
Arjan van de Ven 22a9d64567 async: Asynchronous function calls to speed up kernel boot
Right now, most of the kernel boot is strictly synchronous, such that
various hardware delays are done sequentially.

In order to make the kernel boot faster, this patch introduces
infrastructure to allow doing some of the initialization steps
asynchronously, which will hide significant portions of the hardware delays
in practice.

In order to not change device order and other similar observables, this
patch does NOT do full parallel initialization.

Rather, it operates more in the way an out of order CPU does; the work may
be done out of order and asynchronous, but the observable effects
(instruction retiring for the CPU) are still done in the original sequence.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2009-01-07 08:45:46 -08:00
Alain Knaff a26ee60f90 bzip2/lzma: fix built-in initramfs vs CONFIG_RD_GZIP
Impact: Resolves build failures in some configurations

Makes it possible to disable CONFIG_RD_GZIP . In that case, the
built-in initramfs will be compressed by whatever compressor is
available (bzip2 or lzma) or left uncompressed if none is available.

It also removes a couple of warnings which occur when no ramdisk
compression at all is chosen.

It also restores the select ZLIB_INFLATE in drivers/block/Kconfig
which somehow came missing. This is needed to activate compilation of
the stuff in zlib_deflate.

Signed-off-by: Alain Knaff <alain@knaff.lu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-07 00:10:27 -08:00
Linus Torvalds 40d7ee5d16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (60 commits)
  uio: make uio_info's name and version const
  UIO: Documentation for UIO ioport info handling
  UIO: Pass information about ioports to userspace (V2)
  UIO: uio_pdrv_genirq: allow custom irq_flags
  UIO: use pci_ioremap_bar() in drivers/uio
  arm: struct device - replace bus_id with dev_name(), dev_set_name()
  libata: struct device - replace bus_id with dev_name(), dev_set_name()
  avr: struct device - replace bus_id with dev_name(), dev_set_name()
  block: struct device - replace bus_id with dev_name(), dev_set_name()
  chris: struct device - replace bus_id with dev_name(), dev_set_name()
  dmi: struct device - replace bus_id with dev_name(), dev_set_name()
  gadget: struct device - replace bus_id with dev_name(), dev_set_name()
  gpio: struct device - replace bus_id with dev_name(), dev_set_name()
  gpu: struct device - replace bus_id with dev_name(), dev_set_name()
  hwmon: struct device - replace bus_id with dev_name(), dev_set_name()
  i2o: struct device - replace bus_id with dev_name(), dev_set_name()
  IA64: struct device - replace bus_id with dev_name(), dev_set_name()
  i7300_idle: struct device - replace bus_id with dev_name(), dev_set_name()
  infiniband: struct device - replace bus_id with dev_name(), dev_set_name()
  ISDN: struct device - replace bus_id with dev_name(), dev_set_name()
  ...
2009-01-06 17:02:07 -08:00
Jan Beulich d2e3192b6e init/main.c: mark late_time_init as __initdata
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:14 -08:00
Mike Frysinger ff083c8372 autodetect_raid: add missing __init marking
The function autodetect_raid is only used by __init functions, and it refers
to __initdata, so it needs __init markings.  Fixes this error:
The function autodetect_raid() references
the variable __initdata raid_noautodetect.
This is often because autodetect_raid lacks a __initdata
annotation or the annotation of raid_noautodetect is wrong.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:12 -08:00
Alexey Dobriyan f1883f86de Remove remaining unwinder code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Gabor Gombas <gombasg@sztaki.hu>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:11 -08:00
Marton Balint bca1033b09 do_mounts: add device info to mount message
In the past, I used the root=...  command line parameter to specify the
root filesystem to the kernel.  Now it seems that specifying it is not
necessary.  The kernel detects the root filesystem even if the kernel
command line is empty.  My root fs is on a raid1 device by the way, and I
am not using initrd for the boot process.

If the kernel detects the root filesystem somehow, I think it should print
out the result of this detection, otherwise I will not know which device
has the root filesystem.  Or is there an easy way to get this information
on a running system?  I had a quick look at the /proc and /sys
filesystems, but haven't found anything useful there.

Signed-off-by: Marton Balint <cus@fazekas.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:11 -08:00
Rakib Mullick f99ebf0a86 init: properly placing noinline keyword
checkpatch warns about 'static void noinline'.  It wants `static noinline
void'.

Both are permissible, but the kernel consistently uses `static inline' and
`static noinline', and consistency is good.  Hence let's keep the
checkpatch warning and fix up this code site.

[akpm@linux-foundation.org: rewrote changelog]
Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:10 -08:00
Matt Mackall 853ac43ab1 shmem: unify regular and tiny shmem
tiny-shmem shares most of its 130 lines of code with shmem and tends to
break when particular bits of shmem get modified.  Unifying saves code and
makes keeping these two in sync much easier.

before:
  14367	    392	     24	  14783	   39bf	mm/shmem.o
    396      72       8     476	    1dc	mm/tiny-shmem.o

after:
  14367	    392	     24	  14783	   39bf	mm/shmem.o
    412	     72       8     492	    1ec	mm/shmem.o tiny

Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:08 -08:00
Kay Sievers fce3e804cf sysfs: clarify SYSFS_DEPRECATED help text
This should make the help text of SYSFS_DEPRECATED more clear, that this
is _not_ about (what some people think it is) suppressing a few symlinks
and variables, but a different sysfs _layout_ with new features.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:31 -08:00
Ron Lee 24d431d06a trivial: add missing printk loglevel in start_kernel
Add missing printk loglevel in start_kernel

Signed-off-by: Ron Lee <ron@debian.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:05 +01:00
Ingo Molnar 3d7a96f5a4 Merge branch 'linus' into tracing/kmemtrace2 2009-01-06 09:53:05 +01:00
Phillip Lougher b8fed87d32 Squashfs: initrd support
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2009-01-05 08:46:28 +00:00
H. Peter Anvin 2e9f3bddcb bzip2/lzma: make config machinery an arch configurable
Impact: Bug fix (we should not show this menu on irrelevant architectures)

Make the config machinery to drive the gzip/bzip2/lzma selection
dependent on the architecture advertising HAVE_KERNEL_* so that we
don't display this for architectures where it doesn't matter.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-04 15:53:35 -08:00
H. Peter Anvin b172fd882d bzip2/lzma: use a table to search for initramfs compression formats
Impact: Code simplification

Instead of open-coding testing for initramfs compression formats, use
a table.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-04 15:53:35 -08:00
Alain Knaff 30d65dbfe3 bzip2/lzma: config and initramfs support for bzip2/lzma decompression
Impact: New code for initramfs decompression, new features

This is the second part of the bzip2/lzma patch

The bzip patch is based on an idea by Christian Ludwig, includes support for
compressing the kernel with bzip2 or lzma rather than gzip. Both
compressors give smaller sizes than gzip.  Lzma's decompresses faster
than bzip2.

It also supports ramdisks and initramfs' compressed using these two
compressors.

The functionality has been successfully used for a couple of years by
the udpcast project

This version applies to "tip" kernel 2.6.28

This part contains:
- support for new compressions (bzip2 and lzma) in initramfs and
old-style ramdisk
- config dialog for kernel compression (but new kernel compressions
not yet supported)

Signed-off-by: Alain Knaff <alain@knaff.lu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-04 15:53:35 -08:00
Johannes Berg a327ca2c26 remove CONFIG_KMOD
Now that nothing depends on it any more, remove CONFIG_KMOD.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:10 +10:30
Linus Torvalds 7d3b56ba37 Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
  x86: setup_per_cpu_areas() cleanup
  cpumask: fix compile error when CONFIG_NR_CPUS is not defined
  cpumask: use alloc_cpumask_var_node where appropriate
  cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
  x86: use cpumask_var_t in acpi/boot.c
  x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
  sched: put back some stack hog changes that were undone in kernel/sched.c
  x86: enable cpus display of kernel_max and offlined cpus
  ia64: cpumask fix for is_affinity_mask_valid()
  cpumask: convert RCU implementations, fix
  xtensa: define __fls
  mn10300: define __fls
  m32r: define __fls
  h8300: define __fls
  frv: define __fls
  cris: define __fls
  cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
  cpumask: zero extra bits in alloc_cpumask_var_node
  cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
  cpumask: convert mm/
  ...
2009-01-03 12:04:39 -08:00
Linus Torvalds b840d79631 Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
  x86: export vector_used_by_percpu_irq
  x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
  sched: nominate preferred wakeup cpu, fix
  x86: fix lguest used_vectors breakage, -v2
  x86: fix warning in arch/x86/kernel/io_apic.c
  sched: fix warning in kernel/sched.c
  sched: move test_sd_parent() to an SMP section of sched.h
  sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  sched: activate active load balancing in new idle cpus
  sched: bias task wakeups to preferred semi-idle packages
  sched: nominate preferred wakeup cpu
  sched: favour lower logical cpu number for sched_mc balance
  sched: framework for sched_mc/smt_power_savings=N
  sched: convert BALANCE_FOR_xx_POWER to inline functions
  x86: use possible_cpus=NUM to extend the possible cpus allowed
  x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  x86: update io_apic.c to the new cpumask code
  x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
  x86: xen: use smp_call_function_many()
  x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
  ...

Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02 11:44:09 -08:00
Ingo Molnar 609e5b71d0 kbuild: Remove gcc 4.1.0 quirk from init/main.c
Impact: cleanup

We now have a cleaner check for gcc 4.1.0/4.1.1 trouble in
include/linux/compiler-gcc4.h, so remove the 4.1.0 quirk from
init/main.c.

Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:09:27 -08:00
Rusty Russell e0c0ba7365 cpumask: Use find_last_bit()
Impact: cleanup

There's one obvious place to use it: to find the highest possible cpu.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-01 10:12:19 +10:30
Rusty Russell 915441b601 cpumask: Use accessors code in core
Impact: use new API

cpu_*_map are going away in favour of cpu_*_mask, but const pointers.
So we have accessors where we really do want to frob them.  Archs
will also need the (trivial) conversion before we can finally remove
cpu_*_map.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-01 10:12:15 +10:30
Linus Torvalds db200df0b3 Merge branch 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sparseirq: move __weak symbols into separate compilation unit
  sparseirq: work around __weak alias bug
  sparseirq: fix hang with !SPARSE_IRQ
  sparseirq: set lock_class for legacy irq when sparse_irq is selected
  sparseirq: work around compiler optimizing away __weak functions
  sparseirq: fix desc->lock init
  sparseirq: do not printk when migrating IRQ descriptors
  sparseirq: remove duplicated arch_early_irq_init()
  irq: simplify for_each_irq_desc() usage
  proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
  irq: for_each_irq_desc() move to irqnr.h
  hrtimer: remove #include <linux/irq.h>
2008-12-31 09:00:59 -08:00
Rusty Russell 2ca1a61583 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	arch/x86/kernel/io_apic.c
2008-12-31 23:05:57 +10:30
Ingo Molnar a9de18eb76 Merge branch 'linus' into stackprotector
Conflicts:
	arch/x86/include/asm/pda.h
	kernel/fork.c
2008-12-31 08:31:57 +01:00
Ingo Molnar 818fa7f390 Merge branch 'tracing/kmemtrace' into tracing/kmemtrace2 2008-12-31 08:19:48 +01:00
Ingo Molnar 5fdf7e5975 Merge branch 'linus' into tracing/kmemtrace
Conflicts:
	mm/slub.c
2008-12-31 08:14:29 +01:00
Linus Torvalds 179475a3b4 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, sparseirq: clean up Kconfig entry
  x86: turn CONFIG_SPARSE_IRQ off by default
  sparseirq: fix numa_migrate_irq_desc dependency and comments
  sparseirq: add kernel-doc notation for new member in irq_desc, -v2
  locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
  sparseirq, xen: make sure irq_desc is allocated for interrupts
  sparseirq: fix !SMP building, #2
  x86, sparseirq: move irq_desc according to smp_affinity, v7
  proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
  sparse irqs: add irqnr.h to the user headers list
  sparse irqs: handle !GENIRQ platforms
  sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
  sparseirq: fix Alpha build failure
  sparseirq: fix typo in !CONFIG_IO_APIC case
  x86, MSI: pass irq_cfg and irq_desc
  x86: MSI start irq numbering from nr_irqs_gsi
  x86: use NR_IRQS_LEGACY
  sparse irq_desc[] array: core kernel and x86 changes
  genirq: record IRQ_LEVEL in irq_desc[]
  irq.h: remove padding from irq_desc on 64bits
2008-12-30 16:20:19 -08:00
Linus Torvalds 5f34fe1cfc Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  stacktrace: provide save_stack_trace_tsk() weak alias
  rcu: provide RCU options on non-preempt architectures too
  printk: fix discarding message when recursion_bug
  futex: clean up futex_(un)lock_pi fault handling
  "Tree RCU": scalable classic RCU implementation
  futex: rename field in futex_q to clarify single waiter semantics
  x86/swiotlb: add default swiotlb_arch_range_needs_mapping
  x86/swiotlb: add default phys<->bus conversion
  x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
  x86: add swiotlb allocation functions
  swiotlb: consolidate swiotlb info message printing
  swiotlb: support bouncing of HighMem pages
  swiotlb: factor out copy to/from device
  swiotlb: add arch hook to force mapping
  swiotlb: allow architectures to override phys<->bus<->phys conversions
  swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
  rcu: fix rcutorture behavior during reboot
  resources: skip sanity check of busy resources
  swiotlb: move some definitions to header
  swiotlb: allow architectures to override swiotlb pool allocation
  ...

Fix up trivial conflicts in
  arch/x86/kernel/Makefile
  arch/x86/mm/init_32.c
  include/linux/hardirq.h
as per Ingo's suggestions.
2008-12-30 16:10:19 -08:00
Frederic Weisbecker 36994e58a4 tracing/kmemtrace: normalize the raw tracer event to the unified tracing API
Impact: new tracer plugin

This patch adapts kmemtrace raw events tracing to the unified tracing API.

To enable and use this tracer, just do the following:

 echo kmemtrace > /debugfs/tracing/current_tracer
 cat /debugfs/tracing/trace

You will have the following output:

 # tracer: kmemtrace
 #
 #
 # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
 # FREE   |      |     |       |              |   |            |        |
 # |

type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584

That was to stay backward compatible with the format output produced in
inux/tracepoint.h.

This is the default ouput, but note that I tried something else.

If you change an option:

echo kmem_minimalistic > /debugfs/trace_options

and then cat /debugfs/trace, you will have the following output:

 # tracer: kmemtrace
 #
 #
 # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
 # FREE   |      |     |       |              |   |            |        |
 # |

   -      C                            0xffff88007c088780          file_free_rcu
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
   +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   -      C                            0xffff88007cad6000          putname
   +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
   +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
   +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
   +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
   +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp

Yeah I shall confess kmem_minimalistic should be: kmem_alternative.

Whatever, I find it more readable but this a personal opinion of course.
We can drop it if you want.

On the ALLOC/FREE column, + means an allocation and - a free.

On the type column, you have K = kmalloc, C = cache, P = page

I would like the flags to be GFP_* strings but that would not be easy to not
break the column with strings....

About the node...it seems to always be -1. I don't know why but that shouldn't
be difficult to find.

I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
be more easy to find the tracer headers if they are all in their common
directory.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-30 09:36:13 +01:00
Rusty Russell 33edcf133b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-30 08:02:35 +10:30
Ingo Molnar 2ff9f9d962 Merge branch 'topic/kmemtrace' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 into tracing/kmemtrace 2008-12-29 15:16:24 +01:00
Eduard - Gabriel Munteanu b9ce08c010 kmemtrace: Core implementation.
kmemtrace provides tracing for slab allocator functions, such as kmalloc,
kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
to the userspace application in order to analyse allocation hotspots,
internal fragmentation and so on, making it possible to see how well an
allocator performs, as well as debug and profile kernel code.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-12-29 15:34:01 +02:00
Yinghai Lu 43a256322a sparseirq: move __weak symbols into separate compilation unit
GCC has a bug with __weak alias functions: if the functions are in
the same compilation unit as their call site, GCC can decide to
inline them - and thus rob the linker of the opportunity to override
the weak alias with the real thing.

So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:15:49 +01:00
Ingo Molnar e1df957670 Merge branch 'linus' into perfcounters/core
Conflicts:
	fs/exec.c
	include/linux/init_task.h

Simple context conflicts.
2008-12-29 09:45:15 +01:00
Linus Torvalds 96faec945f Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
  allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
  kbuild: strip generated symbols from *.ko
  kbuild: simplify use of genksyms
  kernel-doc: check for extra kernel-doc notations
  kbuild: add headerdep used to detect inclusion cycles in header files
  kbuild: fix string equality testing in tags.sh
  kbuild: fix make tags/cscope
  kbuild: fix make incompatibility
  kbuild: remove TAR_IGNORE
  setlocalversion: add git-svn support
  setlocalversion: print correct subversion revision
  scripts: improve the decodecode script
  scripts/package: allow custom options to rpm
  genksyms: allow to ignore symbol checksum changes
  genksyms: track symbol checksum changes
  tags and cscope support really belongs in a shell script
  kconfig: fix options to check-lxdialog.sh
  kbuild: gen_init_cpio expands shell variables in file names
  remove bashisms from scripts/extract-ikconfig
  kbuild: teach mkmakfile to be silent
  ...
2008-12-28 15:13:48 -08:00
Linus Torvalds b0f4b285d7 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
  sched, trace: update trace_sched_wakeup()
  tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
  Revert "x86: disable X86_PTRACE_BTS"
  ring-buffer: prevent false positive warning
  ring-buffer: fix dangling commit race
  ftrace: enable format arguments checking
  x86, bts: memory accounting
  x86, bts: add fork and exit handling
  ftrace: introduce tracing_reset_online_cpus() helper
  tracing: fix warnings in kernel/trace/trace_sched_switch.c
  tracing: fix warning in kernel/trace/trace.c
  tracing/ring-buffer: remove unused ring_buffer size
  trace: fix task state printout
  ftrace: add not to regex on filtering functions
  trace: better use of stack_trace_enabled for boot up code
  trace: add a way to enable or disable the stack tracer
  x86: entry_64 - introduce FTRACE_ frame macro v2
  tracing/ftrace: add the printk-msg-only option
  tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
  x86, bts: correctly report invalid bts records
  ...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
2008-12-28 12:21:10 -08:00
Yinghai Lu 13a0c3c269 sparseirq: work around compiler optimizing away __weak functions
Impact: fix panic on null pointer with sparseirq

Some GCC versions seem to inline the weak global function,
when that function is empty.

Work it around, by making the functions return a (dummy) integer.

Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27 13:24:00 +01:00
Ingo Molnar 12d79bafb7 rcu: provide RCU options on non-preempt architectures too
Impact: build fix

Some old architectures still do not use kernel/Kconfig.preempt, so the
moving of the RCU options there broke their build:

 In file included from /home/mingo/tip/include/linux/sem.h:81,
                 from /home/mingo/tip/include/linux/sched.h:69,
                 from /home/mingo/tip/arch/alpha/kernel/asm-offsets.c:9:
 /home/mingo/tip/include/linux/rcupdate.h:62:2: error: #error "Unknown RCU implementation specified to kernel configuration"

Move these options back to init/Kconfig, which every architecture
includes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-25 09:31:28 +01:00
Jan Beulich 9bb482476c allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
Building upon parts of the module stripping patch, this patch
introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y.
Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of
CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64)
kernels I tested with.

The patch also does away with the need to special case the kallsyms-
internal symbols by making them available even in the first linking
stage.

While it is a generated file, the patch includes the changes to
scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure
here is.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-12-19 22:47:10 +01:00
Paul E. McKenney 64db4cfff9 "Tree RCU": scalable classic RCU implementation
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs.  Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion.  This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o	Fixes from remainder of line-by-line code walkthrough,
	including comment spelling, initialization, undesirable
	narrowing due to type conversion, removing redundant memory
	barriers, removing redundant local-variable initialization,
	and removing redundant local variables.

	I do not believe that any of these fixes address the CPU-hotplug
	issues that Andi Kleen was seeing, but please do give it a whirl
	in case the machine is smarter than I am.

	A writeup from the walkthrough may be found at the following
	URL, in case you are suffering from terminal insomnia or
	masochism:

	http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o	Made rcutree tracing use seq_file, as suggested some time
	ago by Lai Jiangshan.

o	Added a .csv variant of the rcudata debugfs trace file, to allow
	people having thousands of CPUs to drop the data into
	a spreadsheet.	Tested with oocalc and gnumeric.  Updated
	documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o	Fix a theoretical race between grace-period initialization and
	force_quiescent_state() that could occur if more than three
	jiffies were required to carry out the grace-period
	initialization.  Which it might, if you had enough CPUs.

o	Apply Ingo's printk-standardization patch.

o	Substitute local variables for repeated accesses to global
	variables.

o	Fix comment misspellings and redundant (but harmless) increments
	of ->n_rcu_pending (this latter after having explicitly added it).

o	Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o	Fixed a number of problems noted by Gautham Shenoy, including
	the cpu-stall-detection bug that he was having difficulty
	convincing me was real.  ;-)

o	Changed cpu-stall detection to wait for ten seconds rather than
	three in order to reduce false positive, as suggested by Ingo
	Molnar.

o	Produced a design document (http://lwn.net/Articles/305782/).
	The act of writing this document uncovered a number of both
	theoretical and "here and now" bugs as noted below.

o	Fix dynticks_nesting accounting confusion, simplify WARN_ON()
	condition, fix kerneldoc comments, and add memory barriers
	in dynticks interface functions.

o	Add more data to tracing.

o	Remove unused "rcu_barrier" field from rcu_data structure.

o	Count calls to rcu_pending() from scheduling-clock interrupt
	to use as a surrogate timebase should jiffies stop counting.

o	Fix a theoretical race between force_quiescent_state() and
	grace-period initialization.  Yes, initialization does have to
	go on for some jiffies for this race to occur, but given enough
	CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o	Fix a number of checkpatch.pl complaints.

o	Apply review comments from Ingo Molnar and Lai Jiangshan
	on the stall-detection code.

o	Fix several bugs in !CONFIG_SMP builds.

o	Fix a misspelled config-parameter name so that RCU now announces
	at boot time if stall detection is configured.

o	Run tests on numerous combinations of configurations parameters,
	which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o	Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
	changeset some time ago, and finally got around to retesting
	this option).

o	Fix some tracing bugs in rcupreempt that caused incorrect
	totals to be printed.

o	I now test with a more brutal random-selection online/offline
	script (attached).  Probably more brutal than it needs to be
	on the people reading it as well, but so it goes.

o	A number of optimizations and usability improvements:

	o	Make rcu_pending() ignore the grace-period timeout when
		there is no grace period in progress.

	o	Make force_quiescent_state() avoid going for a global
		lock in the case where there is no grace period in
		progress.

	o	Rearrange struct fields to improve struct layout.

	o	Make call_rcu() initiate a grace period if RCU was
		idle, rather than waiting for the next scheduling
		clock interrupt.

	o	Invoke rcu_irq_enter() and rcu_irq_exit() only when
		idle, as suggested by Andi Kleen.  I still don't
		completely trust this change, and might back it out.

	o	Make CONFIG_RCU_TRACE be the single config variable
		manipulated for all forms of RCU, instead of the prior
		confusion.

	o	Document tracing files and formats for both rcupreempt
		and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o	Separated dynticks interface so that NMIs and irqs call separate
	functions, greatly simplifying it.  In particular, this code
	no longer requires a proof of correctness.  ;-)

o	Separated dynticks state out into its own per-CPU structure,
	avoiding the duplicated accounting.

o	The case where a dynticks-idle CPU runs an irq handler that
	invokes call_rcu() is now correctly handled, forcing that CPU
	out of dynticks-idle mode.

o	Review comments have been applied (thank you all!!!).
	For but one example, fixed the dynticks-ordering issue that
	Manfred pointed out, saving me much debugging.  ;-)

o	Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel.  It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space.  That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy.  Defaults to 32 on 32-bit machines and 64 on
64-bit machines.  If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy.  By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware.  Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems.  I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future.  (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors.  This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o	More bugs will probably surface as a result of an ongoing
	line-by-line code inspection.

	Patches will be provided as required.

o	There are probably hangs, rcutorture failures, &c.  Seems
	quite stable on a 128-CPU machine, but that is kind of small
	compared to 4096 CPUs.  However, seems to do better than
	mainline.

	Patches will be provided as required.

o	The memory footprint of this version is several KB larger
	than rcuclassic.

	A separate UP-only rcutiny patch will be provided, which will
	reduce the memory footprint significantly, even compared
	to the old rcuclassic.  One such patch passes light testing,
	and has a memory footprint smaller even than rcuclassic.
	Initial reaction from various embedded guys was "it is not
	worth it", so am putting it aside.

Credits:

o	Manfred Spraul for ideas, review comments, and bugs spotted,
	as well as some good friendly competition.  ;-)

o	Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
	for reviews and comments.

o	Thomas Gleixner for much-needed help with some timer issues
	(see patches below).

o	Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
	Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
	Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
	alive despite my heavy abuse^Wtesting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 21:56:04 +01:00
Rusty Russell 968ea6d80e Merge ../linux-2.6-x86
Conflicts:

	arch/x86/kernel/io_apic.c
	kernel/sched.c
	kernel/sched_stats.h
2008-12-13 21:55:51 +10:30
Rusty Russell 98a79d6a50 cpumask: centralize cpu_online_map and cpu_possible_map
Impact: cleanup

Each SMP arch defines these themselves.  Move them to a central
location.

Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
   CONFIG_INIT_ALL_POSSIBLE for this rather than break them.

2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
   Those archs simply have phys_cpu_present_map replaced everywhere.

3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
   so I just manipulate them both in sync.

4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
   declarations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
2008-12-13 21:19:41 +10:30
Ingo Molnar 8299608f14 Merge branches 'irq/sparseirq', 'x86/quirks' and 'x86/reboot' into cpus4096
We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the
cpus4096 tree because the io-apic changes in the sparseirq change
conflict with the cpumask changes in the cpumask tree, and we
want to resolve those.
2008-12-12 13:49:24 +01:00
Ingo Molnar 4c59e4676d perfcounters: select ANON_INODES
The perfcounters subsystem depends on CONFIG_ANON_INODES facilities,
so make sure it's selected.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 19:38:33 +01:00
Thomas Gleixner 0793a61d4d performance counters: core code
Implement the core kernel bits of Performance Counters subsystem.

The Linux Performance Counter subsystem provides an abstraction of
performance counter hardware capabilities. It provides per task and per
CPU counters, and it provides event capabilities on top of those.

Performance counters are accessed via special file descriptors.
There's one file descriptor per virtual counter used.

The special file descriptor is opened via the perf_counter_open()
system call:

 int
 perf_counter_open(u32 hw_event_type,
                   u32 hw_event_period,
                   u32 record_type,
                   pid_t pid,
                   int cpu);

The syscall returns the new fd. The fd can be used via the normal
VFS system calls: read() can be used to read the counter, fcntl()
can be used to set the blocking mode, etc.

Multiple counters can be kept open at a time, and the counters
can be poll()ed.

See more details in Documentation/perf-counters.txt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:47:03 +01:00
Yinghai Lu 0b8f1efad3 sparse irq_desc[] array: core kernel and x86 changes
Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:51 +01:00
James Morris ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Will Newton 1d926f2756 init/main.c: use ktime accessor function in initcall_debug code
Impact: fix initcall debug output on non-scalar ktime platforms (32-bit embedded)

The initcall_debug code access the tv64 member of ktime.  This won't work
correctly for large deltas on platforms that don't use the scalar ktime
implementation.

Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:10:15 +01:00
Ingo Molnar 9676e73a9e Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ftrace.c

[ We conflicted here because we backported a few fixes to
  tracing/urgent - which has different internal APIs. ]
2008-11-19 10:04:25 +01:00
Linus Torvalds 8c60bfb066 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpuset: fix regression when failed to generate sched domains
  sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers
  sched: fix kernel warning on /proc/sched_debug access
  sched: correct sched-rt-group.txt pathname in init/Kconfig
2008-11-18 08:06:21 -08:00
Mathieu Desnoyers c1df1bd2c4 markers: auto enable tracepoints (new API : trace_mark_tp())
Impact: new API

Add a new API trace_mark_tp(), which declares a marker within a
tracepoint probe. When the marker is activated, the tracepoint is
automatically enabled.

No branch test is used at the marker site, because it would be a
duplicate of the branch already present in the tracepoint.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 09:01:29 +01:00
James Morris 2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells d84f4f992c CRED: Inaugurate COW credentials
Inaugurate copy-on-write credentials management.  This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:

	struct cred *new = prepare_creds();
	int ret = blah(new);
	if (ret < 0) {
		abort_creds(new);
		return ret;
	}
	return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const.  The purpose of this is compile-time
discouragement of altering credentials through those pointers.  Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:

  (1) Its reference count may incremented and decremented.

  (2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     This now prepares and commits credentials in various places in the
     security code rather than altering the current creds directly.

 (2) Temporary credential overrides.

     do_coredump() and sys_faccessat() now prepare their own credentials and
     temporarily override the ones currently on the acting thread, whilst
     preventing interference from other threads by holding cred_replace_mutex
     on the thread being dumped.

     This will be replaced in a future patch by something that hands down the
     credentials directly to the functions being called, rather than altering
     the task's objective credentials.

 (3) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_capset_check(), ->capset_check()
     (*) security_capset_set(), ->capset_set()

     	 Removed in favour of security_capset().

     (*) security_capset(), ->capset()

     	 New.  This is passed a pointer to the new creds, a pointer to the old
     	 creds and the proposed capability sets.  It should fill in the new
     	 creds or return an error.  All pointers, barring the pointer to the
     	 new creds, are now const.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()

     	 Changed; now returns a value, which will cause the process to be
     	 killed if it's an error.

     (*) security_task_alloc(), ->task_alloc_security()

     	 Removed in favour of security_prepare_creds().

     (*) security_cred_free(), ->cred_free()

     	 New.  Free security data attached to cred->security.

     (*) security_prepare_creds(), ->cred_prepare()

     	 New. Duplicate any security data attached to cred->security.

     (*) security_commit_creds(), ->cred_commit()

     	 New. Apply any security effects for the upcoming installation of new
     	 security by commit_creds().

     (*) security_task_post_setuid(), ->task_post_setuid()

     	 Removed in favour of security_task_fix_setuid().

     (*) security_task_fix_setuid(), ->task_fix_setuid()

     	 Fix up the proposed new credentials for setuid().  This is used by
     	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
     	 setuid() changes.  Changes are made to the new credentials, rather
     	 than the task itself as in security_task_post_setuid().

     (*) security_task_reparent_to_init(), ->task_reparent_to_init()

     	 Removed.  Instead the task being reparented to init is referred
     	 directly to init's credentials.

	 NOTE!  This results in the loss of some state: SELinux's osid no
	 longer records the sid of the thread that forked it.

     (*) security_key_alloc(), ->key_alloc()
     (*) security_key_permission(), ->key_permission()

     	 Changed.  These now take cred pointers rather than task pointers to
     	 refer to the security context.

 (4) sys_capset().

     This has been simplified and uses less locking.  The LSM functions it
     calls have been merged.

 (5) reparent_to_kthreadd().

     This gives the current thread the same credentials as init by simply using
     commit_thread() to point that way.

 (6) __sigqueue_alloc() and switch_uid()

     __sigqueue_alloc() can't stop the target task from changing its creds
     beneath it, so this function gets a reference to the currently applicable
     user_struct which it then passes into the sigqueue struct it returns if
     successful.

     switch_uid() is now called from commit_creds(), and possibly should be
     folded into that.  commit_creds() should take care of protecting
     __sigqueue_alloc().

 (7) [sg]et[ug]id() and co and [sg]et_current_groups.

     The set functions now all use prepare_creds(), commit_creds() and
     abort_creds() to build and check a new set of credentials before applying
     it.

     security_task_set[ug]id() is called inside the prepared section.  This
     guarantees that nothing else will affect the creds until we've finished.

     The calling of set_dumpable() has been moved into commit_creds().

     Much of the functionality of set_user() has been moved into
     commit_creds().

     The get functions all simply access the data directly.

 (8) security_task_prctl() and cap_task_prctl().

     security_task_prctl() has been modified to return -ENOSYS if it doesn't
     want to handle a function, or otherwise return the return value directly
     rather than through an argument.

     Additionally, cap_task_prctl() now prepares a new set of credentials, even
     if it doesn't end up using it.

 (9) Keyrings.

     A number of changes have been made to the keyrings code:

     (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
     	 all been dropped and built in to the credentials functions directly.
     	 They may want separating out again later.

     (b) key_alloc() and search_process_keyrings() now take a cred pointer
     	 rather than a task pointer to specify the security context.

     (c) copy_creds() gives a new thread within the same thread group a new
     	 thread keyring if its parent had one, otherwise it discards the thread
     	 keyring.

     (d) The authorisation key now points directly to the credentials to extend
     	 the search into rather pointing to the task that carries them.

     (e) Installing thread, process or session keyrings causes a new set of
     	 credentials to be created, even though it's not strictly necessary for
     	 process or session keyrings (they're shared).

(10) Usermode helper.

     The usermode helper code now carries a cred struct pointer in its
     subprocess_info struct instead of a new session keyring pointer.  This set
     of credentials is derived from init_cred and installed on the new process
     after it has been cloned.

     call_usermodehelper_setup() allocates the new credentials and
     call_usermodehelper_freeinfo() discards them if they haven't been used.  A
     special cred function (prepare_usermodeinfo_creds()) is provided
     specifically for call_usermodehelper_setup() to call.

     call_usermodehelper_setkeys() adjusts the credentials to sport the
     supplied keyring as the new session keyring.

(11) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) selinux_setprocattr() no longer does its check for whether the
     	 current ptracer can access processes with the new SID inside the lock
     	 that covers getting the ptracer's SID.  Whilst this lock ensures that
     	 the check is done with the ptracer pinned, the result is only valid
     	 until the lock is released, so there's no point doing it inside the
     	 lock.

(12) is_single_threaded().

     This function has been extracted from selinux_setprocattr() and put into
     a file of its own in the lib/ directory as join_session_keyring() now
     wants to use it too.

     The code in SELinux just checked to see whether a task shared mm_structs
     with other tasks (CLONE_VM), but that isn't good enough.  We really want
     to know if they're part of the same thread group (CLONE_THREAD).

(13) nfsd.

     The NFS server daemon now has to use the COW credentials to set the
     credentials it is going to use.  It really needs to pass the credentials
     down to the functions it calls, but it can't do that until other patches
     in this series have been applied.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:23 +11:00
Simon Arlott 02f5621042 Kconfig: SLUB is the default slab allocator
In 2007, a0acd82080 changed the default
slab allocator to SLUB, but the SLAB help text still says SLAB is the
default. This change fixes that.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-11-13 20:49:01 +02:00
Adrian Knoth 2fe401e386 sched: correct sched-rt-group.txt pathname in init/Kconfig
init/Kconfig directs the user to Documentation/sched-rt-group.txt, but
the file is actually in Documentation/scheduler/sched-rt-group.txt.

This patch corrects the pathname mentioned in init/Kconfig.

Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 09:32:29 +01:00
Frederic Weisbecker 7423907283 tracing/fastboot: Use the ring-buffer timestamp for initcall entries
Impact: Split the boot tracer entries in two parts: call and return

Now that we are using the sched tracer from the boot tracer, we want
to use the same timestamp than the ring-buffer to have consistent time
captures between sched events and initcall events.

So we get rid of the old time capture by the boot tracer and split the
initcall events in two parts: call and return. This way we have the
ring buffer timestamp of both.

An example trace:

[   27.904149584] calling  net_ns_init+0x0/0x1c0 @ 1
[   27.904429624] initcall net_ns_init+0x0/0x1c0 returned 0 after 0 msecs
[   27.904575926] calling  reboot_init+0x0/0x20 @ 1
[   27.904655399] initcall reboot_init+0x0/0x20 returned 0 after 0 msecs
[   27.904800228] calling  sysctl_init+0x0/0x30 @ 1
[   27.905142914] initcall sysctl_init+0x0/0x30 returned 0 after 0 msecs
[   27.905287211] calling  ksysfs_init+0x0/0xb0 @ 1
 ##### CPU 0 buffer started ####
            init-1     [000]    27.905395:      1:120:R   + [001]    11:115:S
 ##### CPU 1 buffer started ####
          <idle>-0     [001]    27.905425:      0:140:R ==> [001]    11:115:R
            init-1     [000]    27.905426:      1:120:D ==> [000]     0:140:R
          <idle>-0     [000]    27.905431:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905451:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.905456:      4:115:S ==> [000]     0:140:R
           udevd-11    [001]    27.905458:     11:115:R   + [001]    14:115:R
          <idle>-0     [000]    27.905459:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905462:      0:140:R ==> [000]     4:115:R
           udevd-11    [001]    27.905462:     11:115:R ==> [001]    14:115:R
     ksoftirqd/0-4     [000]    27.905467:      4:115:S ==> [000]     0:140:R
          <idle>-0     [000]    27.905470:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905473:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.905476:      4:115:S ==> [000]     0:140:R
          <idle>-0     [000]    27.905479:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905482:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.905486:      4:115:S ==> [000]     0:140:R
           udevd-14    [001]    27.905499:     14:120:X ==> [001]    11:115:R
           udevd-11    [001]    27.905506:     11:115:R   + [000]     1:120:D
          <idle>-0     [000]    27.905515:      0:140:R ==> [000]     1:120:R
           udevd-11    [001]    27.905517:     11:115:S ==> [001]     0:140:R
[   27.905557107] initcall ksysfs_init+0x0/0xb0 returned 0 after 3906 msecs
[   27.905705736] calling  init_jiffies_clocksource+0x0/0x10 @ 1
[   27.905779239] initcall init_jiffies_clocksource+0x0/0x10 returned 0 after 0 msecs
[   27.906769814] calling  pm_init+0x0/0x30 @ 1
[   27.906853627] initcall pm_init+0x0/0x30 returned 0 after 0 msecs
[   27.906997803] calling  pm_disk_init+0x0/0x20 @ 1
[   27.907076946] initcall pm_disk_init+0x0/0x20 returned 0 after 0 msecs
[   27.907222556] calling  swsusp_header_init+0x0/0x30 @ 1
[   27.907294325] initcall swsusp_header_init+0x0/0x30 returned 0 after 0 msecs
[   27.907439620] calling  stop_machine_init+0x0/0x50 @ 1
            init-1     [000]    27.907485:      1:120:R   + [000]     2:115:S
            init-1     [000]    27.907490:      1:120:D ==> [000]     2:115:R
        kthreadd-2     [000]    27.907507:      2:115:R   + [001]    15:115:R
          <idle>-0     [001]    27.907517:      0:140:R ==> [001]    15:115:R
        kthreadd-2     [000]    27.907517:      2:115:D ==> [000]     0:140:R
          <idle>-0     [000]    27.907521:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.907524:      0:140:R ==> [000]     4:115:R
           udevd-15    [001]    27.907527:     15:115:D   + [000]     2:115:D
     ksoftirqd/0-4     [000]    27.907537:      4:115:S ==> [000]     2:115:R
           udevd-15    [001]    27.907537:     15:115:D ==> [001]     0:140:R
        kthreadd-2     [000]    27.907546:      2:115:R   + [000]     1:120:D
        kthreadd-2     [000]    27.907550:      2:115:S ==> [000]     1:120:R
            init-1     [000]    27.907584:      1:120:R   + [000]    15:  0:D
            init-1     [000]    27.907589:      1:120:R   + [000]     2:115:S
            init-1     [000]    27.907593:      1:120:D ==> [000]    15:  0:R
           udevd-15    [000]    27.907601:     15:  0:S ==> [000]     2:115:R
 ##### CPU 0 buffer started ####
        kthreadd-2     [000]    27.907616:      2:115:R   + [001]    16:115:R
 ##### CPU 1 buffer started ####
          <idle>-0     [001]    27.907620:      0:140:R ==> [001]    16:115:R
        kthreadd-2     [000]    27.907621:      2:115:D ==> [000]     0:140:R
           udevd-16    [001]    27.907625:     16:115:D   + [000]     2:115:D
          <idle>-0     [000]    27.907628:      0:140:R   + [000]     4:115:S
           udevd-16    [001]    27.907629:     16:115:D ==> [001]     0:140:R
          <idle>-0     [000]    27.907631:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.907636:      4:115:S ==> [000]     2:115:R
        kthreadd-2     [000]    27.907644:      2:115:R   + [000]     1:120:D
        kthreadd-2     [000]    27.907647:      2:115:S ==> [000]     1:120:R
            init-1     [000]    27.907657:      1:120:R   + [001]    16:  0:D
          <idle>-0     [001]    27.907666:      0:140:R ==> [001]    16:  0:R
[   27.907703862] initcall stop_machine_init+0x0/0x50 returned 0 after 0 msecs
[   27.907850704] calling  filelock_init+0x0/0x30 @ 1
[   27.907926573] initcall filelock_init+0x0/0x30 returned 0 after 0 msecs
[   27.908071327] calling  init_script_binfmt+0x0/0x10 @ 1
[   27.908165195] initcall init_script_binfmt+0x0/0x10 returned 0 after 0 msecs
[   27.908309461] calling  init_elf_binfmt+0x0/0x10 @ 1

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 10:17:19 +01:00
Frederic Weisbecker 3f5ec13696 tracing/fastboot: move boot tracer structs and funcs into their own header.
Impact: Cleanups on the boot tracer and ftrace

This patch bring some cleanups about the boot tracer headers. The
functions and structures of this tracer have nothing related to ftrace
and should have so their own header file.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 10:17:18 +01:00
Frederic Weisbecker 71566a0d16 tracing/fastboot: Enable boot tracing only during initcalls
Impact: modify boot tracer

We used to disable the initcall tracing at a specified time (IE: end
of builtin initcalls). But we don't need it anymore. It will be
stopped when initcalls are finished.

However we want two things:

_Start this tracing only after pre-smp initcalls are finished.

_Since we are planning to trace sched_switches at the same time, we
want to enable them only during the initcall execution.

For this purpose, this patch introduce two functions to enable/disable
the sched_switch tracing during boot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:02 +01:00
Huang Weiyi d3f15800d5 init/do_mounts_md.c: remove duplicated #include
Removed duplicated #include <linux/delay.h> in init/do_mounts_md.c.

The same compile error ("error: implicit declaration of function
'msleep'") got fixed twice:

 - f8b77d3939 ("init/do_mounts_md.c:
   msleep compile fix")

 - 73b4a24f5f ("init/do_mounts_md.c must
   #include <linux/delay.h>")

by people adding the <linux/delay.h> include in two slightly different
places.  Andrew's quilt scripts happily ignore the fuzz, and will
re-apply the patch even though they had conflicts.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-01 10:35:51 -07:00
KAMEZAWA Hiroyuki 84ad6d7000 memcg: update menuconfig help text
page_cgroup is now allocated at boot and memmap doesn't includes pointer
for page_cgroup.  Fix the menu help text.

Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Alexey Dobriyan f8b77d3939 init/do_mounts_md.c: msleep compile fix
init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Linus Torvalds 4403b406d4 Revert "Call init_workqueues before pre smp initcalls."
This reverts commit a802dd0eb5 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.

It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall().  That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-25 19:53:38 -07:00
Linus Torvalds 5ed487bc2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
  [PATCH] fs: add a sanity check in d_free
  [PATCH] i_version: remount support
  [patch] vfs: make security_inode_setattr() calling consistent
  [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
  [PATCH] move executable checking into ->permission()
  [PATCH] fs/dcache.c: update comment of d_validate()
  [RFC PATCH] touch_mnt_namespace when the mount flags change
  [PATCH] reiserfs: add missing llseek method
  [PATCH] fix ->llseek for more directories
  [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
  [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
  [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
  [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
  [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
  [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
  [PATCH] get rid of on-stack dentry in udf
  [PATCH 2/2] anondev: switch to IDA
  [PATCH 1/2] anondev: init IDR statically
  [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
  [PATCH] Optimise NFS readdir hack slightly.
  ...
2008-10-23 10:22:40 -07:00
Linus Torvalds a3415dc34f Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (32 commits)
  PCI hotplug: fix logic in Compaq hotplug controller bus speed setup
  PCI: don't export linux/io.h from pci.h
  PCI: PCI_QUIRKS depends on PCI
  PCI hotplug: pciehp: poll data link layer link active
  PCI hotplug: pciehp: fix possible memory leak in pcie_init
  PCI: Workaround invalid P2P bridge bus numbers
  PCI Hotplug: fakephp: add duplicate slot name debugging
  PCI: Hotplug core: remove 'name'
  PCI: shcphp: remove 'name' parameter
  PCI: SGI Hotplug: stop managing bss_hotplug_slot->name
  PCI: rpaphp: kmalloc/kfree slot->name directly
  PCI: pciehp: remove 'name' parameter
  PCI: ibmphp: stop managing hotplug_slot->name
  PCI: fakephp: remove 'name' parameter
  PCI, PCI Hotplug: introduce slot_name helpers
  PCI: cpqphp: stop managing hotplug_slot->name
  PCI: cpci_hotplug: stop managing hotplug_slot->name
  PCI: acpiphp: remove 'name' parameter
  PCI: prevent duplicate slot names
  PCI Hotplug: serialize pci_hp_register and pci_hp_deregister
  ...
2008-10-23 10:16:53 -07:00
Linus Torvalds a534487606 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  stop_machine: fix error code handling on multiple cpus
  stop_machine: use workqueues instead of kernel threads
  workqueue: introduce create_rt_workqueue
  Call init_workqueues before pre smp initcalls.
  Make panic= and panic_on_oops into core_params
  Make initcall_debug a core_param
  core_param() for genuinely core kernel parameters
  param: Fix duplicate module prefixes
  module: check kernel param length at compile time, not runtime
  Remove stop_machine during module load v2
  module: simplify load_module.
2008-10-23 10:00:14 -07:00
KAMEZAWA Hiroyuki 94b6da5ab8 memcg: fix page_cgroup allocation
page_cgroup_init() is called from mem_cgroup_init(). But at this
point, we cannot call alloc_bootmem().
(and this caused panic at boot.)

This patch moves page_cgroup_init() to init/main.c.

Time table is following:
==
  parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
  ....
  cgroup_init_early()  # "early" init of cgroup.
  ....
  setup_arch()         # memmap is allocated.
  ...
  page_cgroup_init();
  mem_init();   # we cannot call alloc_bootmem after this.
  ....
  cgroup_init() # mem_cgroup is initialized.
==

Before page_cgroup_init(), mem_map must be initialized. So,
I added page_cgroup_init() to init/main.c directly.

(*) maybe this is not very clean but
    - cgroup_init_early() is too early
    - in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
    use of vmalloc area in x86-32 is important and we should avoid very large
    vmalloc() in x86-32. So, we want to use alloc_bootmem() and added page_cgroup_init()
    directly to init/main.c

[akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
[akpm@linux-foundation.org: fix build]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 08:55:02 -07:00
Alexey Dobriyan 6de24f0ed0 [PATCH 1/2] anondev: init IDR statically
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 05:13:13 -04:00
Geert Uytterhoeven 61cfc7e442 PCI: PCI_QUIRKS depends on PCI
commit 3d13731024 ("PCI: allow quirks to be
compiled out") introduced CONFIG_PCI_QUIRKS, which now shows up in each
and every .config.  Fix this by making it depend on PCI.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-22 16:42:46 -07:00
Heiko Carstens a802dd0eb5 Call init_workqueues before pre smp initcalls.
This allows to create workqueues from within the context of
a pre smp initcall (aka early_initcall).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-10-22 10:00:25 +11:00
Rusty Russell d0ea3d7d28 Make initcall_debug a core_param
This is the one I really wanted: now it effects module loading, it
makes sense to be able to flip it after boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-22 10:00:24 +11:00
Linus Torvalds a0bfb673dc Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (41 commits)
  PCI: fix pci_ioremap_bar() on s390
  PCI: fix AER capability check
  PCI: use pci_find_ext_capability everywhere
  PCI: remove #ifdef DEBUG around dev_dbg call
  PCI hotplug: fix get_##name return value problem
  PCI: document the pcie_aspm kernel parameter
  PCI: introduce an pci_ioremap(pdev, barnr) function
  powerpc/PCI: Add legacy PCI access via sysfs
  PCI: Add ability to mmap legacy_io on some platforms
  PCI: probing debug message uniformization
  PCI: support PCIe ARI capability
  PCI: centralize the capabilities code in probe.c
  PCI: centralize the capabilities code in pci-sysfs.c
  PCI: fix 64-vbit prefetchable memory resource BARs
  PCI: replace cfg space size (256/4096) by macros.
  PCI: use resource_size() everywhere.
  PCI: use same arg names in PCI_VDEVICE comment
  PCI hotplug: rpaphp: make debug var unique
  PCI: use %pF instead of print_fn_descriptor_symbol() in quirks.c
  PCI: fix hotplug get_##name return value problem
  ...
2008-10-20 13:40:47 -07:00
Linus Torvalds 92b29b86fe Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
  tracing/fastboot: improve help text
  tracing/stacktrace: improve help text
  tracing/fastboot: fix initcalls disposition in bootgraph.pl
  tracing/fastboot: fix bootgraph.pl initcall name regexp
  tracing/fastboot: fix issues and improve output of bootgraph.pl
  tracepoints: synchronize unregister static inline
  tracepoints: tracepoint_synchronize_unregister()
  ftrace: make ftrace_test_p6nop disassembler-friendly
  markers: fix synchronize marker unregister static inline
  tracing/fastboot: add better resolution to initcall debug/tracing
  trace: add build-time check to avoid overrunning hex buffer
  ftrace: fix hex output mode of ftrace
  tracing/fastboot: fix initcalls disposition in bootgraph.pl
  tracing/fastboot: fix printk format typo in boot tracer
  ftrace: return an error when setting a nonexistent tracer
  ftrace: make some tracers reentrant
  ring-buffer: make reentrant
  ring-buffer: move page indexes into page headers
  tracing/fastboot: only trace non-module initcalls
  ftrace: move pc counter in irqtrace
  ...

Manually fix conflicts:
 - init/main.c: initcall tracing
 - kernel/module.c: verbose level vs tracepoints
 - scripts/bootgraph.pl: fallout from cherry-picking commits.
2008-10-20 13:35:07 -07:00
Thomas Petazzoni 3d13731024 PCI: allow quirks to be compiled out
This patch adds the CONFIG_PCI_QUIRKS option which allows to remove all
the PCI quirks, which are not necessarily used on embedded systems when
PCI is working properly. As this is a size-reduction option, it depends
on CONFIG_EMBEDDED. It allows to save almost 12 kilobytes of kernel
code:

   text	   data	    bss	    dec	    hex	filename
1287806	 123596	 212992	1624394	 18c94a	vmlinux.old
1275854	 123596	 212992	1612442	 189a9a	vmlinux
 -11952       0       0  -11952   -2EB0 +/-

This patch has originally been written by Zwane Mwaikambo
<zwane@arm.linux.org.uk> and is part of the Linux Tiny project.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-20 10:53:40 -07:00
Matt Helsley dc52ddc0e6 container freezer: implement freezer cgroup subsystem
This patch implements a new freezer subsystem in the control groups
framework.  It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup.  Reading will return the current state.

* Examples of usage :

   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer  /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

   # cat /containers/0/freezer.state
   RUNNING

to freeze all tasks in the container :

   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN

to unfreeze all tasks in the container :

   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING

This is the basic mechanism which should do the right thing for user space
task in a simple scenario.

It's important to note that freezing can be incomplete.  In that case we
return EBUSY.  This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time.  After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read.  The state will remain
"FREEZING" until one of these things happens:

	1) Userspace cancels the freezing operation by writing "RUNNING" to
		the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
		the freezer.state file (writing "FREEZING" is not legal
		and returns EIO)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
		state disappear from the cgroup's set of tasks.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:34 -07:00
Nick Piggin db64fe0225 mm: rewrite vmap layer
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap.  Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache.  This is all done under a global lock.  As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
 This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock.  It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems.  The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated.  So the addresses aren't allocated again until
a subsequent TLB flush.  A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap.  Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages.  Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
in nanoseconds per map+touch+unmap.

threads           vanilla         vmap rewrite
1                 14700           2900
2                 33600           3000
4                 49500           2800
8                 70631           2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram...  along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system.  I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now.  vmap is pretty well blown off the
profiles.

Before:
1352059 total                                      0.1401
798784 _write_lock                              8320.6667 <- vmlist_lock
529313 default_idle                             1181.5022
 15242 smp_call_function                         15.8771  <- vmap tlb flushing
  2472 __get_vm_area_node                         1.9312  <- vmap
  1762 remove_vm_area                             4.5885  <- vunmap
   316 map_vm_area                                0.2297  <- vmap
   312 kfree                                      0.1950
   300 _spin_lock                                 3.1250
   252 sn_send_IPI_phys                           0.4375  <- tlb flushing
   238 vmap                                       0.8264  <- vmap
   216 find_lock_page                             0.5192
   196 find_next_bit                              0.3603
   136 sn2_send_IPI                               0.2024
   130 pio_phys_write_mmr                         2.0312
   118 unmap_kernel_range                         0.1229

After:
 78406 total                                      0.0081
 40053 default_idle                              89.4040
 33576 ia64_spinlock_contention                 349.7500
  1650 _spin_lock                                17.1875
   319 __reg_op                                   0.5538
   281 _atomic_dec_and_lock                       1.0977
   153 mutex_unlock                               1.5938
   123 iget_locked                                0.1671
   117 xfs_dir_lookup                             0.1662
   117 dput                                       0.1406
   114 xfs_iget_core                              0.0268
    92 xfs_da_hashname                            0.1917
    75 d_alloc                                    0.0670
    68 vmap_page_range                            0.0462 <- vmap
    58 kmem_cache_alloc                           0.0604
    57 memset                                     0.0540
    52 rb_next                                    0.1625
    50 __copy_user                                0.0208
    49 bitmap_find_free_region                    0.2188 <- vmap
    46 ia64_sn_udelay                             0.1106
    45 find_inode_fast                            0.1406
    42 memcmp                                     0.2188
    42 finish_task_switch                         0.1094
    42 __d_lookup                                 0.0410
    40 radix_tree_lookup_slot                     0.1250
    37 _spin_unlock_irqrestore                    0.3854
    36 xfs_bmapi                                  0.0050
    36 kmem_cache_free                            0.0256
    35 xfs_vn_getattr                             0.0322
    34 radix_tree_lookup                          0.1062
    33 __link_path_walk                           0.0035
    31 xfs_da_do_buf                              0.0091
    30 _xfs_buf_find                              0.0204
    28 find_get_page                              0.0875
    27 xfs_iread                                  0.0241
    27 __strncpy_from_user                        0.2812
    26 _xfs_buf_initialize                        0.0406
    24 _xfs_buf_lookup_pages                      0.0179
    24 vunmap_page_range                          0.0250 <- vunmap
    23 find_lock_page                             0.0799
    22 vm_map_ram                                 0.0087 <- vmap
    20 kfree                                      0.0125
    19 put_page                                   0.0330
    18 __kmalloc                                  0.0176
    17 xfs_da_node_lookup_int                     0.0086
    17 _read_lock                                 0.0885
    17 page_waitqueue                             0.0664

vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.

[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
Adrian Bunk 73b4a24f5f init/do_mounts_md.c must #include <linux/delay.h>
This patch fixes the following compile error caused by commit
589f800bb1 ("fastboot: make the raid
autodetect code wait for all devices to init"):

    CC      init/do_mounts_md.o
  init/do_mounts_md.c: In function 'autodetect_raid':
  init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
  make[2]: *** [init/do_mounts_md.o] Error 1

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 13:35:34 -07:00
Thomas Petazzoni ebf3f09c63 Configure out AIO support
This patchs adds the CONFIG_AIO option which allows to remove support
for asynchronous I/O operations, that are not necessarly used by
applications, particularly on embedded devices. As this is a
size-reduction option, it depends on CONFIG_EMBEDDED. It allows to
save ~7 kilobytes of kernel code/data:

   text	   data	    bss	    dec	    hex	filename
1115067	 119180	 217088	1451335	 162547	vmlinux
1108025	 119048	 217088	1444161	 160941	vmlinux.new
  -7042    -132       0   -7174   -1C06 +/-

This patch has been originally written by Matt Mackall
<mpm@selenic.com>, and is part of the Linux Tiny project.

[randy.dunlap@oracle.com: build fix]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:51 -07:00
Nye Liu 889d51a107 initramfs: add option to preserve mtime from initramfs cpio images
When unpacking the cpio into the initramfs, mtimes are not preserved by
default.  This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
mtimes stored in the cpio image to be used when constructing the
initramfs.

For embedded applications that run exclusively out of the initramfs, this
is invaluable:

When building embedded application initramfs images, its nice to know when
the files were actually created during the build process - that makes it
easier to see what files were modified when so we can compare the files
that are being used on the image with the files used during the build
process.  This might help (for example) to determine if the target system
has all the updated files you expect to see w/o having to check MD5s etc.

In our environment, the whole system runs off the initramfs partition, and
seeing the modified times of the shared libraries (for example) helps us
find bugs that may have been introduced by the build system incorrectly
propogating outdated shared libraries into the image.

Similarly, many of the initializion/configuration files in /etc might be
dynamically built by the build system, and knowing when they were modified
helps us sanity check whether the target system has the "latest" files
etc.

Finally, we might use last modified times to determine whether a hot fix
should be applied or not to the running ramfs.

Signed-off-by: Nye Liu <nyet@nyet.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:31 -07:00
Geert Uytterhoeven 93fd85d005 identify_ramdisk_image(): correct typo about return value in comment
identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:

	if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
		printk(KERN_NOTICE
		       "RAMDISK: Compressed image found at block %d\n",
		       start_block);
		nblocks = 0;
		^^^^^^^^^^^
		goto done;
	}

	...

done:
	sys_lseek(fd, start_block * BLOCK_SIZE, 0);
	kfree(buf);
	return nblocks;
	^^^^^^^^^^^^^^

Hence correct the typo in the comment, which has existed since the
addition of compressed ramdisk support in 1.3.48.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:30 -07:00
Ingo Molnar b2aaf8f74c Merge branch 'linus' into stackprotector
Conflicts:
	arch/x86/kernel/Makefile
	include/asm-x86/pda.h
2008-10-15 13:46:29 +02:00
Linus Torvalds 2d51b75370 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
  raid, fastboot: hide RAID autodetect option if MD is compiled as a module
  raid: make RAID autodetect default a KConfig option
  warning: fix init do_mounts_md c
  fastboot: make the RAID autostart code print a message just before waiting
  fastboot: make the raid autodetect code wait for all devices to init
  fastboot: Fix bootgraph.pl initcall name regexp
  fastboot: fix issues and improve output of bootgraph.pl
  Add a script to visualize the kernel boot process / time
2008-10-14 12:28:02 -07:00
Tim Bird ca538f6bbe tracing/fastboot: add better resolution to initcall debug/tracing
Change the time resolution for initcall_debug to microseconds, from
milliseconds.  This is handy to determine which initcalls you want to work
on for faster booting.

One one of my test machines, over 90% of the initcalls are less than a
millisecond and (without this patch) these are all reported as 0 msecs.
Working on the 900 us ones is more important than the 4 us ones.

With 'quiet' on the kernel command line, this adds no significant overhead
to kernel boot time.

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:27 +02:00