Commit Graph

17277 Commits

Author SHA1 Message Date
Olaf Hering 4f9a58d75b increase AT_VECTOR_SIZE to terminate saved_auxv properly
include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO.  fs/binfmt_elf.c
has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT
entries.  So in the worst case, saved_auxv does not get an AT_NULL entry at
the end.

The saved_auxv array must be terminated with an AT_NULL entry.  Make the
size of mm_struct->saved_auxv arch dependend, based on the number of
ARCH_DLINFO entries.

Signed-off-by: Olaf Hering <olh@suse.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:00 -07:00
Pavel Emelyanov 1efd24fa05 Remove unused member from nsproxy
The nslock spinlock is not used in the kernel at all.  Remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:59 -07:00
Alexey Dobriyan 970a8645ca user.c: #ifdef ->mq_bytes
For those who deselect POSIX message queues.

Reduces SLAB size of user_struct from 64 to 32 bytes here, SLUB size -- from
40 bytes to 32 bytes.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:59 -07:00
Alan Cox 5f519d7281 tty: expose new methods needed for drivers to get termios right
This adds three new functions (or in one case to be more exact makes it
always available)

tty_termios_copy_hw

Copies all the hardware settings from one termios structure to the other.
This is intended for drivers that support little or no hardware setting

tty_termios_encode_baud_rate

Allows you to set the input and output baud rate in a termios structure.  A
driver is supposed to set the resulting baud rate from a request so most
will want to use this function to set the resulting input and output rates
to match the hardware values.  Internally it knows about keeping Bxxx
encoding when possible to maximise compatibility.

tty_encode_baud_rate

As above but for the tty's own current termios structure

I suspect this will initially need some tweaking as it gets enabled by
driver patches over the next few mm cycles so consider this lot -mm only
for the moment so it can stabilize and end up neat before it goes to base.

I've tried not to break any obscure architectures - if you get a speed you
can't represent the code will print warnings on non updated termios systems
but not break.

Once this is merged and seems sane I've got a growing pile of driver
updates to use it - notably for USB serial drivers.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:58 -07:00
Denys Vlasenko b9ec0339d8 add consts where appropriate in fs/nls/*
Add const modifiers to a few struct nls_table's member pointers in
include/linux/nls.h and adds a lot of const's in fs/nls/*.c files.

Resulting changes as visible by size:

   text    data     bss     dec     hex filename
 113612  481216    2368  597196   91ccc nls.org/built-in.o
 593548    3296     288  597132   91c8c nls/built-in.o

Apparently compiler managed to optimize code a bit better
because of const-ness.

No other changes are made.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:58 -07:00
Denis V. Lunev 37c42524d6 shrink_dcache_sb speedup
This patch makes shrink_dcache_sb consistent with dentry pruning policy.

On the first pass we iterate over dentry unused list and prepare some
dentries for removal.

However, since the existing code moves evicted dentries to the beginning of
the LRU it can happen that fresh dentries from other superblocks will be
inserted *before* our dentries.

This can result in significant slowdown of shrink_dcache_sb().  Moreover,
for virtual filesystems like unionfs which can call dput() during dentries
kill existing code results in O(n^2) complexity.

We observed 2 minutes shrink_dcache_sb() with only 35000 dentries.

To avoid this effects we propose to isolate sb dentries at the end
of LRU list.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrey Mirkin <amirkin@openvz.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:57 -07:00
Emil Medve 1f7c8234c7 Make the pr_*() family of macros in kernel.h complete
Other/Some pr_*() macros are already defined in kernel.h, but pr_err() was
defined multiple times in several other places

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Tony Lindgren <tony@atomide.com>
Reviewed-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:57 -07:00
David Howells 76181c134f KEYS: Make request_key() and co fundamentally asynchronous
Make request_key() and co fundamentally asynchronous to make it easier for
NFS to make use of them.  There are now accessor functions that do
asynchronous constructions, a wait function to wait for construction to
complete, and a completion function for the key type to indicate completion
of construction.

Note that the construction queue is now gone.  Instead, keys under
construction are linked in to the appropriate keyring in advance, and that
anyone encountering one must wait for it to be complete before they can use
it.  This is done automatically for userspace.

The following auxiliary changes are also made:

 (1) Key type implementation stuff is split from linux/key.h into
     linux/key-type.h.

 (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does
     not need to call key_instantiate_and_link() directly.

 (3) Adjust the debugging macros so that they're -Wformat checked even if
     they are disabled, and make it so they can be enabled simply by defining
     __KDEBUG to be consistent with other code of mine.

 (3) Documentation.

[alan@lxorguk.ukuu.org.uk: keys: missing word in documentation]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:57 -07:00
Ralf Baechle 622a9edd91 Remove dma_cache_(wback|inv|wback_inv) functions
dma_cache_(wback|inv|wback_inv) were the earliest attempt on a generalized
cache managment API for I/O purposes.  Originally it was basically the raw
MIPS low level cache API exported to the entire world.  The API has
suffered from a lack of documentation, was not very widely used unlike it's
more modern brothers and can easily be replaced by dma_cache_sync.  So
remove it rsp.  turn the surviving bits back into an arch private API, as
discussed on linux-arch.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:57 -07:00
Matti Linnanvuori f20fda4861 Mutex documentation is unclear about software interrupts, tasklets and timers
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:57 -07:00
Bill Nottingham 2e8ecb9db0 add CONFIG_VT_UNICODE
As of now, the kernel defaults to non-unicode and XLATE for the keyboard.
We've been changing this in Fedora, but that requires patching the defaults
in the kernel.

The attached introduces CONFIG_VT_UNICODE, which sets the console in
unicode mode by default on boot, including both the virtual terminal and
the keyboard driver.

Signed-off-by: Bill Nottingham <notting@redhat.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Jan Beulich 22e48eaf58 constify string/array kparam tracking structures
.. in an effort to make read-only whatever can be made, so that
CONFIG_DEBUG_RODATA can catch as many issues as possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Jan Beulich d5aa0daf6d store __setup_str_* in a more compact way
__setup_str_* are referenced only during boot, hence there's no need to
waste image space for aligning these strings (with the aim of improving
performance).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Robert P. J. Day b311e921b3 Add a "rounddown_pow_of_two" routine to log2.h
To go along with the existing "roundup_pow_of_two" routine, add one for
rounding down since that operation appears to crop up on a regular basis in
the source tree.

[m.kozlowski@tuxland.pl: fix unbalanced parentheses]
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Jan Kara 8e8934695d quota: send messages via netlink
Implement sending of quota messages via netlink interface.  The advantage
is that in userspace we can better decide what to do with the message - for
example display a dialogue in your X session or just write the message to
the console.  As a bonus, we can get rid of problems with console locking
deep inside filesystem code once we remove the old printing mechanism.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Adrian Bunk b012d346c0 make kernel/profile.c:time_hook static
{,un}register_timer_hook() is the API that should be used.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Adrian Bunk cba4fbbff2 remove include/asm-*/ipc.h
All asm/ipc.h files do only #include <asm-generic/ipc.h>.

This patch therefore removes all include/asm-*/ipc.h files and moves the
contents of include/asm-generic/ipc.h to include/linux/ipc.h.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Alexey Dobriyan 4af3c9cc4f Drop some headers from mm.h
mm.h doesn't use directly anything from mutex.h and backing-dev.h, so
remove them and add them back to files which need them.

Cross-compile tested on many configs and archs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Paul Clements 7fdfd4065c NBD: allow hung network I/O to be cancelled
Allow NBD I/O to be cancelled when a network outage occurs.  Previously, I/O
would just hang, and if enough I/O was hung in nbd, the system (at least
user-level) would completely hang until a TCP timeout (default, 15 minutes)
occurred.

The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit
timeout value (in seconds) to be specified.  Any network send that exceeds the
timeout will be cancelled and the nbd connection will be shut down.  I've
tested with various timeout values and 6 seconds seems to be a good choice for
the timeout.  If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O
hang) behavior.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Jan Beulich 9c6cdad7fe cleanup floppy.h
AUTO_DMA and FLOPPY_MOTOR_MASK in include/asm-*/floppy.h are dead symbols -
remove them.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Alan Cox 328dfd0f78 tty.h: remove dead define
No longer used. TTY_FLIPBUF_SIZE will also go soon but needs a couple of
other cleanups first

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Alexey Dobriyan 42b2dd0a02 Shrink task_struct if CONFIG_FUTEX=n
robust_list, compat_robust_list, pi_state_list, pi_state_cache are
really used if futexes are on.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Ken'ichi Ohmichi bcbba6c10e add-vmcore: add a prefix "VMCOREINFO_" to the vmcoreinfo macros
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros.  Old vmcoreinfo macros
were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is
impossible to grep for them.  So these names should be changed.  This
discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html

Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:54 -07:00
Ken'ichi Ohmichi 6cfa062f01 add-vmcore: add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data
[2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data.
  The dump filetering command 'makedumpfile'(v1.1.6 or before) had assumed
  the above values, and it was not good from the reliability viewpoint.
  So makedumpfile v1.2.0 came to need these values and I created the patch
  to let the kernel output them.
  makedumpfile site:
  https://sourceforge.net/projects/makedumpfile/

Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:54 -07:00
Ken'ichi Ohmichi d768281e97 add-vmcore: cleanup the coding style according to Andrew's comments
[1/3] Cleanup the coding style according to Andrew's comments:
http://lists.infradead.org/pipermail/kexec/2007-August/000522.html
- vmcoreinfo_append_str() should have suitable __attribute__s so that
  the compiler can check its use.
- vmcoreinfo_max_size should have size_t.
- Use get_seconds() instead of xtime.tv_sec.
- Use init_uts_ns.name.release instead of UTS_RELEASE.

Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:54 -07:00
Ken'ichi Ohmichi fd59d231f8 Add vmcoreinfo
This patch set frees the restriction that makedumpfile users should install a
vmlinux file (including the debugging information) into each system.

makedumpfile command is the dump filtering feature for kdump.  It creates a
small dumpfile by filtering unnecessary pages for the analysis.  To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information.  These days, the debugging package becomes a huge file, and it is
hard to install it into each system.

To solve the problem, kdump developers discussed it at lkml and kexec-ml.  As
the result, we reached the conclusion that necessary information for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and it should be accessed through /proc/vmcore during the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)

Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)

And I updated it for multi architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)

Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:54 -07:00
Mathieu Desnoyers 2b47c3611d Fix f_version type: should be u64 instead of unsigned long
Fix f_version type: should be u64 instead of long

There is a type inconsistency between struct inode i_version and struct file
f_version.

fs.h:

struct inode
  u64                     i_version;

and

struct file
  unsigned long           f_version;

Users do:

fs/ext3/dir.c:

if (filp->f_version != inode->i_version) {

So why isn't f_version a u64 ? It becomes a problem if versions gets
higher than 2^32 and we are on an architecture where longs are 32 bits.

This patch changes the f_version type to u64, and updates the users accordingly.

It applies to 2.6.23-rc2-mm2.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Martin Bligh <mbligh@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: <linux-ext4@vger.kernel.org>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Adrian Bunk 4a239427f2 make fs/libfs.c:simple_commit_write() static
simple_commit_write() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Adrian Bunk ba2a631b14 kernel/time/timekeeping.c: cleanups
- remove the no longer required __attribute__((weak)) of xtime_lock
- remove the following no longer used EXPORT_SYMBOL's:
  - xtime
  - xtime_lock

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Alexey Dobriyan 3befe7ceb8 Shrink struct task_struct::oomkilladj
oomkilladj is int, but values which can be assigned to it are -17, [-16,
15], thus fitting into s8.

While patch itself doesn't help in making task_struct smaller, because of
natural alignment of ->link_count, it will make picture clearer wrt futher
task_struct reduction patches.  My plan is to move ->fpu_counter and
->oomkilladj after ->ioprio filling hole on i386 and x86_64.  But that's
for later, because bloated distro configs need looking at as well.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Olaf Hering 68a9bd0cd5 remove strict ansi check from __u64 in asm/types.h
Remove the __STRICT_ANSI__ check from the __u64/__s64 declaration on
32bit targets.

GCC can be made to warn about usage of long long types with ISO C90
(-ansi), but only with -pedantic.  You can write this in a way that even
then it doesn't cause warnings, namely by:

#ifdef __GNUC__
__extension__ typedef __signed__ long long __s64;
__extension__ typedef unsigned long long __u64;
#endif

The __extension__ keyword in front of this switches off any pedantic
warnings for this expression.

Signed-off-by: Olaf Hering <olh@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Andi Drebes ac8d35c565 cramfs: error message about endianess
The README file in the cramfs subdirectory says: "All data is currently in
host-endian format; neither mkcramfs nor the kernel ever do swabbing."

If somebody tries to mount a cramfs with the wrong endianess, cramfs only
complains about a wrong magic but doesn't inform the user that only the
endianess isn't right.

The following patch adds an error message to the cramfs sources.  If a user
tries to mount a cramfs with the wrong endianess using the patched sources,
cramfs will display the message "cramfs: wrong endianess".

Signed-off-by: Andi Drebes <lists-receive@programmierforen.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Olaf Hering 7f44c3621a include linux/types.h in if_fddi.h
include/linux/if_fddi.h is an exported header.
It uses __be16. Include linux/types.h to get this prototype.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:52 -07:00
Olaf Hering e30618cbd1 remove consolemap.h from header exports
Remove linux/consolemap.h from make headers_install

It contains no user interfaces.
The defines in this file are used only for kernel internal state.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:52 -07:00
Samuel Thibault 04c7197650 unicode diacritics support
There have been issues with non-latin1 diacritics and unicode.
http://bugzilla.kernel.org/show_bug.cgi?id=7746

Git 759448f459 `Kernel utf-8 handling'
partly resolved it by adding conversion between diacritics and
unicode. The patch below goes further by just turning diacritics into
unicode, hence providing better future support. The kbd support can be
fetched from
http://bugzilla.kernel.org/attachment.cgi?id=12313

This was tested in all of latin1, latin9, latin2 and unicode with french
and czech dead keys.

Turn the kernel accent_table into unicode, and extend ioctls KDGKBDIACR
and KDSKBDIACR into their equivalents KDGKBDIACRUC and KDSKBDIACR.

New function int conv_uni_to_8bit(u32 uni) for converting unicode into 8bit
_input_.  No, we don't want to store the translation, as it is potentially
sparse and large.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jan Engelhardt <jengelh@gmx.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:52 -07:00
Roland McGrath 82df39738b Add MMF_DUMP_ELF_HEADERS
This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter.
This dumps the first page (only) of a private file mapping if it appears to
be a mapping of an ELF file.  Including these pages in the core dump may
give sufficient identifying information to associate the original DSO and
executable file images and their debugging information with a core file in
a generic way just from its contents (e.g.  when those binaries were built
with ld --build-id).  I expect this to become the default behavior
eventually.  Existing versions of gdb can be confused by the core dumps it
creates, so it won't enabled by default for some time to come.  Soon many
people will have systems with a gdb that handle these dumps, so they can
arrange to set the bit at boot and have it inherited system-wide.

This also cleans up the checking of the MMF_DUMP_* flag bits, which did not
need to be using atomic macros.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:52 -07:00
Olaf Hering e629a7ddc0 do not export /usr/include/scsi in make headers_install
/usr/include/scsi is provided by glibc.
Remove the scsi export from make headers_install target.

Signed-off-by: Olaf Hering <olh@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:52 -07:00
Roland McGrath 5f149cf0ac powerpc: Use linux/elfcore-compat.h
This makes powerpc64's compat code use the new linux/elfcore-compat.h,
reducing some hand-copied duplication.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:51 -07:00
Roland McGrath ab799dede9 Add linux/elfcore-compat.h
This adds the linux/elfcore-compat.h header file, which is the CONFIG_COMPAT
analog of the linux/elfcore.h header.  Each arch that needs to fake out
fs/binfmt_elf.c for its compat code can use this header to replace the
hand-copied definitions of the compat variants of struct elf_prstatus et al.
Only the pr_reg field varies by arch, so asm/{compat,elf}.h must define
compat_elf_gregset_t before linux/elfcore-compat.h can be used.

It's a clean-up that every arch with compat core dumping code can benefit
from.  I only touched the ones I have handy to test at home.  Doing the same
for each other arch should be straightforward, and I'm happy to offer tips.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:51 -07:00
Christoph Hellwig bcd6d4ecf6 ufs: move non-layout parts of ufs_fs.h to fs/ufs/
Move prototypes and in-core structures to fs/ufs/ similar to what most
other filesystems already do.

I made little modifications: move also ufs debug macros and
mount options constants into fs/ufs/ufs.h, this stuff
also private for ufs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:51 -07:00
Mike Frysinger 0b15d04af3 printk: add interfaces for external access to the log buffer
Add two new functions for reading the kernel log buffer.  The intention is for
them to be used by recovery/dump/debug code so the kernel log can be easily
retrieved/parsed in a crash scenario, but they are generic enough for other
people to dream up other fun uses.

[akpm@linux-foundation.org: buncha fixes]
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:50 -07:00
Roland McGrath 6d76013381 Add /sys/module/name/notes
This patch adds the /sys/module/<name>/notes/ magic directory, which has a
file for each allocated SHT_NOTE section that appears in <name>.ko.  This
is the counterpart for each module of /sys/kernel/notes for vmlinux.
Reading this delivers the contents of the module's SHT_NOTE sections.  This
lets userland easily glean any detailed information about that module's
build that was stored there at compile time (e.g.  by ld --build-id).

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:50 -07:00
Neil Horman 7dc0b22e3c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe
For some time /proc/sys/kernel/core_pattern has been able to set its output
destination as a pipe, allowing a user space helper to receive and
intellegently process a core.  This infrastructure however has some
shortcommings which can be enhanced.  Specifically:

1) The coredump code in the kernel should ignore RLIMIT_CORE limitation
   when core_pattern is a pipe, since file system resources are not being
   consumed in this case, unless the user application wishes to save the core,
   at which point the app is restricted by usual file system limits and
   restrictions.

2) The core_pattern code should be able to parse and pass options to the
   user space helper as an argv array.  The real core limit of the uid of the
   crashing proces should also be passable to the user space helper (since it
   is overridden to zero when called).

3) Some miscellaneous bugs need to be cleaned up (specifically the
   recognition of a recursive core dump, should the user mode helper itself
   crash.  Also, the core dump code in the kernel should not wait for the user
   mode helper to exit, since the same context is responsible for writing to
   the pipe, and a read of the pipe by the user mode helper will result in a
   deadlock.

This patch:

Remove the check of RLIMIT_CORE if core_pattern is a pipe.  In the event that
core_pattern is a pipe, the entire core will be fed to the user mode helper.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: <martin.pitt@ubuntu.com>
Cc: <wwoods@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:50 -07:00
Mark Fortescue 252e211e90 Add in SunOS 4.1.x compatible mode for UFS
Add in support for SunOS 4.1.x flavor of BSD 4.2 UFS filing system Macros have
been put in to alow suport for the old static table Cylinder Groups but this
implementation does not use them yet.

This also fixes Solaris UFS filing system access by disabling fast symbolic
links as Sun's version of UFS does not support on-disk fast symbolic links.

Tested by:
  Ppartitioning a new disk using SunOS 4.1.1, creating a UFS filing system on
  one of the partitions and writing some files to the filing system.
  Using Linux-2.6.22 (patched) to read the files and then write a shed load of
  files to the UFS partition.
  Using SunOS 4.1.1 to verify the filing system is OK and to check the files.
The test host is a sun4c SS1 Clone.

[akpm@linux-foundation.org: coding style fixes]
[adobriyan@gmail.com: fix oops]
Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:49 -07:00
Denis Cheng 74bf17cffc fs: remove the unused mempages parameter
Since the mempages parameter is actually not used, they should be removed.

Now there is only files_init use the mempages parameter,

 	files_init(mempages);

but I don't think the adaptation to mempages in files_init is really
useful; and if files_init also changed to the prototype void (*func)(void),
the wrapper vfs_caches_init would also not need the mempages parameter.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:49 -07:00
Rusty Russell af49d9248f Remove "unsafe" from module struct
Adrian Bunk points out that "unsafe" was used to mark modules touched by
the deprecated MOD_INC_USE_COUNT interface, which has long gone.  It's time
to remove the member from the module structure, as well.

If you want a module which can't unload, don't register an exit function.

(Vlad Yasevich says SCTP is now safe to unload, so just remove the
__unsafe there).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:49 -07:00
Miklos Szeredi d9c9bef134 ext4: show all mount options
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:49 -07:00
Miklos Szeredi 571beed8d6 ext3: show all mount options
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:48 -07:00
Miklos Szeredi 93d44cb275 ext2: show all mount options
Using mtab is problematic for various reasons, one of them is that
unprivileged mounts won't turn up in there.  So we want to get rid of it, and
use /proc/mounts instead.

But most filesystems are lazy, and are not showing all mount options.  Which
means, that without mtab, the user won't be able to see some or all of the
options.

It would be nice if the generic code could remember the mount options, and
show them without the need to add extra code to filesystems.  But this is not
easy, because different filesystems handle mount options given options, and
not tough the rest.  This is not taken into account by mount(8) either, so
/etc/mtab will be broken in this case.

This series fixes up ->show_options() in ext[234].

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:48 -07:00
Alexey Dobriyan 4be28540ee Remove sysctl.h from fs.h
Rrrr, addition of sysctl.h to fs.h was't very smart, because simple
editing of the former will buy you big recompile, where it shouldn't
have to.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:48 -07:00
Christoph Hellwig 04fc8bbcf5 kill DECLARE_MUTEX_LOCKED
DECLARE_MUTEX_LOCKED was used for semaphores used as completions and we've
got rid of them.  Well, except for one in libusual that the maintainer
explicitly wants to keep as semaphore.  So convert that useage to an
explicit sema_init and kill of DECLARE_MUTEX_LOCKED so that new code is
reminded to use a completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "Satyam Sharma" <satyam.sharma@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:47 -07:00
Olaf Hering 4029a9177f unexport asm/shmparam.h
SHMLBA cant possible be used in userspace, see sparc versions of that header.

Do not export asm/shmparam.h during make headers_install_all
This removes another uservisible place of PAGE_SIZE

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:47 -07:00
Hans-Christian Egtvedt eb1f293060 Driver for the Atmel on-chip SSC on AT32AP and AT91
The Synchronous Serial Controller (SSC) on Atmel microprocessors are
capable of tranceiving many frame based protocols, like I2S.  Tested on the
AT32AP7000/ATSTK1000.

This driver is used in the ALSA sound driver for the AT73C213 external DAC
on the ATSTK1000 development board for AVR32.  This sound driver will be
submitted soon.

Hardware documentation can be found in the AT32AP7000 data sheet, which can
be downloaded from
http://www.atmel.com/dyn/products/datasheets.asp?family_id=682

[akpm@linux-foundation.org: init spinlock at compile time]
Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Andrew Victor <andrew@sanpeople.com>
Cc: Patrice Vilchez <patrice.vilchez@rfo.atmel.com>
Cc: Nicolas Ferre <nicolas.ferre@rfo.atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:47 -07:00
Robert P. J. Day 94f582f82a Force erroneous inclusions of compiler-*.h files to be errors
Replace worthless comments with actual preprocessor errors when including
the wrong versions of the compiler.h files.

[akpm@linux-foundation.org: make it work]
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:47 -07:00
Ravikiran G Thirumalai c4f3b63fe1 softlockup: add a /proc tuning parameter
Control the trigger limit for softlockup warnings.  This is useful for
debugging softlockups, by lowering the softlockup_thresh to identify
possible softlockups earlier.

This patch:
1. Adds a sysctl softlockup_thresh with valid values of 1-60s
   (Higher value to disable false positives)
2. Changes the softlockup printk to print the cpu softlockup time

[akpm@linux-foundation.org: Fix various warnings and add definition of "two"]
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:47 -07:00
Ingo Molnar ad3b82795f softlockup: make asm/irq_regs.h available on every platform
The softlockup detector would like to use get_irq_regs(), so generalize the
availability on every Linux architecture.

(It is fine for an architecture to always return NULL to get_irq_regs(),
which it does by default.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ian Molton <spyro@f2s.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:47 -07:00
Paul E. McKenney 97b430320c Immunize rcu_dereference() against crazy compiler writers
Turns out that compiler writers are a bit more aggressive about optimizing
than one might expect.  This patch prevents a number of such optimizations
from messing up rcu_deference().  This is not merely a theoretical problem, as
evidenced by the rmb() in mce_log().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
Alexey Dobriyan f6b450d489 Make unregister_binfmt() return void
list_del() hardly can fail, so checking for return value is pointless
(and current code always return 0).

Nobody really cared that return value anyway.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
Alexey Dobriyan e4dc1b14d8 Use list_head in binfmt handling
Switch single-linked binfmt formats list to usual list_head's.  This leads
to one-liners in register_binfmt() and unregister_binfmt().  The downside
is one pointer more in struct linux_binfmt.  This is not a problem, since
the set of registered binfmts on typical box is very small -- (ELF +
something distro enabled for you).

Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
Adrian Bunk deba0f49b9 fs/reiserfs/: cleanups
- remove the following no longer used functions:
  - bitmap.c: reiserfs_claim_blocks_to_be_allocated()
  - bitmap.c: reiserfs_release_claimed_blocks()
  - bitmap.c: reiserfs_can_fit_pages()

- make the following functions static:
  - inode.c: restart_transaction()
  - journal.c: reiserfs_async_progress_wait()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
David Rientjes d773ed6b85 mm: test and set zone reclaim lock before starting reclaim
Introduces new zone flag interface for testing and setting flags:

	int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)

Instead of setting and clearing ZONE_RECLAIM_LOCKED each time shrink_zone() is
called, this flag is test and set before starting zone reclaim.  Zone reclaim
starts in __alloc_pages() when a zone's watermark fails and the system is in
zone_reclaim_mode.  If it's already in reclaim, there's no need to start again
so it is simply considered full for that allocation attempt.

There is a change of behavior with regard to concurrent zone shrinking.  It is
now possible for try_to_free_pages() or kswapd to already be shrinking a
particular zone when __alloc_pages() starts zone reclaim.  In this case, it is
possible for two concurrent threads to invoke shrink_zone() for a single zone.

This change forbids a zone to be in zone reclaim twice, which was always the
behavior, but allows for concurrent try_to_free_pages() or kswapd shrinking
when starting zone reclaim.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
David Rientjes 9aad369e56 oom: add header file to Kbuild as unifdef
Preprocess include/linux/oom.h before exporting it to userspace.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
David Rientjes 172acf60f3 oom: prevent including sched.h in header file
It's not necessary to include all of linux/sched.h in linux/oom.h.  Instead,
simply include prototypes for the relevant structs and include linux/types.h
for gfp_t.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
David Rientjes bbe373f2c6 oom: compare cpuset mems_allowed instead of exclusive ancestors
Instead of testing for overlap in the memory nodes of the the nearest
exclusive ancestor of both current and the candidate task, it is better to
simply test for intersection between the task's mems_allowed in their task
descriptors.  This does not require taking callback_mutex since it is only
used as a hint in the badness scoring.

Tasks that do not have an intersection in their mems_allowed with the current
task are not explicitly restricted from being OOM killed because it is quite
possible that the candidate task has allocated memory there before and has
since changed its mems_allowed.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
David Rientjes 098d7f128a oom: add per-zone locking
OOM killer synchronization should be done with zone granularity so that memory
policy and cpuset allocations may have their corresponding zones locked and
allow parallel kills for other OOM conditions that may exist elsewhere in the
system.  DMA allocations can be targeted at the zone level, which would not be
possible if locking was done in nodes or globally.

Synchronization shall be done with a variation of "trylocks." The goal is to
put the current task to sleep and restart the failed allocation attempt later
if the trylock fails.  Otherwise, the OOM killer is invoked.

Each zone in the zonelist that __alloc_pages() was called with is checked for
the newly-introduced ZONE_OOM_LOCKED flag.  If any zone has this flag present,
the "trylock" to serialize the OOM killer fails and returns zero.  Otherwise,
all the zones have ZONE_OOM_LOCKED set and the try_set_zone_oom() function
returns non-zero.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
David Rientjes e815af95f9 oom: change all_unreclaimable zone member to flags
Convert the int all_unreclaimable member of struct zone to unsigned long
flags.  This can now be used to specify several different zone flags such as
all_unreclaimable and reclaim_in_progress, which can now be removed and
converted to a per-zone flag.

Flags are set and cleared as follows:

	zone_set_flag(struct zone *zone, zone_flags_t flag)
	zone_clear_flag(struct zone *zone, zone_flags_t flag)

Defines the first zone flags, ZONE_ALL_UNRECLAIMABLE and ZONE_RECLAIM_LOCKED,
which have the same semantics as the old zone->all_unreclaimable and
zone->reclaim_in_progress, respectively.  Also converts all current users that
set or clear either flag to use the new interface.

Helper functions are defined to test the flags:

	int zone_is_all_unreclaimable(const struct zone *zone)
	int zone_is_reclaim_locked(const struct zone *zone)

All flag operators are of the atomic variety because there are currently
readers that are implemented that do not take zone->lock.

[akpm@linux-foundation.org: add needed include]
Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
David Rientjes 70e24bdf6d oom: move constraints to enum
The OOM killer's CONSTRAINT definitions are really more appropriate in an
enum, so define them in include/linux/oom.h.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
David Rientjes 5a3135c2e7 oom: move prototypes to appropriate header file
Move the OOM killer's extern function prototypes to include/linux/oom.h and
include it where necessary.

[clg@fr.ibm.com: build fix]
Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Christoph Lameter 4ba9b9d0ba Slab API: remove useless ctor parameter and reorder parameters
Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra 3e26c149c3 mm: dirty balancing for tasks
Based on ideas of Andrew:
  http://marc.info/?l=linux-kernel&m=102912915020543&w=2

Scale the bdi dirty limit inversly with the tasks dirty rate.
This makes heavy writers have a lower dirty limit than the occasional writer.

Andrea proposed something similar:
  http://lwn.net/Articles/152277/

The main disadvantage to his patch is that he uses an unrelated quantity to
measure time, which leaves him with a workload dependant tunable. Other than
that the two approaches appear quite similar.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra 04fbfdc14e mm: per device dirty threshold
Scale writeback cache per backing device, proportional to its writeout speed.

By decoupling the BDI dirty thresholds a number of problems we currently have
will go away, namely:

 - mutual interference starvation (for any number of BDIs);
 - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).

It might be that all dirty pages are for a single BDI while other BDIs are
idling. By giving each BDI a 'fair' share of the dirty limit, each one can have
dirty pages outstanding and make progress.

A global threshold also creates a deadlock for stacked BDIs; when A writes to
B, and A generates enough dirty pages to get throttled, B will never start
writeback until the dirty pages go away. Again, by giving each BDI its own
'independent' dirty limit, this problem is avoided.

So the problem is to determine how to distribute the total dirty limit across
the BDIs fairly and efficiently. A DBI that has a large dirty limit but does
not have any dirty pages outstanding is a waste.

What is done is to keep a floating proportion between the DBIs based on
writeback completions. This way faster/more active devices get a larger share
than slower/idle devices.

[akpm@linux-foundation.org: fix warnings]
[hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra 145ca25eb2 lib: floating proportions
Given a set of objects, floating proportions aims to efficiently give the
proportional 'activity' of a single item as compared to the whole set. Where
'activity' is a measure of a temporal property of the items.

It is efficient in that it need not inspect any other items of the set
in order to provide the answer. It is not even needed to know how many
other items there are.

It has one parameter, and that is the period of 'time' over which the
'activity' is measured.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra 69cb51d18c mm: count writeback pages per BDI
Count per BDI writeback pages.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra c9e51e4180 mm: count reclaimable pages per BDI
Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra b2e8fb6efa mm: scalable bdi statistics counters
Provide scalable per backing_dev_info statistics counters.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra e0bf68ddec mm: bdi init hooks
provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra dc62a30e27 lib: percpu_counter_init_irq
provide a way to tell lockdep about percpu_counters that are supposed to be
used from irq safe contexts.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra 833f4077bf lib: percpu_counter_init error handling
alloc_percpu can fail, propagate that error.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra bf1d89c813 lib: percpu_count_sum()
Provide an accurate version of percpu_counter_read.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra 52d9f3b409 lib: percpu_counter_sum_positive
s/percpu_counter_sum/&_positive/

Because its consitent with percpu_counter_read*

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra 3a587f47b8 lib: percpu_counter_set
Provide a method to set a percpu counter to a specified value.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra 20e8976709 lib: make percpu_counter_add take s64
percpu_counter is a s64 counter, make _add consitent.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra 252e0ba6b7 lib: percpu_counter variable batch
Because the current batch setup has an quadric error bound on the counter,
allow for an alternative setup.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra 3cb4f9fa0c lib: percpu_counter_sub
Hugh spotted that some code does:
  percpu_counter_add(&counter, -unsignedlong)

which, when the amount argument is of type s32, sort-of works thanks to
two's-complement. However when we'd change the type to s64 this breaks on 32bit
machines, because the promotion rules zero extend the unsigned number.

Provide percpu_counter_sub() to hide the s64 cast. That is:
  percpu_counter_sub(&counter, foo)
is equal to:
  percpu_counter_add(&counter, -(s64)foo);

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra aa0dff2d09 lib: percpu_counter_add
s/percpu_counter_mod/percpu_counter_add/

Because its a better name, _mod implies modulo.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Peter Zijlstra c4dc4beed2 nfs: remove congestion_end()
These patches aim to improve balance_dirty_pages() and directly address three
issues:
  1) inter device starvation
  2) stacked device deadlocks
  3) inter process starvation

1 and 2 are a direct result from removing the global dirty limit and using
per device dirty limits. By giving each device its own dirty limit is will
no longer starve another device, and the cyclic dependancy on the dirty limit
is broken.

In order to efficiently distribute the dirty limit across the independant
devices a floating proportion is used, this will allocate a share of the total
limit proportional to the device's recent activity.

3 is done by also scaling the dirty limit proportional to the current task's
recent dirty rate.

This patch:

nfs: remove congestion_end().  It's redundant, clear_bdi_congested() already
wakes the waiters.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Mark Nelson 1f7d6668c2 powerpc: add Altivec/VMX state to coredumps
Update dump_task_altivec() (which has so far never been put to use) so that
it dumps the Altivec/VMX registers (VR[0] - VR[31], VSCR and VRSAVE) in the
same format as the ptrace get_vrregs(), and add the appropriate glue
typedef and #defines to make it work.

A new note type of NT_PPC_VMX was chosen to be 0x100 (arbitrarily) because
it allows the low range values to be used for more generic purposes and
0x100 seems an adequate starting point for PowerPC extensions.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Mark Nelson 5b20cd80b4 x86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #define
Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
allows for more flexibility in the note type for the state of 'extended
floating point' implementations in coredumps.  New note types can now be
added with an appropriate #define.

This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
current users so there's are no change in behaviour.

This will let us use different note types on powerpc for the Altivec/VMX
state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
(signal processing extension) state that some embedded PowerPC cpus from
Freescale have.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Christoph Hellwig eead191153 partially fix up the lookup_one_noperm mess
Try to fix the mess created by sysfs braindamage.

 - refactor code internal to fs/namei.c a little to avoid too much
   duplication:
	o __lookup_hash_kern is renamed back to __lookup_hash
	o the old __lookup_hash goes away, permission checks moves to
	  the two callers
	o useless inline qualifiers on above functions go away
 - lookup_one_len_kern loses it's last argument and is renamed to
   lookup_one_noperm to make it's useage a little more clear
 - added kerneldoc comments to describe lookup_one_len aswell as
   lookup_one_noperm and make it very clear that no one should use
   the latter ever.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Dhaval Giani b1a8c172c3 sched: fix !SYSFS build breakage
When CONFIG_SYSFS is not set, CONFIG_FAIR_USER_SCHED fails to build
with

kernel/built-in.o: In function `uids_kobject_init':
(.init.text+0x1488): undefined reference to `kernel_subsys'
kernel/built-in.o: In function `uids_kobject_init':
(.init.text+0x1490): undefined reference to `kernel_subsys'
kernel/built-in.o: In function `uids_kobject_init':
(.init.text+0x1480): undefined reference to `kernel_subsys'
kernel/built-in.o: In function `uids_kobject_init':
(.init.text+0x1494): undefined reference to `kernel_subsys'

This patch fixes this build error.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-17 16:55:11 +02:00
Paul Mackerras 4acadb965c Merge branch 'for-2.6.24' of git://git.secretlab.ca/git/linux-2.6-mpc52xx into merge 2007-10-17 22:31:13 +10:00
Olof Johansson f66bce5e6a [POWERPC] Add 1TB workaround for PA6T
PA6T has a bug where the slbie instruction does not honor the large
segment bit.  As a result, we have to always use slbia when switching
context.

We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID.  I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry.  So as long as we clear it out on context
switch we should be fine.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:09 +10:00
Joachim Fenkes 6b08f3ae8e [POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
Replace struct ibmebus_dev and struct ibmebus_driver with struct of_device
and struct of_platform_driver, respectively.  Match the external ibmebus
interface and drivers using it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:08 +10:00
Joachim Fenkes fec738dd48 [POWERPC] Move of_device allocation into of_device.[ch]
Extract generic of_device allocation code from of_platform_device_create()
and move it into of_device.[ch], called of_device_alloc(). Also, there's now
of_device_free() which puts the device node.

This way, bus drivers that build on of_platform (like ibmebus will) can
build upon this code instead of reinventing the wheel.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:07 +10:00
H. Peter Anvin 3ea3351000 Remove magic macros for screen_info structure members
Stop using magic macros for screen_info structure members.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-10-16 22:57:17 -07:00
H. Peter Anvin 30c826451d [x86] remove uses of magic macros for boot_params access
Instead of using magic macros for boot_params access, simply use the
boot_params structure.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-10-16 17:38:31 -07:00
Linus Torvalds 2b0460b534 Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: (33 commits)
  amd74xx: remove /proc/ide/amd74xx
  amd74xx/via82cxxx: don't initialize drive->dn
  sis5513: remove /proc/ide/sis
  ide: remove CONFIG_IDEDMA_ONLYDISK
  ide: add "hdx=nodma" kernel parameter
  ide: remove hwif->autodma and drive->autodma
  ide: remove "idex=dma" kernel parameter
  ide: remove CONFIG_BLK_DEV_IDEDMA_FORCED
  ide: use PCI_VDEVICE() macro
  sis5513: clear prefetch and postwrite for ATAPI devices
  it8213/piix/slc90e66: "de-couple" PIO and UDMA modes
  ide: unexport noautodma
  ide: unexport ide_tune_dma
  ide: remove ->ide_dma_check (take 2)
  ide-pmac: add PIO autotune fallback to ->ide_dma_check
  ide-cris: add PIO autotune fallback to ->ide_dma_check
  sl82c105: add PIO autotune fallback to ->ide_dma_check
  cs5530/sc1200: add PIO autotune fallback to ->ide_dma_check
  ide: remove ide_use_fast_pio()
  ide: remove drive->init_speed zeroing
  ...
2007-10-16 16:56:35 -07:00
Linus Torvalds b883a688ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: kills warnings in Improve SELinux performance when AVC misses
  SELinux: improve performance when AVC misses.
  SELinux: policy selectable handling of unknown classes and perms
  SELinux: Improve read/write performance
  SELinux: tune avtab to reduce memory usage
2007-10-16 16:53:20 -07:00
Linus Torvalds 1316ff5d52 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (25 commits)
  firewire: fw-cdev: reorder wakeup vs. spinlock
  firewire: in-code doc updates.
  firewire: a header cleanup
  firewire: adopt read cycle timer ABI from raw1394
  firewire: fw-ohci: check for misconfigured bus (phyID == 63)
  firewire: fw-ohci: missing dma_unmap_single
  firewire: fw-ohci: log posted write errors
  firewire: fw-ohci: reorder includes
  firewire: fw-ohci: fix includes
  firewire: fw-ohci: enforce read order for selfID generation
  firewire: fw-sbp2: use an own workqueue (fix system responsiveness)
  firewire: fw-sbp2: expose module parameter for workarounds
  firewire: fw-sbp2: add support for multiple logical units per target
  firewire: fw-sbp2: always enable IRQs before calling command ORB callback
  firewire: fw-core: local variable shadows a global one
  firewire: optimize fw_core_add_address_handler
  ieee1394: ieee1394_core.c: use DEFINE_SPINLOCK for spinlock definition
  ieee1394: csr1212: proper refcounting
  ieee1394: nodemgr: fix leak of struct csr1212_keyval
  ieee1394: pcilynx: I2C cleanups
  ...
2007-10-16 16:52:21 -07:00
Sylvain Munaut 07e6e93136 [POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes
On the mpc5200b the ccr register is 32 bits wide while on the
mpc5200 it's only 16 bits. It's up to the driver to use the
correct format depending on the chip it's running on.

The 5200b also offers some more registers & status in AC97
mode. Again, if not running on a 5200b the driver should not
use those.

Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2007-10-16 17:09:27 -06:00
Yuichi Nakamura 788e7dd4c2 SELinux: Improve read/write performance
It reduces the selinux overhead on read/write by only revalidating
permissions in selinux_file_permission if the task or inode labels have
changed or the policy has changed since the open-time check.  A new LSM
hook, security_dentry_open, is added to capture the necessary state at open
time to allow this optimization.

(see http://marc.info/?l=selinux&m=118972995207740&w=2)

Signed-off-by: Yuichi Nakamura<ynakam@hitachisoft.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2007-10-17 08:59:31 +10:00
Stefan Richter a64408b96b firewire: adopt read cycle timer ABI from raw1394
This duplicates the read cycle timer feature of raw1394 (added in Linux
2.6.21) in firewire-core's userspace ABI.  The argument to the ioctl is
reordered though to ensure 32/64 bit compatibility.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
2007-10-17 00:00:08 +02:00
Bartlomiej Zolnierkiewicz c223701cf6 ide: add "hdx=nodma" kernel parameter
* Add "hdx=nodma" option allowing user to disallow DMA for a given device.

* Obsolete "ide=nodma" option.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-16 22:29:58 +02:00
Bartlomiej Zolnierkiewicz 9ff6f72f43 ide: remove hwif->autodma and drive->autodma
* hpt34x.c: disable DMA masks for HPT345
  (hwif->autodma is zero so DMA won't be enabled anyway).

* trm290.c: disable IDE_HFLAG_TRUST_BIOS_FOR_DMA flag
  (hwif->autodma is zero so DMA won't be enabled anyway).

* Check noautodma global variable instead of drive->autodma in ide_tune_dma().

  This fixes handling of "ide=nodma" kernel parameter for icside, ide-cris,
  au1xxx-ide, pmac, it821x, jmicron, sgiioc4 and siimage host drivers.

* Remove hwif->autodma (it was not checked by IDE core code anyway) and
  drive->autodma (was set by all host drivers - except HPT345/TRM290 special
  cases - unless "ide=nodma" was used).

While at it:
- remove needless printk() from icside.c
- remove stale FIXME/comment from ide-probe.c
- don't force DMA off if PCI bus-mastering had to be enabled in setup-pci.c
  (this setting was always later over-ridden by host drivers anyway)

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-16 22:29:58 +02:00
Bartlomiej Zolnierkiewicz 0ae2e17865 ide: remove ->ide_dma_check (take 2)
* Add IDE_HFLAG_TRUST_BIOS_FOR_DMA host flag for host drivers that depend
  on BIOS for programming device/controller for DMA.  Set it in cy82c693,
  generic, ns87415, opti621 and trm290 host drivers.

* Add IDE_HFLAG_VDMA host flag for host drivers using VDMA.  Set it in cs5520
  host driver.

* Teach ide_tune_dma() about IDE_HFLAG_TRUST_BIOS_FOR_DMA flag.

* Add generic ide_dma_check() helper and remove all open coded ->ide_dma_check
  implementations.  Fix all places checking for presence of ->ide_dma_check
  hook to check for ->ide_dma_on instead.

* Remove no longer needed code from config_drive_for_dma().

* Make ide_tune_dma() static.

v2:
* Fix config_drive_for_dma() return values.

* Fix ide-dma.c build for CONFIG_BLK_DEV_IDEDMA_PCI=n by adding
  dummy config_drive_for_dma() inline.

* Fix IDE_HFLAG_TRUST_BIOS_FOR_DMA handling in ide_dma_check().

* Fix init_hwif_it8213() comment.

There should be no functionality changes caused by this patch.

Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-16 22:29:55 +02:00
Bartlomiej Zolnierkiewicz 65c9cd23ca ide: remove ide_use_fast_pio()
Remove ide_use_fast_pio() and just re-tune PIO unconditionally if DMA tuning
has failed in ->ide_dma_check.  All host drivers using ide_use_fast_pio() set
drive->autotune so PIO is always tuned anyway and in some cases we _really_
need to re-tune PIO because PIO and DMA timings are shared.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-16 22:29:54 +02:00
Anton Blanchard 33ff910f0f Fix powerpc breakage in sg chaining code
Commit 78bdc3106a ("PPC: sg chaining
support") looks to have removed some unrelated ppc code.  Lets put it
back in.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 13:10:58 -07:00
Jeremy Fitzhardinge e3d2697669 xen: fix incorrect vcpu_register_vcpu_info hypercall argument
The kernel's copy of struct vcpu_register_vcpu_info was out of date,
at best causing the hypercall to fail and the guest kernel to fall
back to the old mechanism, or worse, causing random memory corruption.

[ Stable folks: applies to 2.6.23 ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Morten =?utf-8?q?B=C3=B8geskov?= <xen-users@morten.bogeskov.dk>
Cc: Mark Williamson <mark.williamson@cl.cam.ac.uk>
2007-10-16 11:51:31 -07:00
Jeremy Fitzhardinge 8965c1c095 paravirt: clean up lazy mode handling
Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
 1. enter lazy cpu mode
 2. leave lazy cpu mode
 3. enter lazy mmu mode
 4. leave lazy mmu mode
 5. flush pending batched operations

This complicates each paravirt backend, since it needs to deal with
all the possible state transitions, handling flushing, etc. In
particular, flushing is quite distinct from the other 4 functions, and
seems to just cause complication.

This patch removes the set_lazy_mode operation, and adds "enter" and
"leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
associated with enter and leaving lazy states is now in common code
(basically BUG_ONs to make sure that no mode is current when entering
a lazy mode, and make sure that the mode is current when leaving).
Also, flush is handled in a common way, by simply leaving and
re-entering the lazy mode.

The result is that the Xen, lguest and VMI lazy mode implementations
are much simpler.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Anthony Liguory <aliguori@us.ibm.com>
Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
2007-10-16 11:51:29 -07:00
Jeremy Fitzhardinge 93b1eab3d2 paravirt: refactor struct paravirt_ops into smaller pv_*_ops
This patch refactors the paravirt_ops structure into groups of
functionally related ops:

pv_info - random info, rather than function entrypoints
pv_init_ops - functions used at boot time (some for module_init too)
pv_misc_ops - lazy mode, which didn't fit well anywhere else
pv_time_ops - time-related functions
pv_cpu_ops - various privileged instruction ops
pv_irq_ops - operations for managing interrupt state
pv_apic_ops - APIC operations
pv_mmu_ops - operations for managing pagetables

There are several motivations for this:

1. Some of these ops will be general to all x86, and some will be
   i386/x86-64 specific.  This makes it easier to share common stuff
   while allowing separate implementations where needed.

2. At the moment we must export all of paravirt_ops, but modules only
   need selected parts of it.  This allows us to export on a case by case
   basis (and also choose which export license we want to apply).

3. Functional groupings make things a bit more readable.

Struct paravirt_ops is now only used as a template to generate
patch-site identifiers, and to extract function pointers for inserting
into jmp/calls when patching.  It is only instantiated when needed.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Anthony Liguory <aliguori@us.ibm.com>
Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
2007-10-16 11:51:29 -07:00
Linus Torvalds 821f3eff7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
  kbuild: introduce ccflags-y, asflags-y and ldflags-y
  kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
  kbuild: enable use of AFLAGS and CFLAGS on commandline
  kbuild: enable 'make AFLAGS=...' to add additional options to AS
  kbuild: fix AFLAGS use in h8300 and m68knommu
  kbuild: check for wrong use of CFLAGS
  kbuild: enable 'make CFLAGS=...' to add additional options to CC
  kbuild: fix up CFLAGS usage
  kbuild: make modpost detect unterminated device id lists
  kbuild: call export_report from the Makefile
  kbuild: move Kai Germaschewski to CREDITS
  kconfig/menuconfig: distinguish between selected-by-another options and comments
  kconfig: tristate choices with mixed tristate and boolean values
  include/linux/Kbuild: remove duplicate entries
  kbuild: kill backward compatibility checks
  kbuild: kill EXTRA_ARFLAGS
  kbuild: fix documentation in makefiles.txt
  kbuild: call make once for all targets when O=.. is used
  kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
  kbuild: update _shipped files for kconfig syntax cleanup
  ...

Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
2007-10-16 11:23:06 -07:00
Linus Torvalds ebc283118e Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Increase cp0 compare clockevent min_delta_ns from 0x30 to 0x300.
  [MIPS] Cache: Provide more information on cache policy on bootup.
  [MIPS] Fix aliasing bug in copy_user_highpage, take 2.
  [MIPS] VPE loader: convert from struct class_ device to struct device
  [MIPS] MIPSsim: Fix booting from NFS root
  [MIPS] Alchemy: Get rid of au1xxx_irq_map_t.
  [MIPS] Alchemy: Get rid of au_ffz().
  [MIPS] Alchemy: Get rid of au_ffs().
  [MIPS] Alchemy: cleanup interrupt code.
  [MIPS] Lasat: Fix build by conversion to irq_cpu.c.
  [MIPS] Lasat: Add #ifndef ... #endif include warpper to lasatint.h.
  [MIPS] IP22: Enable -Werror.
  [MIPS] IP22: Fix warning.
  [MIPS] IP22: Complain if requesting the front panel irq failed.
  [MIPS] vmlinux.lds.S: Handle KPROBES_TEXT.
  [MIPS] vmlinux.lds.S: Fix handling of .notes in final link.
  [MIPS] vmlinux.lds.S: Remove duplicate comment.
  [MIPS] MSP71XX: Add workarounds file.
  [MIPS] IP32: Fix build by conversion to irq_cpu.c.
2007-10-16 10:44:35 -07:00
Ralf Baechle 0e6799ed07 [MIPS] Alchemy: Get rid of au1xxx_irq_map_t.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:48 +01:00
Ralf Baechle d2126e230d [MIPS] Alchemy: Get rid of au_ffz().
There were no users - and why have a private version anyway.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:48 +01:00
Ralf Baechle 56f621c7f6 [MIPS] Alchemy: Get rid of au_ffs().
It was plain a bad idea ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:48 +01:00
Ralf Baechle a5ccfe5c1a [MIPS] Lasat: Fix build by conversion to irq_cpu.c.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:47 +01:00
Ralf Baechle 951e471429 [MIPS] Lasat: Add #ifndef ... #endif include warpper to lasatint.h.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:47 +01:00
Ralf Baechle eae23f2c2a [MIPS] IP22: Fix warning.
CC      arch/mips/sgi-ip22/ip22-berr.o
arch/mips/sgi-ip22/ip22-berr.c: In function 'ip22_be_interrupt':
arch/mips/sgi-ip22/ip22-berr.c💯 warning: passing argument 2 of 'die_if_kernel' discards qualifiers from pointer target type

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:47 +01:00
Ralf Baechle b37bac94de [MIPS] MSP71XX: Add workarounds file.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:46 +01:00
Ralf Baechle dd67b1556e [MIPS] IP32: Fix build by conversion to irq_cpu.c.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-16 18:23:45 +01:00
Linus Torvalds fc8a327db6 Merge branch 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa
* 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa: (264 commits)
  [ALSA] version 1.0.15
  [ALSA] Fix thinko in cs4231 mce down check
  [ALSA] sun-cs4231: improved waiting after MCE down
  [ALSA] sun-cs4231: use cs4231-regs.h
  [ALSA] This simplifies and fixes waiting loops of the mce_down()
  [ALSA] This patch adds support for a wavetable chip on
  [ALSA] This patch removes open_mutex from the ad1848-lib as
  [ALSA] fix bootup crash in snd_gus_interrupt()
  [ALSA] hda-codec - Fix SKU ID function for realtek codecs
  [ALSA] Support  ASUS P701 eeepc [0x1043 0x82a1] support
  [ALSA] hda-codec - Add array terminator for dmic in STAC codec
  [ALSA] hdsp - Fix zero division
  [ALSA] usb-audio - Fix double comment
  [ALSA] hda-codec - Fix STAC922x volume knob control
  [ALSA] Changed Jaroslav Kysela's e-mail from perex@suse.cz to perex@perex.cz
  [ALSA] hda-codec - Fix for Fujitsu Lifebook C1410
  [ALSA] mpu-401: remove MPU401_INFO_UART_ONLY flag
  [ALSA] mpu-401: do not require an ACK byte for the ENTER_UART command
  [ALSA] via82xx - Add DXS quirk for Shuttle AK31v2
  [ALSA] hda-codec - Fix input_mux numbers for vaio stac92xx
  ...
2007-10-16 10:13:38 -07:00
Linus Torvalds 92d15c2ccb Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: (63 commits)
  Fix memory leak in dm-crypt
  SPARC64: sg chaining support
  SPARC: sg chaining support
  PPC: sg chaining support
  PS3: sg chaining support
  IA64: sg chaining support
  x86-64: enable sg chaining
  x86-64: update pci-gart iommu to sg helpers
  x86-64: update nommu to sg helpers
  x86-64: update calgary iommu to sg helpers
  swiotlb: sg chaining support
  i386: enable sg chaining
  i386 dma_map_sg: convert to using sg helpers
  mmc: need to zero sglist on init
  Panic in blk_rq_map_sg() from CCISS driver
  remove sglist_len
  remove blk_queue_max_phys_segments in libata
  revert sg segment size ifdefs
  Fixup u14-34f ENABLE_SG_CHAINING
  qla1280: enable use_sg_chaining option
  ...
2007-10-16 10:09:16 -07:00
Geert Uytterhoeven 9a054fbac8 fb: move and rename extern declaration for global_mode_option
Move the extern declaration for global_mode_option to <linux/fb.h> and rename
the variable to fb_mode_option.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:22 -07:00
Pavel Pisa 9da505d1f9 imxfb: fast read flag and nonstandard field configurable
The i.MX frame-buffer read operation should be faster for all configurations
then drawing each individual character again in response to scroll events.

The nonstandard fields allows to configure frame-buffer special options flags
for different display configurations by board specific initialization code.

One of such specific options is reversed order of pixels in each individual
byte.  i.MX frame-buffer seems to be designed for big-endian use first.  The
byte order is correctly configured for little-endian ordering, but if 1, 2 or
4 bits per pixel are used, pixels ordering is incompatible to Linux generic
frame-buffer drawing functions.

The patch "Allow generic BitBLT functions to work with swapped pixel order in
bytes" introduces required functionality into FBDEV core.  The pixels ordering
selection has to be enabled at compile time CONFIG_FB_CFB_REV_PIXELS_IN_BYTE
and for each display configuration which requires it by flag
FB_NONSTD_REV_PIX_IN_B in "nonstd" field of info structure.

This patch provides way for board specific code to select this option.

Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:21 -07:00
Geert Uytterhoeven ce4c371a9d ps3av: dont distinguish between `boot' and `non-boot' autodetection
don't distinguish between `boot' and `non-boot' autodetection now the
autodetection code has been improved

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:21 -07:00
Geert Uytterhoeven 466c449e5f ps3av: remove unused ps3av_set_mode()
remove unused ps3av_set_mode()

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:21 -07:00
Geert Uytterhoeven fd5621129b ps3av: add autodetection for VESA modes
add autodetection for VESA modes

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Geert Uytterhoeven 101aa56d02 ps3av: treat DVI-D like HDMI in autodetect
treat DVI-D monitors like HDMI monitors when autodetecting the best video mode

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Geert Uytterhoeven 71a27fecaf ps3av: use PS3 video mode ids in autodetect code
It doesn't make much sense to use the PS3AV_CMD_VIDEO_VID_* values in the
autodetection code, just to convert them to PS3 video mode ids afterwards.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Geert Uytterhoeven eea820ab0b ps3av: eliminate PS3AV_DEBUG
ps3av: eliminate PS3AV_DEBUG
  - Move ps3av_cmd_av_monitor_info_dump from ps3av_cmd.c to ps3av.c, as
it's
    used there only
  - Integrate ps3av_cmd_av_hw_conf_dump() into its sole user
  - Use pr_debug() for printing debug info

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Michael Hennerich e9fa7c43aa bf54x-lq043fb: framebuffer driver for Blackfin BF54x framebuffer device driver
Blackfin BF54x framebuffer device driver for a SHARP LQ043T1DG01 TFT LCD

[adaplas]
Add 'fb' suffix to driver name.
Move Makefile entry under platform device section

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Antonino A. Daplas e400b6ec4e vt/vgacon: Check if screen resize request comes from userspace
Various console drivers are able to resize the screen via the con_resize()
hook.  This hook is also visible in userspace via the TIOCWINSZ, VT_RESIZE and
VT_RESIZEX ioctl's.  One particular utility, SVGATextMode, expects that
con_resize() of the VGA console will always return success even if the
resulting screen is not compatible with the hardware.  However, this
particular behavior of the VGA console, as reported in Kernel Bugzilla Bug
7513, can cause undefined behavior if the user starts with a console size
larger than 80x25.

To work around this problem, add an extra parameter to con_resize().  This
parameter is ignored by drivers except for vgacon.  If this parameter is
non-zero, then the resize request came from a VT_RESIZE or VT_RESIZEX ioctl
and vgacon will always return success.  If this parameter is zero, vgacon will
return -EINVAL if the requested size is not compatible with the hardware.  The
latter is the more correct behavior.

With this change, SVGATextMode should still work correctly while in-kernel and
stty resize calls can expect correct behavior from vgacon.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Krzysztof Halasa 394d3af7ba Intel FB: more interlaced mode support
Intel FB: allow odd- and even-field-first in interlaced modes, and
proper sync to vertical retrace

Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: <sylvain.meyer@worldonline.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:20 -07:00
Pavel Pisa 779121e9f1 fbdev: Support for byte-reversed framebuffer formats
Allow generic frame-buffer code to correctly write texts and blit images for
1, 2 and 4 bit per pixel frame-buffer organizations when pixels in bytes are
organized to in opposite order than bytes in long type.

Overhead should be reasonable.  If option is not selected, than compiler
should eliminate completely all overhead.

The feature is disabled at compile time if CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is
not set.

[adaplas]
Convert helper functions to macros if feature is not enabled.

Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:19 -07:00
Krzysztof Helt 2a36f9c497 pm2fb: hardware cursor support for the Permedia2
This patch adds hardware cursor support for the Permedia 2 chip.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:18 -07:00
Krzysztof Helt e84d436b3c pm3fb: header file cleanup
This patch fixes white spaces, redudant definitions and formating in the pm3fb
header file.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:18 -07:00
Krzysztof Helt 36f31a7084 s3c2410fb: removes lcdcon1 register value from s3c2410fb_display
This patch removes lcdcon1 register field from the s3c2410fb_display as all
bits are calculated from other fields.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:18 -07:00
Krzysztof Helt 69816699fa s3c2410fb: adds pixclock to s3c2410fb_display
This patch adds pixelclock field to the s3c2410fb_display structure and make
use of it in the driver.

The Bast machine defined 9 modes but pixclock and margin values are defined
only for the 640x480 modes so I removed other modes.

This patch also fixes wrong display type constant for the SMDK2440 board.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:18 -07:00
Ralf Baechle 120c0b6d57 vt: Fix warnings in selection.h
<linux/selection.h> assumes that struct tty_struct has previously been
included.  If not, this pile of warnings will result:

  CC [M]  drivers/video/console/newport_con.o
In file included from drivers/video/console/newport_con.c:18:
include/linux/selection.h:16: warning: 'struct tty_struct' declared inside param
eter list
include/linux/selection.h:16: warning: its scope is only this definition or decl
aration, which is probably not what you want
include/linux/selection.h:17: warning: 'struct tty_struct' declared inside param
eter list
include/linux/selection.h:20: warning: 'struct tty_struct' declared inside param
eter list

Fixed by adding a forward declaration of struct tty_struct.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:17 -07:00
Krzysztof Helt e92e739514 s3c2410fb: remove lcdcon2 and lcdcon3 register fields
This patch removes unused lcdcon2 and lcdcon3 register value
from the s3c2410fb_display structure.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:17 -07:00
Krzysztof Helt f28ef573ad s3c2410fb: remove lcdcon3 register from s3c2410fb_display
This patch removes unused lcdcon3 register from the
s3c2410fb_display structure.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:16 -07:00
Krzysztof Helt 1f4115376c s3c2410fb: add margin fields to s3c2410fb_display
This patch adds margins fields to the s3c2410fb_display
structure. It also sets display type and horizontal
margins in all platform files that use the s3c2410fb
driver.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:16 -07:00
Krzysztof Helt 09fe75f6f9 s3c2410fb: multi-display support
This patch adds a new structure to describe and handle
more than one panel (display mode) for the s3c2410 framebuffer.
This structure is added after the pxafb driver.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:16 -07:00
Krzysztof Helt 8f5d050af1 pm2fb: Permedia 2V hardware cursor support
This patch adds hardware cursor support for Permedia 2V chips.
The hardware cursor is disabled by default. It does not blink - the
same issue is mentioned in the x11 driver.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:16 -07:00
Krzysztof Helt 0960bd3db1 tdfxfb: mtrr support
This patch adds mtrr support to the tdfxfb driver.  It also kills one
redundant include and initialization value.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:15 -07:00
Krzysztof Helt 90b0f08536 tdfxfb: hardware cursor
This patch adds hardware cursor support to the tdfxfb driver.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:15 -07:00
Krzysztof Helt 4f05b53b28 tdfxfb: code improvements
This patch improves source code mainly by killing redundant variable loads,
reducing number of variables, simplifying conditional branches, etc.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:15 -07:00
Krzysztof Helt 8af1d50f7f tdfxfb: coding style improvement
This patch contains coding style improvements to the tdfxfb driver (white
spaces, indentations, long lines).

It also moves fb_ops structure to the end of file, so forward declarations of
ops functions are redundant.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:15 -07:00
Hans J. Koch 9ffa739606 pxafb: Add support for other palette formats
This patch adds support for the other three palette formats possible with
the PXA LCD controller. This is required on boards where an LCD is connected
with all its 18 bits. With this patch, it's possible to use an 8-bit mode
with 18-bit palette entries. This used to be possible in 2.4 kernels but
disappeared in 2.6. With current kernels, you can only get wrong colours
out of an LCD connected this way.

Users can choose the palette format by doing something like this
in their board definition:

static struct pxafb_mach_info my_fb_info = {
        [...]
        .lccr3          = LCCR3_OutEnH | LCCR3_PixFlEdg | LCCR3_PDFOR_3,
        .lccr4          = LCCR4_PAL_FOR_2,
        [...]
};

Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:15 -07:00
Raphael Assenat ba282daa91 mbxfb: Improvements and new features
This contains the following changes:

* Overlay surface alpha is configured separately from the overlay. This
prevents display glitches (configure and fill the overlay first, set
alpha to a visible value next)

* Added an ioctl for configuring transparency of the Overlay and graphics
planes. Blend mode, colorkey mode and global alpha mode are supported.

* Added an ioctl for setting the plane order. The overlay plance can be placed
over or
under the graphics plane.

* Added an ioctl for setting and reading chip registers, with mask.

* Updated copyright for 2007

[adaplas]
* Coding style changes

Signed-off-by: Raphael Assenat <raph@8d.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:14 -07:00
Ben Dooks eb78f9b3fa sm501fb: Ensure panel interface is not tristated when setup
When we setup the panel interface whilst configuring the
framebuffer, we should ensure the panel interface is not
in tristate, in case the bootloader or previous setup has
not enabled it.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:14 -07:00
Krzysztof Helt 138a451cce pm2fb: Permedia 2V initialization fixes
This patch:
- initializes correctly the Permedia2V chip if it is not initialized by BIOS
- puts back clock frequency for the ELSA WINNER board to 100kHz
- fixes returned error values from setcolreg() function
- uses more general classes for PCI ids

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:14 -07:00
Krzysztof Helt 91b3a6f4cd pm2fb: accelerated imageblit
This patch adds accelerated imageblit function.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:14 -07:00
Krzysztof Helt f259ebb67b pm3fb: header file reduction
This patch removes constants named AAA_DISABLE with value 0. They are redudant
and misleading ( a |= AAA_DISABLE does nothing and usually should be
a &= ~AAA_ENABLE).

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:14 -07:00
Krzysztof Helt e7f76df964 pm3fb: copyarea and partial imageblit suppor
This patch adds accelerated copyarea and partially accelerated imageblit
functions. There is also fixed one register address in the pm3fb.h file.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:14 -07:00
Michal Januszewski 8bdb3a2d7d uvesafb: the driver core
uvesafb is an enhanced version of vesafb.  It uses a userspace helper (v86d)
to execute calls to the x86 Video BIOS functions.  The driver is not limited
to any specific arch and whether it works on a given arch or not depends on
that arch being supported by the userspace daemon.  It has been tested on
x86_32 and x86_64.

A single BIOS call is represented by an instance of the uvesafb_ktask
structure.  This structure contains a buffer, a completion struct and a
uvesafb_task substructure, containing the values of the x86 registers, a flags
field and a field indicating the length of the buffer.  Whenever a BIOS call
is made in the driver, uvesafb_exec() builds a message using the uvesafb_task
substructure and the contents of the buffer.  This message is then assigned a
random ack number and sent to the userspace daemon using the connector
interface.

The message's sequence number is used as an index for the uvfb_tasks array,
which provides a mapping from the messages coming from userspace to the
in-kernel uvesafb_ktask structs.

The userspace daemon performs the requested operation and sends a reply in the
form of a uvesafb_task struct and, optionally, a buffer.  The seq and ack
numbers in the reply should be exactly the same as those in the request.

Each message from userspace is processed by uvesafb_cn_callback() and after
passing a few sanity checks leads to the completion of a BIOS call request.

Signed-off-by: Michal Januszewski <spock@gentoo.org>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Paulo Marques <pmarques@grupopie.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:13 -07:00
Michal Januszewski cc54f46e39 uvesafb: add connector entries
Add connector idx and val constants for v86d and uvesafb.

Signed-off-by: Michal Januszewski <spock@gentoo.org>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:13 -07:00
Michal Januszewski 3d62a44f74 connector: change connector's max message size
Change the maximum message size to 16k to allow transfers of VBE
data blocks from userspace.

Signed-off-by: Michal Januszewski <spock@gentoo.org>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:13 -07:00
Adrian Bunk cce76f9b96 fs/nfsd/export.c: make 3 functions static
This patch makes the following needlessly global functions static:
- exp_get_by_name()
- exp_parent()
- exp_find()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:10 -07:00
Matthias Kaehlcke 4e3dfacaa0 use mutex instead of semaphore in isdn subsystem common functions
The ISDN subsystem common functions use a semaphore as mutex. Use the
mutex API instead of the (binary) semaphore.

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:10 -07:00
Masami Hiramatsu f438d914b2 kprobes: support kretprobe blacklist
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.

This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:10 -07:00
Tony Jones 49dce689ad spi doesn't need class_device
Make the SPI framework and drivers stop using class_device.  Update docs
accordingly ...  highlighting just which sysfs paths should be
"safe"/stable.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:10 -07:00
Paul Jackson 607717a65d cpuset: remove sched domain hooks from cpusets
Remove the cpuset hooks that defined sched domains depending on the setting
of the 'cpu_exclusive' flag.

The cpu_exclusive flag can only be set on a child if it is set on the
parent.

This made that flag painfully unsuitable for use as a flag defining a
partitioning of a system.

It was entirely unobvious to a cpuset user what partitioning of sched
domains they would be causing when they set that one cpu_exclusive bit on
one cpuset, because it depended on what CPUs were in the remainder of that
cpusets siblings and child cpusets, after subtracting out other
cpu_exclusive cpusets.

Furthermore, there was no way on production systems to query the
result.

Using the cpu_exclusive flag for this was simply wrong from the get go.

Fortunately, it was sufficiently borked that so far as I know, almost no
successful use has been made of this.  One real time group did use it to
affectively isolate CPUs from any load balancing efforts.  They are willing
to adapt to alternative mechanisms for this, such as someway to manipulate
the list of isolated CPUs on a running system.  They can do without this
present cpu_exclusive based mechanism while we develop an alternative.

There is a real risk, to the best of my understanding, of users
accidentally setting up a partitioned scheduler domains, inhibiting desired
load balancing across all their CPUs, due to the nonobvious (from the
cpuset perspective) side affects of the cpu_exclusive flag.

Furthermore, since there was no way on a running system to see what one was
doing with sched domains, this change will be invisible to any using code.
Unless they have real insight to the scheduler load balancing choices, they
will be unable to detect that this change has been made in the kernel's
behaviour.

Initial discussion on lkml of this patch has generated much comment.  My
(probably controversial) take on that discussion is that it has reached a
rough concensus that the current cpuset cpu_exclusive mechanism for
defining sched domains is borked.  There is no concensus on the
replacement.  But since we can remove this mechanism, and since its
continued presence risks causing unwanted partitioning of the schedulers
load balancing, we should remove it while we can, as we proceed to work the
replacement scheduler domain mechanisms.

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:09 -07:00
Shannon Nelson 2ed6dc34f9 I/OAT: Add DCA services
Add code to connect to the DCA driver and provide cpu tags for use by
drivers that would like to use Direct Cache Access hints.

    [Adrian Bunk]                Several Kconfig cleanup items
    [Andrew Morten, Chris Leech] Fix for using cpu_physical_id() even when
			         built for uni-processor

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:09 -07:00
Shannon Nelson 7589670f37 DCA: Add Direct Cache Access driver
Direct Cache Access (DCA) is a method for warming the CPU cache before data
is used, with the intent of lessening the impact of cache misses.  This
patch adds a manager and interface for matching up client requests for DCA
services with devices that offer DCA services.

In order to use DCA, a module must do bus writes with the appropriate tag
bits set to trigger a cache read for a specific CPU.  However, different
CPUs and chipsets can require different sets of tag bits, and the methods
for determining the correct bits may be simple hardcoding or may be a
hardware specific magic incantation.  This interface is a way for DCA
clients to find the correct tag bits for the targeted CPU without needing
to know the specifics.

    [Dave Miller] use DEFINE_SPINLOCK()

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:09 -07:00
Shannon Nelson 3e037454bc I/OAT: Add support for MSI and MSI-X
Add support for MSI and MSI-X interrupt handling, including the ability
to choose the desired interrupt method.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
[bunk@kernel.org: drivers/dma/ioat_dma.c: make 3 functions static]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:09 -07:00
Shannon Nelson 223758c77a I/OAT: New device ids
Add device ids for new revs of the Intel I/OAT DMA engine

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:09 -07:00
Jeff Dike f0c4cad99c uml: style fixes in FP code
Tidy the code affected by the floating point fixes.

A bunch of unused stuff is gone, including two sigcontext.c files,
which turned out to be entirely unneeded.

There are the usual fixes -
	whitespace and style cleanups
	copyright updates
	emacs formatting comments gone
	include cleanups
	adding severities to printks

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:07 -07:00
Jeff Dike 058ac308f3 uml: coredumping floating point fixes
Fix core dumping of floating point state.  ELF_CORE_COPY_FPREGS gets a
definitions, and as a result, dump_fpu no longer needs to exist.  Also,
elf_fpregset_t needed a real definition.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:07 -07:00
Jeff Dike e8012b584f uml: ptrace floating point fixes
Handle floating point state better in ptrace.  The code now correctly
distinguishes between PTRACE_[GS]ETFPREGS and PTRACE_[GS]ETFPXREGS.  The FPX
requests get handed off to arch-specific code because that's not generic.

get_fpregs, set_fpregs, set_fpregs, and set_fpxregs needed real
implementations.

Something here exposed a missing include in asm/page.h, which needed
linux/types.h in order to get gfp_t, so that's fixed here.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:07 -07:00
Jeff Dike b21d4b08b6 uml: fix inlines
"extern inline" will have different semantics with gcc 4.3.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:06 -07:00
Jeff Dike 18badddaa8 uml: rename pt_regs general-purpose register file
Before the removal of tt mode, access to a register on the skas-mode side of a
pt_regs struct looked like pt_regs.regs.skas.regs.regs[FOO].  This was bad
enough, but it became pt_regs.regs.regs.regs[FOO] with the removal of the
union from the middle.  To get rid of the run of three "regs", the last field
is renamed to "gp".

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:06 -07:00
Jeff Dike 6c738ffa9f uml: fold mmu_context_skas into mm_context
This patch folds mmu_context_skas into struct mm_context, changing all users
of these structures as needed.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:06 -07:00
Jeff Dike fab95c55e3 uml: get rid of do_longjmp
do_longjmp used to be needed when UML didn't have its own implementation of
setjmp and longjmp.  They came from libc, and couldn't be called directly from
kernel code, as the libc jmp_buf couldn't be imported there.  do_longjmp was a
userspace function which served to provide longjmp access to kernel code.

This is gone, and a number of void * pointers can now be jmp_buf *.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:05 -07:00
Jeff Dike ba180fd437 uml: style fixes pass 3
Formatting changes in the files which have been changed in the course
of folding foo_skas functions into their callers.  These include:
	copyright updates
	header file trimming
	style fixes
	adding severity to printks

These changes should be entirely non-functional.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:05 -07:00
Jeff Dike 77bf440031 uml: remove code made redundant by CHOOSE_MODE removal
This patch makes a number of simplifications enabled by the removal of
CHOOSE_MODE.  There were lots of functions that looked like

	int foo(args){
		foo_skas(args);
	}

The bodies of foo_skas are now folded into foo, and their declarations (and
sometimes entire header files) are deleted.

In addition, the union uml_pt_regs, which was a union between the tt and skas
register formats, is now a struct, with the tt-mode arm of the union being
removed.

It turns out that usr2_handler was unused, so it is gone.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:05 -07:00
Jeff Dike ae2587e412 uml: style fixes pass 2
Formatting changes in the files which have been changed in the course
of removing CHOOSE_MODE.  These include:
	copyright updates
	header file trimming
	style fixes
	adding severity to printks

These changes should be entirely non-functional.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:05 -07:00
Jeff Dike 6aa802ce6a uml: throw out CHOOSE_MODE
The next stage after removing code which depends on CONFIG_MODE_TT is removing
the CHOOSE_MODE abstraction, which provided both compile-time and run-time
branching to either tt-mode or skas-mode code.

This patch removes choose-mode.h and all inclusions of it, and replaces all
CHOOSE_MODE invocations with the skas branch.  This leaves a number of trivial
functions which will be dealt with in a later patch.

There are some changes in the uaccess and tls support which go somewhat beyond
this and eliminate some of the now-redundant functions.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:05 -07:00
Jeff Dike 42fda66387 uml: throw out CONFIG_MODE_TT
This patchset throws out tt mode, which has been non-functional for a while.

This is done in phases, interspersed with code cleanups on the affected files.

The removal is done as follows:
	remove all code, config options, and files which depend on
CONFIG_MODE_TT
	get rid of the CHOOSE_MODE macro, which decided whether to
call tt-mode or skas-mode code, and replace invocations with their
skas portions
	replace all now-trivial procedures with their skas equivalents

There are now a bunch of now-redundant pieces of data structures, including
mode-specific pieces of the thread structure, pt_regs, and mm_context.  These
are all replaced with their skas-specific contents.

As part of the ongoing style compliance project, I made a style pass over all
files that were changed.  There are three such patches, one for each phase,
covering the files affected by that phase but no later ones.

I noticed that we weren't freeing the LDT state associated with a process when
it exited, so that's fixed in one of the later patches.

The last patch is a tidying patch which I've had for a while, but which caused
inexplicable crashes under tt mode.  Since that is no longer a problem, this
can now go in.

This patch:

Start getting rid of tt mode support.

This patch throws out CONFIG_MODE_TT and all config options, code, and files
which depend on it.

CONFIG_MODE_SKAS is gone and everything that depends on it is included
unconditionally.

The few changed lines are in re-written Kconfig help, lines which needed
something skas-related removed from them, and a few more which weren't
strictly deletions.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:05 -07:00
Christoph Hellwig 0ac1555915 m32r: convert to generic sys_ptrace
Convert m32r to the generic sys_ptrace.  The conversion requires an
architecture hook after ptrace_attach which this patch adds.  The hook
will also be needed for a conersion of ia64 to the generic ptrace code.

Thanks to Hirokazu Takata for fixing a bug in the first version of this
code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:04 -07:00
Mariusz Kozlowski 3165c0d16a include/asm-m32r/thread_info.h: kmalloc + memset conversion to kzalloc
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Sam Ravnborg b2b5d37d7e alpha: beautify vmlinux.lds
Introduced a consistent style in vmlinux.lds and it now matches the
soon-to-be common style for all arch's vmlinux.lds files.

In addition:
- Replaced hardcoded constant with PAGE_SIZE
- Fix page.h so PAGE_SIZE can be used from assembler and in lds files
- Move a few labels inside brackets so linker alignment will not
  make label point ot a too low address
- Replaced DWARF and STABS sections with definitions from asm-generic

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Christoph Hellwig a5f833f3c1 alpha: convert to generic sys_ptrace
This patch converts alpha to the generic sys_ptrace.  We use
force_successful_syscall_return to avoid having to pass the pt_regs pointer
down to the function.  I think the removal of the assemly stub is correct,
but I could only compile-test this patch, so please give it a spin before
commiting :)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Greg Ungerer fabc7f66ee M68KNOMMU: remove unused config symbol CONFIG_DISKtel
Remove unused config symbol CONFIG_DISKtel.
Pointed out by Robert P. J. Day <rpjday@mindspring.com>.

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Mariusz Kozlowski 33bbf9597f include/asm-frv/thread_info.h: kmalloc + memset conversion to kzalloc
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Benjamin Herrenschmidt ef0fce8556 remove frv usage of flush_tlb_pgtables()
frv is the last user in the tree of that dubious hook, and it's my
understanding that it's not even needed.  It's only called by memory.c
free_pgd_range() which is always called within an mmu_gather, and
tlb_flush() on frv will do a flush_tlb_mm(), which from my reading of the
code, seems to do what flush_tlb_ptables() does, which is to clear the
cached PGE.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Adrian Bunk dbcb0f19c8 mm/mempolicy.c: cleanups
This patch contains the following cleanups:
- every file should include the headers containing the prototypes for
  its global functions
- make the follosing needlessly global functions static:
  - migrate_to_node()
  - do_mbind()
  - sp_alloc()
  - mpol_rebind_policy()

[akpm@linux-foundation.org: fix uninitialised var warning]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Adrian Bunk d8dc74f212 mm/shmem.c: make 3 functions static
This patch makes three needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:03 -07:00
Adam Litke 54f9f80d65 hugetlb: Add hugetlb_dynamic_pool sysctl
The maximum size of the huge page pool can be controlled using the overall
size of the hugetlb filesystem (via its 'size' mount option).  However in the
common case the this will not be set as the pool is traditionally fixed in
size at boot time.  In order to maintain the expected semantics, we need to
prevent the pool expanding by default.

This patch introduces a new sysctl controlling dynamic pool resizing.  When
this is enabled the pool will expand beyond its base size up to the size of
the hugetlb filesystem.  It is disabled by default.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
Cc: William Irwin <bill.irwin@oracle.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
Yasunori Goto 98f3cfc1dc memory hotplug: Hot-add with sparsemem-vmemmap
This patch is to avoid panic when memory hot-add is executed with
sparsemem-vmemmap.  Current vmemmap-sparsemem code doesn't support memory
hot-add.  Vmemmap must be populated when hot-add.  This is for
2.6.23-rc2-mm2.

Todo: # Even if this patch is applied, the message "[xxxx-xxxx] potential
        offnode page_structs" is displayed. To allocate memmap on its node,
        memmap (and pgdat) must be initialized itself like chicken and
        egg relationship.

      # vmemmap_unpopulate will be necessary for followings.
         - For cancel hot-add due to error.
         - For unplug.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
KAMEZAWA Hiroyuki 48e94196a5 fix memory hot remove not configured case.
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.

 - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
 - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
   which returns -EINVAL.
 - removed remove_pages() only used in powerpc.
 - removed no-op remove_memory() in i386, sh, sparc64, x86_64.

 - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
   to return -EINVAL.

Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
KAMEZAWA Hiroyuki 0c0e619589 memory unplug: page offline
Logic.
 - set all pages in  [start,end)  as isolated migration-type.
   by this, all free pages in the range will be not-for-use.
 - Migrate all LRU pages in the range.
 - Test all pages in the range's refcnt is zero or not.

Todo:
 - allocate migration destination page from better area.
 - confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
 (I don't like this kind of page but..
 - Find out pages which cannot be migrated.
 - more running tests.
 - Use reclaim for unplugging other memory type area.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
KAMEZAWA Hiroyuki a5d76b54a3 memory unplug: page isolation
Implement generic chunk-of-pages isolation method by using page grouping ops.

This patch add MIGRATE_ISOLATE to MIGRATE_TYPES. By this
 - MIGRATE_TYPES increases.
 - bitmap for migratetype is enlarged.

pages of MIGRATE_ISOLATE migratetype will not be allocated even if it is free.
By this, you can isolated *freed* pages from users. How-to-free pages is not
a purpose of this patch. You may use reclaim and migrate codes to free pages.

If start_isolate_page_range(start,end) is called,
 - migratetype of the range turns to be MIGRATE_ISOLATE  if
   its type is MIGRATE_MOVABLE. (*) this check can be updated if other
   memory reclaiming works make progress.
 - MIGRATE_ISOLATE is not on migratetype fallback list.
 - All free pages and will-be-freed pages are isolated.
To check all pages in the range are isolated or not,  use test_pages_isolated(),
To cancel isolation, use undo_isolate_page_range().

Changes V6 -> V7
 - removed unnecessary #ifdef

There are HOLES_IN_ZONE handling codes...I'm glad if we can remove them..

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
KAMEZAWA Hiroyuki 75884fb1c6 memory unplug: memory hotplug cleanup
A clean up patch for "scanning memory resource [start, end)" operation.

Now, find_next_system_ram() function is used in memory hotplug, but this
interface is not easy to use and codes are complicated.

This patch adds walk_memory_resouce(start,len,arg,func) function.
The function 'func' is called per valid memory resouce range in [start,pfn).

[pbadari@us.ibm.com: Error handling in walk_memory_resource()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:01 -07:00
Christoph Lameter 42a9fdbb12 SLUB: Optimize cacheline use for zeroing
We touch a cacheline in the kmem_cache structure for zeroing to get the
size. However, the hot paths in slab_alloc and slab_free do not reference
any other fields in kmem_cache, so we may have to just bring in the
cacheline for this one access.

Add a new field to kmem_cache_cpu that contains the object size. That
cacheline must already be used in the hotpaths. So we save one cacheline
on every slab_alloc if we zero.

We need to update the kmem_cache_cpu object size if an aliasing operation
changes the objsize of an non debug slab.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:01 -07:00
Christoph Lameter 4c93c355d5 SLUB: Place kmem_cache_cpu structures in a NUMA aware way
The kmem_cache_cpu structures introduced are currently an array placed in the
kmem_cache struct. Meaning the kmem_cache_cpu structures are overwhelmingly
on the wrong node for systems with a higher amount of nodes. These are
performance critical structures since the per node information has
to be touched for every alloc and free in a slab.

In order to place the kmem_cache_cpu structure optimally we put an array
of pointers to kmem_cache_cpu structs in kmem_cache (similar to SLAB).

However, the kmem_cache_cpu structures can now be allocated in a more
intelligent way.

We would like to put per cpu structures for the same cpu but different
slab caches in cachelines together to save space and decrease the cache
footprint. However, the slab allocators itself control only allocations
per node. We set up a simple per cpu array for every processor with
100 per cpu structures which is usually enough to get them all set up right.
If we run out then we fall back to kmalloc_node. This also solves the
bootstrap problem since we do not have to use slab allocator functions
early in boot to get memory for the small per cpu structures.

Pro:
	- NUMA aware placement improves memory performance
	- All global structures in struct kmem_cache become readonly
	- Dense packing of per cpu structures reduces cacheline
	  footprint in SMP and NUMA.
	- Potential avoidance of exclusive cacheline fetches
	  on the free and alloc hotpath since multiple kmem_cache_cpu
	  structures are in one cacheline. This is particularly important
	  for the kmalloc array.

Cons:
	- Additional reference to one read only cacheline (per cpu
	  array of pointers to kmem_cache_cpu) in both slab_alloc()
	  and slab_free().

[akinobu.mita@gmail.com: fix cpu hotplug offline/online path]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "Pekka Enberg" <penberg@cs.helsinki.fi>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:01 -07:00
Christoph Lameter b3fba8da65 SLUB: Move page->offset to kmem_cache_cpu->offset
We need the offset from the page struct during slab_alloc and slab_free. In
both cases we also reference the cacheline of the kmem_cache_cpu structure.
We can therefore move the offset field into the kmem_cache_cpu structure
freeing up 16 bits in the page struct.

Moving the offset allows an allocation from slab_alloc() without touching the
page struct in the hot path.

The only thing left in slab_free() that touches the page struct cacheline for
per cpu freeing is the checking of SlabDebug(page). The next patch deals with
that.

Use the available 16 bits to broaden page->inuse. More than 64k objects per
slab become possible and we can get rid of the checks for that limitation.

No need anymore to shrink the order of slabs if we boot with 2M sized slabs
(slub_min_order=9).

No need anymore to switch off the offset calculation for very large slabs
since the field in the kmem_cache_cpu structure is 32 bits and so the offset
field can now handle slab sizes of up to 8GB.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:01 -07:00
Christoph Lameter 8e65d24c7c SLUB: Do not use page->mapping
After moving the lockless_freelist to kmem_cache_cpu we no longer need
page->lockless_freelist. Restructure the use of the struct page fields in
such a way that we never touch the mapping field.

This is turn allows us to remove the special casing of SLUB when determining
the mapping of a page (needed for corner cases of virtual caches machines that
need to flush caches of processors mapping a page).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:01 -07:00
Christoph Lameter dfb4f09609 SLUB: Avoid page struct cacheline bouncing due to remote frees to cpu slab
A remote free may access the same page struct that also contains the lockless
freelist for the cpu slab. If objects have a short lifetime and are freed by
a different processor then remote frees back to the slab from which we are
currently allocating are frequent. The cacheline with the page struct needs
to be repeately acquired in exclusive mode by both the allocating thread and
the freeing thread. If this is frequent enough then performance will suffer
because of cacheline bouncing.

This patchset puts the lockless_freelist pointer in its own cacheline. In
order to make that happen we introduce a per cpu structure called
kmem_cache_cpu.

Instead of keeping an array of pointers to page structs we now keep an array
to a per cpu structure that--among other things--contains the pointer to the
lockless freelist. The freeing thread can then keep possession of exclusive
access to the page struct cacheline while the allocating thread keeps its
exclusive access to the cacheline containing the per cpu structure.

This works as long as the allocating cpu is able to service its request
from the lockless freelist. If the lockless freelist runs empty then the
allocating thread needs to acquire exclusive access to the cacheline with
the page struct lock the slab.

The allocating thread will then check if new objects were freed to the per
cpu slab. If so it will keep the slab as the cpu slab and continue with the
recently remote freed objects. So the allocating thread can take a series
of just freed remote pages and dish them out again. Ideally allocations
could be just recycling objects in the same slab this way which will lead
to an ideal allocation / remote free pattern.

The number of objects that can be handled in this way is limited by the
capacity of one slab. Increasing slab size via slub_min_objects/
slub_max_order may increase the number of objects and therefore performance.

If the allocating thread runs out of objects and finds that no objects were
put back by the remote processor then it will retrieve a new slab (from the
partial lists or from the page allocator) and start with a whole
new set of objects while the remote thread may still be freeing objects to
the old cpu slab. This may then repeat until the new slab is also exhausted.
If remote freeing has freed objects in the earlier slab then that earlier
slab will now be on the partial freelist and the allocating thread will
pick that slab next for allocation. So the loop is extended. However,
both threads need to take the list_lock to make the swizzling via
the partial list happen.

It is likely that this kind of scheme will keep the objects being passed
around to a small set that can be kept in the cpu caches leading to increased
performance.

More code cleanups become possible:

- Instead of passing a cpu we can now pass a kmem_cache_cpu structure around.
  Allows reducing the number of parameters to various functions.
- Can define a new node_match() function for NUMA to encapsulate locality
  checks.

Effect on allocations:

Cachelines touched before this patch:

	Write:	page cache struct and first cacheline of object

Cachelines touched after this patch:

	Write:	kmem_cache_cpu cacheline and first cacheline of object
	Read: page cache struct (but see later patch that avoids touching
		that cacheline)

The handling when the lockless alloc list runs empty gets to be a bit more
complicated since another cacheline has now to be written to. But that is
halfway out of the hot path.

Effect on freeing:

Cachelines touched before this patch:

	Write: page_struct and first cacheline of object

Cachelines touched after this patch depending on how we free:

  Write(to cpu_slab):	kmem_cache_cpu struct and first cacheline of object
  Write(to other):	page struct and first cacheline of object

  Read(to cpu_slab):	page struct to id slab etc. (but see later patch that
  			avoids touching the page struct on free)
  Read(to other):	cpu local kmem_cache_cpu struct to verify its not
  			the cpu slab.

Summary:

Pro:
	- Distinct cachelines so that concurrent remote frees and local
	  allocs on a cpuslab can occur without cacheline bouncing.
	- Avoids potential bouncing cachelines because of neighboring
	  per cpu pointer updates in kmem_cache's cpu_slab structure since
	  it now grows to a cacheline (Therefore remove the comment
	  that talks about that concern).

Cons:
	- Freeing objects now requires the reading of one additional
	  cacheline. That can be mitigated for some cases by the following
	  patches but its not possible to completely eliminate these
	  references.

	- Memory usage grows slightly.

	The size of each per cpu object is blown up from one word
	(pointing to the page_struct) to one cacheline with various data.
	So this is NR_CPUS*NR_SLABS*L1_BYTES more memory use. Lets say
	NR_SLABS is 100 and a cache line size of 128 then we have just
	increased SLAB metadata requirements by 12.8k per cpu.
	(Another later patch reduces these requirements)

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:01 -07:00
Mel Gorman 467c996c1e Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo
This patch provides fragmentation avoidance statistics via /proc/pagetypeinfo.
 The information is collected only on request so there is no runtime overhead.
 The statistics are in three parts:

The first part prints information on the size of blocks that pages are
being grouped on and looks like

Page block order: 10
Pages per block:  1024

The second part is a more detailed version of /proc/buddyinfo and looks like

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type  Reclaimable      1      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Reserve      0      4      4      0      0      0      0      1      0      1      0
Node    0, zone   Normal, type    Unmovable    111      8      4      4      2      3      1      0      0      0      0
Node    0, zone   Normal, type  Reclaimable    293     89      8      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Movable      1      6     13      9      7      6      3      0      0      0      0
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      4

The third part looks like

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve
Node 0, zone      DMA            0            1            2            1
Node 0, zone   Normal            3           17           94            4

To walk the zones within a node with interrupts disabled, walk_zones_in_node()
is introduced and shared between /proc/buddyinfo, /proc/zoneinfo and
/proc/pagetypeinfo to reduce code duplication.  It seems specific to what
vmstat.c requires but could be broken out as a general utility function in
mmzone.c if there were other other potential users.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman d9c2340052 Do not depend on MAX_ORDER when grouping pages by mobility
Currently mobility grouping works at the MAX_ORDER_NR_PAGES level.  This makes
sense for the majority of users where this is also the huge page size.
However, on platforms like ia64 where the huge page size is runtime
configurable it is desirable to group at a lower order.  On x86_64 and
occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES.

This patch groups pages together based on the value of HUGETLB_PAGE_ORDER.  It
uses a compile-time constant if possible and a variable where the huge page
size is runtime configurable.

It is assumed that grouping should be done at the lowest sensible order and
that the user would not want to override this.  If this is not true,
page_block order could be forced to a variable initialised via a boot-time
kernel parameter.

One potential issue with this patch is that IA64 now parses hugepagesz with
early_param() instead of __setup().  __setup() is called after the memory
allocator has been initialised and the pageblock bitmaps already setup.  In
tests on one IA64 there did not seem to be any problem with using
early_param() and in fact may be more correct as it guarantees the parameter
is handled before the parsing of hugepages=.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman 64c5e135bf don't group high order atomic allocations
Grouping high-order atomic allocations together was intended to allow
bursty users of atomic allocations to work such as e1000 in situations
where their preallocated buffers were depleted.  This did not work in at
least one case with a wireless network adapter needing order-1 allocations
frequently.  To resolve that, the free pages used for min_free_kbytes were
moved to separate contiguous blocks with the patch
bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.

It is felt that keeping the free pages in the same contiguous blocks should
be sufficient for bursty short-lived high-order atomic allocations to
succeed, maybe even with the e1000.  Even if there is a failure, increasing
the value of min_free_kbytes will free pages as contiguous bloks in
contrast to the standard buddy allocator which makes no attempt to keep the
minimum number of free pages contiguous.

This patch backs out grouping high order atomic allocations together to
determine if it is really needed or not.  If a new report comes in about
high-order atomic allocations failing, the feature can be reintroduced to
determine if it fixes the problem or not.  As a side-effect, this patch
reduces by 1 the number of bits required to track the mobility type of
pages within a MAX_ORDER_NR_PAGES block.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman ac0e5b7a6b remove PAGE_GROUP_BY_MOBILITY
Grouping pages by mobility can be disabled at compile-time. This was
considered undesirable by a number of people. However, in the current stack of
patches, it is not a simple case of just dropping the configurable patch as it
would cause merge conflicts.  This patch backs out the configuration option.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman 56fd56b868 Bias the location of pages freed for min_free_kbytes in the same MAX_ORDER_NR_PAGES blocks
The standard buddy allocator always favours the smallest block of pages.
The effect of this is that the pages free to satisfy min_free_kbytes tends
to be preserved since boot time at the same location of memory ffor a very
long time and as a contiguous block.  When an administrator sets the
reserve at 16384 at boot time, it tends to be the same MAX_ORDER blocks
that remain free.  This allows the occasional high atomic allocation to
succeed up until the point the blocks are split.  In practice, it is
difficult to split these blocks but when they do split, the benefit of
having min_free_kbytes for contiguous blocks disappears.  Additionally,
increasing min_free_kbytes once the system has been running for some time
has no guarantee of creating contiguous blocks.

On the other hand, CONFIG_PAGE_GROUP_BY_MOBILITY favours splitting large
blocks when there are no free pages of the appropriate type available.  A
side-effect of this is that all blocks in memory tends to be used up and
the contiguous free blocks from boot time are not preserved like in the
vanilla allocator.  This can cause a problem if a new caller is unwilling
to reclaim or does not reclaim for long enough.

A failure scenario was found for a wireless network device allocating
order-1 atomic allocations but the allocations were not intense or frequent
enough for a whole block of pages to be preserved for MIGRATE_HIGHALLOC.
This was reproduced on a desktop by booting with mem=256mb, forcing the
driver to allocate at order-1, running a bittorrent client (downloading a
debian ISO) and building a kernel with -j2.

This patch addresses the problem on the desktop machine booted with
mem=256mb.  It works by setting aside a reserve of MAX_ORDER_NR_PAGES
blocks, the number of which depends on the value of min_free_kbytes.  These
blocks are only fallen back to when there is no other free pages.  Then the
smallest possible page is used just like the normal buddy allocator instead
of the largest possible page to preserve contiguous pages The pages in free
lists in the reserve blocks are never taken for another migrate type.  The
results is that even if min_free_kbytes is set to a low value, contiguous
blocks will be preserved in the MIGRATE_RESERVE blocks.

This works better than the vanilla allocator because if min_free_kbytes is
increased, a new reserve block will be chosen based on the location of
reclaimable pages and the block will free up as contiguous pages.  In the
vanilla allocator, no effort is made to target a block of pages to free as
contiguous pages and min_free_kbytes pages are scattered randomly.

This effect has been observed on the test machine.  min_free_kbytes was set
initially low but it was kept as a contiguous free block within
MIGRATE_RESERVE.  min_free_kbytes was then set to a higher value and over a
period of time, the free blocks were within the reserve and coalescing.
How long it takes to free up depends on how quickly LRU is rotating.
Amusingly, this means that more activity will free the blocks faster.

This mechanism potentially replaces MIGRATE_HIGHALLOC as it may be more
effective than grouping contiguous free pages together.  It all depends on
whether the number of active atomic high allocations exceeds
min_free_kbytes or not.  If the number of active allocations exceeds
min_free_kbytes, it's worth it but maybe in that situation, min_free_kbytes
should be set higher.  Once there are no more reports of allocation
failures, a patch will be submitted that backs out MIGRATE_HIGHALLOC and
see if the reports stay missing.

Credit to Mariusz Kozlowski for discovering the problem, describing the
failure scenario and testing patches and scenarios.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman 5c0e306647 Fix corruption of memmap on IA64 SPARSEMEM when mem_section is not a power of 2
There are problems in the use of SPARSEMEM and pageblock flags that causes
problems on ia64.

The first part of the problem is that units are incorrect in
SECTION_BLOCKFLAGS_BITS computation.  This results in a map_section's
section_mem_map being treated as part of a bitmap which isn't good.  This
was evident with an invalid virtual address when mem_init attempted to free
bootmem pages while relinquishing control from the bootmem allocator.

The second part of the problem occurs because the pageblock flags bitmap is
be located with the mem_section.  The SECTIONS_PER_ROOT computation using
sizeof (mem_section) may not be a power of 2 depending on the size of the
bitmap.  This renders masks and other such things not power of 2 base.
This issue was seen with SPARSEMEM_EXTREME on ia64.  This patch moves the
bitmap outside of mem_section and uses a pointer instead in the
mem_section.  The bitmaps are allocated when the section is being
initialised.

Note that sparse_early_usemap_alloc() does not use alloc_remap() like
sparse_early_mem_map_alloc().  The allocation required for the bitmap on
x86, the only architecture that uses alloc_remap is typically smaller than
a cache line.  alloc_remap() pads out allocations to the cache size which
would be a needless waste.

Credit to Bob Picco for identifying the original problem and effecting a
fix for the SECTION_BLOCKFLAGS_BITS calculation.  Credit to Andy Whitcroft
for devising the best way of allocating the bitmaps only when required for
the section.

[wli@holomorphy.com: warning fix]
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman e010487dbe Group high-order atomic allocations
In rare cases, the kernel needs to allocate a high-order block of pages
without sleeping.  For example, this is the case with e1000 cards configured
to use jumbo frames.  Migrating or reclaiming pages in this situation is not
an option.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_HIGHATOMIC type are exactly what they sound
like.  Care is taken that pages of other migrate types do not use the same
blocks as high-order atomic allocations.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman e12ba74d8f Group short-lived and reclaimable kernel allocations
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations.  When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman b92a6edd4b Add a configure option to group pages by mobility
The grouping mechanism has some memory overhead and a more complex allocation
path.  This patch allows the strategy to be disabled for small memory systems
or if it is known the workload is suffering because of the strategy.  It also
acts to show where the page groupings strategy interacts with the standard
buddy allocator.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Mel Gorman b2a0ac8875 Split the free lists for movable and unmovable allocations
This patch adds the core of the fragmentation reduction strategy.  It works by
grouping pages together based on their ability to migrate or be reclaimed.
Basically, it works by breaking the list in zone->free_area list into
MIGRATE_TYPES number of lists.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Mel Gorman 835c134ec4 Add a bitmap that is used to track flags affecting a block of pages
Here is the latest revision of the anti-fragmentation patches.  Of particular
note in this version is special treatment of high-order atomic allocations.
Care is taken to group them together and avoid grouping pages of other types
near them.  Artifical tests imply that it works.  I'm trying to get the
hardware together that would allow setting up of a "real" test.  If anyone
already has a setup and test that can trigger the atomic-allocation problem,
I'd appreciate a test of these patches and a report.  The second major change
is that these patches will apply cleanly with patches that implement
anti-fragmentation through zones.

kernbench shows effectively no performance difference varying between -0.2%
and +2% on a variety of test machines.  Success rates for huge page allocation
are dramatically increased.  For example, on a ppc64 machine, the vanilla
kernel was only able to allocate 1% of memory as a hugepage and this was due
to a single hugepage reserved as min_free_kbytes.  With these patches applied,
17% was allocatable as superpages.  With reclaim-related fixes from Andy
Whitcroft, it was 40% and further reclaim-related improvements should increase
this further.

Changelog Since V28
o Group high-order atomic allocations together
o It is no longer required to set min_free_kbytes to 10% of memory. A value
  of 16384 in most cases will be sufficient
o Now applied with zone-based anti-fragmentation
o Fix incorrect VM_BUG_ON within buffered_rmqueue()
o Reorder the stack so later patches do not back out work from earlier patches
o Fix bug were journal pages were being treated as movable
o Bias placement of non-movable pages to lower PFNs
o More agressive clustering of reclaimable pages in reactions to workloads
  like updatedb that flood the size of inode caches

Changelog Since V27

o Renamed anti-fragmentation to Page Clustering. Anti-fragmentation was giving
  the mistaken impression that it was the 100% solution for high order
  allocations. Instead, it greatly increases the chances high-order
  allocations will succeed and lays the foundation for defragmentation and
  memory hot-remove to work properly
o Redefine page groupings based on ability to migrate or reclaim instead of
  basing on reclaimability alone
o Get rid of spurious inits
o Per-cpu lists are no longer split up per-type. Instead the per-cpu list is
  searched for a page of the appropriate type
o Added more explanation commentary
o Fix up bug in pageblock code where bitmap was used before being initalised

Changelog Since V26
o Fix double init of lists in setup_pageset

Changelog Since V25
o Fix loop order of for_each_rclmtype_order so that order of loop matches args
o gfpflags_to_rclmtype uses gfp_t instead of unsigned long
o Rename get_pageblock_type() to get_page_rclmtype()
o Fix alignment problem in move_freepages()
o Add mechanism for assigning flags to blocks of pages instead of page->flags
o On fallback, do not examine the preferred list of free pages a second time

The purpose of these patches is to reduce external fragmentation by grouping
pages of related types together.  When pages are migrated (or reclaimed under
memory pressure), large contiguous pages will be freed.

This patch works by categorising allocations by their ability to migrate;

Movable - The pages may be moved with the page migration mechanism. These are
	generally userspace pages.

Reclaimable - These are allocations for some kernel caches that are
	reclaimable or allocations that are known to be very short-lived.

Unmovable - These are pages that are allocated by the kernel that
	are not trivially reclaimed. For example, the memory allocated for a
	loaded module would be in this category. By default, allocations are
	considered to be of this type

HighAtomic - These are high-order allocations belonging to callers that
	cannot sleep or perform any IO. In practice, this is restricted to
	jumbo frame allocation for network receive. It is assumed that the
	allocations are short-lived

Instead of having one MAX_ORDER-sized array of free lists in struct free_area,
there is one for each type of reclaimability.  Once a 2^MAX_ORDER block of
pages is split for a type of allocation, it is added to the free-lists for
that type, in effect reserving it.  Hence, over time, pages of the different
types can be clustered together.

When the preferred freelists are expired, the largest possible block is taken
from an alternative list.  Buddies that are split from that large block are
placed on the preferred allocation-type freelists to mitigate fragmentation.

This implementation gives best-effort for low fragmentation in all zones.
Ideally, min_free_kbytes needs to be set to a value equal to 4 * (1 <<
(MAX_ORDER-1)) pages in most cases.  This would be 16384 on x86 and x86_64 for
example.

Our tests show that about 60-70% of physical memory can be allocated on a
desktop after a few days uptime.  In benchmarks and stress tests, we are
finding that 80% of memory is available as contiguous blocks at the end of the
test.  To compare, a standard kernel was getting < 1% of memory as large pages
on a desktop and about 8-12% of memory as large pages at the end of stress
tests.

Following this email are 12 patches that implement thie page grouping feature.
 The first patch introduces a mechanism for storing flags related to a whole
block of pages.  Then allocations are split between movable and all other
allocations.  Following that are patches to deal with per-cpu pages and make
the mechanism configurable.  The next patch moves free pages between lists
when partially allocated blocks are used for pages of another migrate type.
The second last patch groups reclaimable kernel allocations such as inode
caches together.  The final patch related to groupings keeps high-order atomic
allocations.

The last two patches are more concerned with control of fragmentation.  The
second last patch biases placement of non-movable allocations towards the
start of memory.  This is with a view of supporting memory hot-remove of DIMMs
with higher PFNs in the future.  The biasing could be enforced a lot heavier
but it would cost.  The last patch agressively clusters reclaimable pages like
inode caches together.

The fragmentation reduction strategy needs to track if pages within a block
can be moved or reclaimed so that pages are freed to the appropriate list.
This patch adds a bitmap for flags affecting a whole a MAX_ORDER block of
pages.

In non-SPARSEMEM configurations, the bitmap is stored in the struct zone and
allocated during initialisation.  SPARSEMEM statically allocates the bitmap in
a struct mem_section so that bitmaps do not have to be resized during memory
hotadd.  This wastes a small amount of memory per unused section (usually
sizeof(unsigned long)) but the complexity of dynamically allocating the memory
is quite high.

Additional credit to Andy Whitcroft who reviewed up an earlier implementation
of the mechanism an suggested how to make it a *lot* cleaner.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
KAMEZAWA Hiroyuki 954ffcb35f flush icache before set_pte() on ia64: flush icache at set_pte
Current ia64 kernel flushes icache by lazy_mmu_prot_update() *after*
set_pte().  This is too late.  This patch removes lazy_mmu_prot_update and
add modfied set_pte() for flushing if necessary.

This patch flush icache of a page when
	new pte has exec bit.
	&& new pte has present bit
	&& new pte is user's page.
	&& (old *ptep is not present
            || new pte's pfn is not same to old *ptep's ptn)
	&& new pte's page has no Pg_arch_1 bit.
	   Pg_arch_1 is set when a page is cache consistent.

I think this condition checks are much easier to understand than considering
"Where sync_icache_dcache() should be inserted ?".

pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as
clean-up. So, I added it again.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Christoph Lameter 6cb062296f Categorize GFP flags
The function of GFP_LEVEL_MASK seems to be unclear.  In order to clear up
the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP
flags:

GFP_RECLAIM_MASK	Flags used to control page allocator reclaim behavior.

GFP_CONSTRAINT_MASK	Flags used to limit where allocations can occur.

GFP_SLAB_BUG_MASK	Flags that the slab allocator BUG()s on.

These replace the uses of GFP_LEVEL mask in the slab allocators and in
vmalloc.c.

The use of the flags not included in these sets may occur as a result of a
slab allocation standing in for a page allocation when constructing scatter
gather lists.  Extraneous flags are cleared and not passed through to the
page allocator.  __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will
now be ignored if passed to a slab allocator.

Change the allocation of allocator meta data in SLAB and vmalloc to not
pass through flags listed in GFP_CONSTRAINT_MASK.  SLAB already removes the
__GFP_THISNODE flag for such allocations.  Generalize that to also cover
vmalloc.  The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL.

The impact of allocator metadata placement on access latency to the
cachelines of the object itself is minimal since metadata is only
referenced on alloc and free.  The attempt is still made to place the meta
data optimally but we consistently allow fallback both in SLAB and vmalloc
(SLUB does not need to allocate metadata like that).

Allocator metadata may serve multiple in kernel users and thus should not
be subject to the limitations arising from a single allocation context.

[akpm@linux-foundation.org: fix fallback_alloc()]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Christoph Lameter 0e1e7c7a73 Memoryless nodes: Use N_HIGH_MEMORY for cpusets
cpusets try to ensure that any node added to a cpuset's mems_allowed is
on-line and contains memory.  The assumption was that online nodes contained
memory.  Thus, it is possible to add memoryless nodes to a cpuset and then add
tasks to this cpuset.  This results in continuous series of oom-kill and
apparent system hang.

Change cpusets to use node_states[N_HIGH_MEMORY] [a.k.a.  node_memory_map] in
place of node_online_map when vetting memories.  Return error if admin
attempts to write a non-empty mems_allowed node mask containing only
memoryless-nodes.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Christoph Lameter 523b945855 Memoryless nodes: Fix GFP_THISNODE behavior
GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
first zone of a nodelist.  That only works if the node has memory.  A
memoryless node will have its first node on another pgdat (node).

GFP_THISNODE currently will return simply memory on the first pgdat.  Thus it
is returning memory on other nodes.  GFP_THISNODE should fail if there is no
local memory on a node.

Add a new set of zonelists for each node that only contain the nodes that
belong to the zones itself so that no fallback is possible.

Then modify gfp_type to pickup the right zone based on the presence of
__GFP_THISNODE.

Drop the existing GFP_THISNODE checks from the page_allocators hot path.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Christoph Lameter 37c0708dbe Memoryless nodes: Add N_CPU node state
We need the check for a node with cpu in zone reclaim.  Zone reclaim will not
allow remote zone reclaim if a node has a cpu.

[Lee.Schermerhorn@hp.com: Move setup of N_CPU node state mask]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Tested-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Christoph Lameter 7ea1530ab3 Memoryless nodes: introduce mask of nodes with memory
It is necessary to know if nodes have memory since we have recently begun to
add support for memoryless nodes.  For that purpose we introduce a two new
node states: N_HIGH_MEMORY and N_NORMAL_MEMORY.

A node has its bit in N_HIGH_MEMORY set if it has any memory regardless of the
type of mmemory.  If a node has memory then it has at least one zone defined
in its pgdat structure that is located in the pgdat itself.

A node has its bit in N_NORMAL_MEMORY set if it has a lower zone than
ZONE_HIGHMEM.  This means it is possible to allocate memory that is not
subject to kmap.

N_HIGH_MEMORY and N_NORMAL_MEMORY can then be used in various places to insure
that we do the right thing when we encounter a memoryless node.

[akpm@linux-foundation.org: build fix]
[Lee.Schermerhorn@hp.com: update N_HIGH_MEMORY node state for memory hotadd]
[y-goto@jp.fujitsu.com: Fix memory hotplug + sparsemem build]
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Christoph Lameter 1380891071 Memoryless nodes: Generic management of nodemasks for various purposes
Why do we need to support memoryless nodes?

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> For fujitsu, problem is called "empty" node.
>
> When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
> creates nodes, which includes no memory, no cpu.
>
> I tried to remove empty-node in past, but that was denied.
> It was because we can hot-add cpu to the empty node.
> (node-hotplug triggered by cpu is not implemented now. and it will be ugly.)
>
>
> For HP, (Lee can comment on this later), they have memory-less-node.
> As far as I hear, HP's machine can have following configration.
>
> (example)
> Node0: CPU0   memory AAA MB
> Node1: CPU1   memory AAA MB
> Node2: CPU2   memory AAA MB
> Node3: CPU3   memory AAA MB
> Node4: Memory XXX GB
>
> AAA is very small value (below 16MB)  and will be omitted by ia64 bootstrap.
> After boot, only Node 4 has valid memory (but have no cpu.)
>
> Maybe this is memory-interleave by firmware config.

Christoph Lameter <clameter@sgi.com> wrote:

> Future SGI platforms (actually also current one can have but nothing like
> that is deployed to my knowledge) have nodes with only cpus. Current SGI
> platforms have nodes with just I/O that we so far cannot manage in the
> core. So the arch code maps them to the nearest memory node.

Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> For the HP platforms, we can configure each cell with from 0% to 100%
> "cell local memory".  When we configure with <100% CLM, the "missing
> percentages" are interleaved by hardware on a cache-line granularity to
> improve bandwidth at the expense of latency for numa-challenged
> applications [and OSes, but not our problem ;-)].  When we boot Linux on
> such a config, all of the real nodes have no memory--it all resides in a
> single interleaved pseudo-node.
>
> When we boot Linux on a 100% CLM configuration [== NUMA], we still have
> the interleaved pseudo-node.  It contains a few hundred MB stolen from
> the real nodes to contain the DMA zone.  [Interleaved memory resides at
> phys addr 0].  The memoryless-nodes patches, along with the zoneorder
> patches, support this config as well.
>
> Also, when we boot a NUMA config with the "mem=" command line,
> specifying less memory than actually exists, Linux takes the excluded
> memory "off the top" rather than distributing it across the nodes.  This
> can result in memoryless nodes, as well.
>

This patch:

Preparation for memoryless node patches.

Provide a generic way to keep nodemasks describing various characteristics of
NUMA nodes.

Remove the node_online_map and the node_possible map and realize the same
functionality using two nodes stats: N_POSSIBLE and N_ONLINE.

[Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Nick Piggin 55144768e1 fs: remove some AOP_TRUNCATED_PAGE
prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and
GFS2 were converted to the new aops, so we can make some simplifications
for that.

[michal.k.k.piotrowski@gmail.com: fix warning]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Nick Piggin 03158cd7eb fs: restore nobh
Implement nobh in new aops.  This is a bit tricky.  FWIW, nobh_truncate is
now implemented in a way that does not create blocks in sparse regions,
which is a silly thing for it to have been doing (isn't it?)

ext2 survives fsx and fsstress. jfs is converted as well... ext3
should be easy to do (but not done yet).

[akpm@linux-foundation.org: coding-style fixes]
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Nick Piggin a20fa20c54 With reiserfs no longer using the weird generic_cont_expand, remove it completely.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:56 -07:00
Nick Piggin 89e107877b fs: new cont helpers
Rework the generic block "cont" routines to handle the new aops.  Supporting
cont_prepare_write would take quite a lot of code to support, so remove it
instead (and we later convert all filesystems to use it).

write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
generic_cont_expand, so filesystems can avoid the old hacks they used.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:55 -07:00
Nick Piggin afddba49d1 fs: introduce write_begin, write_end, and perform_write aops
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).

[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:55 -07:00
Nick Piggin 2f718ffc16 mm: buffered write iterator
Add an iterator data structure to operate over an iovec.  Add usercopy
operators needed by generic_file_buffered_write, and convert that function
over.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:55 -07:00
Nick Piggin 08291429cf mm: fix pagecache write deadlocks
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:

1.  generic_buffered_write
2.   lock_page
3.    prepare_write
4.     unlock_page+vmtruncate
5.     copy_from_user
6.      mmap_sem(r)
7.       handle_mm_fault
8.        lock_page (filemap_nopage)
9.    commit_write
10.  unlock_page

a. sys_munmap / sys_mlock / others
b.  mmap_sem(w)
c.   make_pages_present
d.    get_user_pages
e.     handle_mm_fault
f.      lock_page (filemap_nopage)

2,8	- recursive deadlock if page is same
2,8;2,8	- ABBA deadlock is page is different
2,6;b,f	- ABBA deadlock if page is same

The solution is as follows:
1.  If we find the destination page is uptodate, continue as normal, but use
    atomic usercopies which do not take pagefaults and do not zero the uncopied
    tail of the destination. The destination is already uptodate, so we can
    commit_write the full length even if there was a partial copy: it does not
    matter that the tail was not modified, because if it is dirtied and written
    back to disk it will not cause any problems (uptodate *means* that the
    destination page is as new or newer than the copy on disk).

1a. The above requires that fault_in_pages_readable correctly returns access
    information, because atomic usercopies cannot distinguish between
    non-present pages in a readable mapping, from lack of a readable mapping.

2.  If we find the destination page is non uptodate, unlock it (this could be
    made slightly more optimal), then allocate a temporary page to copy the
    source data into. Relock the destination page and continue with the copy.
    However, instead of a usercopy (which might take a fault), copy the data
    from the pinned temporary page via the kernel address space.

(also, rename maxlen to seglen, because it was confusing)

This increases the CPU/memory copy cost by almost 50% on the affected
workloads. That will be solved by introducing a new set of pagecache write
aops in a subsequent patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:54 -07:00
Lee Schermerhorn 754af6f5a8 Mem Policy: add MPOL_F_MEMS_ALLOWED get_mempolicy() flag
Allow an application to query the memories allowed by its context.

Updated numa_memory_policy.txt to mention that applications can use this to
obtain allowed memories for constructing valid policies.

TODO:  update out-of-tree libnuma wrapper[s], or maybe add a new
wrapper--e.g.,  numa_get_mems_allowed() ?

Also, update numa syscall man pages.

Tested with memtoy V>=0.13.

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:54 -07:00
Martin Schwidefsky c92ff1bde0 move mm_struct and vm_area_struct
Move the definitions of struct mm_struct and struct vma_area_struct to
include/mm_types.h.  This allows to define more function in asm/pgtable.h
and friends with inline assemblies instead of macros.  Compile tested on
i386, powerpc, powerpc64, s390-32, s390-64 and x86_64.

[aurelien@aurel32.net: build fix]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
Nick Piggin c0bc9875b7 radix-tree: use indirect bit
Rather than sign direct radix-tree pointers with a special bit, sign the
indirect one that hangs off the root.  This means that, given a lookup_slot
operation, the invalid result will be differentiated from the valid
(previously, valid results could have the bit either set or clear).

This does not affect slot lookups which occur under lock -- they can never
return an invalid result.  Is needed in future for lockless pagecache.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
Nick Piggin 557ed1fa26 remove ZERO_PAGE
The commit b5810039a5 contains the note

  A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
  (and thus mapcounted and count towards shared rss).  These writes to
  the struct page could cause excessive cacheline bouncing on big
  systems.  There are a number of ways this could be addressed if it is
  an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- mesuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
much use to them except on benchmarks.  All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
Christoph Lameter aadb4bc4a1 SLUB: direct pass through of page size or higher kmalloc requests
This gets rid of all kmalloc caches larger than page size.  A kmalloc
request larger than PAGE_SIZE > 2 is going to be passed through to the page
allocator.  This works both inline where we will call __get_free_pages
instead of kmem_cache_alloc and in __kmalloc.

kfree is modified to check if the object is in a slab page. If not then
the page is freed via the page allocator instead. Roughly similar to what
SLOB does.

Advantages:
- Reduces memory overhead for kmalloc array
- Large kmalloc operations are faster since they do not
  need to pass through the slab allocator to get to the
  page allocator.
- Performance increase of 10%-20% on alloc and 50% on free for
  PAGE_SIZEd allocations.
  SLUB must call page allocator for each alloc anyways since
  the higher order pages which that allowed avoiding the page alloc calls
  are not available in a reliable way anymore. So we are basically removing
  useless slab allocator overhead.
- Large kmallocs yields page aligned object which is what
  SLAB did. Bad things like using page sized kmalloc allocations to
  stand in for page allocate allocs can be transparently handled and are not
  distinguishable from page allocator uses.
- Checking for too large objects can be removed since
  it is done by the page allocator.

Drawbacks:
- No accounting for large kmalloc slab allocations anymore
- No debugging of large kmalloc slab allocations.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
Fengguang Wu 57f6b96c09 filemap: convert some unsigned long to pgoff_t
Convert some 'unsigned long' to pgoff_t.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
Fengguang Wu 535443f515 readahead: remove several readahead macros
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Fengguang Wu 6df8ba4f8a radixtree: introduce radix_tree_next_hole()
Introduce radix_tree_next_hole(root, index, max_scan) to scan radix tree for
the first hole.  It will be used in interleaved readahead.

The implementation is dumb and obviously correct.  It can help debug(and
document) the possible smart one in future.

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Fengguang Wu f4e6b498d6 readahead: combine file_ra_state.prev_index/prev_offset into prev_pos
Combine the file_ra_state members
				unsigned long prev_index
				unsigned int prev_offset
into
				loff_t prev_pos

It is more consistent and better supports huge files.

Thanks to Peter for the nice proposal!

[akpm@linux-foundation.org: fix shift overflow]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Fengguang Wu 0bb7ba6b9c readahead: mmap read-around simplification
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss and make it an int.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Fengguang Wu 937085aa35 readahead: compacting file_ra_state
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64bit CPU when a lot of files are
opened.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Jesper Juhl 39e91e4331 Clean up duplicate includes in include/linux/memory_hotplug.h
This patch cleans up duplicate includes in
	include/linux/memory_hotplug.h

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Andy Whitcroft d29eff7bca ppc64: SPARSEMEM_VMEMMAP support
Enable virtual memmap support for SPARSEMEM on PPC64 systems.  Slice a 16th
off the end of the linear mapping space and use that to hold the vmemmap.
Uses the same size mapping as uses in the linear 1:1 kernel mapping.

[pbadari@gmail.com: fix warning]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
David Miller 46644c2477 SPARC64: SPARSEMEM_VMEMMAP support
[apw@shadowen.org: style fixups]
[apw@shadowen.org: vmemmap sparc64: convert to new config options]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Christoph Lameter ef229c5a5e IA64: SPARSEMEM_VMEMMAP 16K page size support
Equip IA64 sparsemem with a virtual memmap.  This is similar to the existing
CONFIG_VIRTUAL_MEM_MAP functionality for DISCONTIGMEM.  It uses a PAGE_SIZE
mapping.

This is provided as a minimally intrusive solution.  We split the 128TB
VMALLOC area into two 64TB areas and use one for the virtual memmap.

This should replace CONFIG_VIRTUAL_MEM_MAP long term.

[apw@shadowen.org: convert to new helper based initialisation]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Christoph Lameter 0889eba5b3 x86_64: SPARSEMEM_VMEMMAP 2M page size support
x86_64 uses 2M page table entries to map its 1-1 kernel space.  We also
implement the virtual memmap using 2M page table entries.  So there is no
additional runtime overhead over FLATMEM, initialisation is slightly more
complex.  As FLATMEM still references memory to obtain the mem_map pointer and
SPARSEMEM_VMEMMAP uses a compile time constant, SPARSEMEM_VMEMMAP should be
superior.

With this SPARSEMEM becomes the most efficient way of handling virt_to_page,
pfn_to_page and friends for UP, SMP and NUMA on x86_64.

[apw@shadowen.org: code resplit, style fixups]
[apw@shadowen.org: vmemmap x86_64: ensure end of section memmap is initialised]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Andy Whitcroft 29c71111d0 vmemmap: generify initialisation via helpers
Convert the common vmemmap population into initialisation helpers for use by
architecture vmemmap populators.  All architecture implementing the
SPARSEMEM_VMEMMAP variant supply an architecture specific vmemmap_populate()
initialiser, which may make use of the helpers.

This allows us to clean up and remove the initialisation Kconfig entries.
With this patch there is a single SPARSEMEM_VMEMMAP_ENABLE Kconfig option to
indicate use of that variant.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Christoph Lameter 8f6aac419b Generic Virtual Memmap support for SPARSEMEM
SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all
the arches.  It would be great if it could be the default so that we can get
rid of various forms of DISCONTIG and other variations on memory maps.  So far
what has hindered this are the additional lookups that SPARSEMEM introduces
for virt_to_page and page_address.  This goes so far that the code to do this
has to be kept in a separate function and cannot be used inline.

This patch introduces a virtual memmap mode for SPARSEMEM, in which the memmap
is mapped into a virtually contigious area, only the active sections are
physically backed.  This allows virt_to_page page_address and cohorts become
simple shift/add operations.  No page flag fields, no table lookups, nothing
involving memory is required.

The two key operations pfn_to_page and page_to_page become:

   #define __pfn_to_page(pfn)      (vmemmap + (pfn))
   #define __page_to_pfn(page)     ((page) - vmemmap)

By having a virtual mapping for the memmap we allow simple access without
wasting physical memory.  As kernel memory is typically already mapped 1:1
this introduces no additional overhead.  The virtual mapping must be big
enough to allow a struct page to be allocated and mapped for all valid
physical pages.  This vill make a virtual memmap difficult to use on 32 bit
platforms that support 36 address bits.

However, if there is enough virtual space available and the arch already maps
its 1-1 kernel space using TLBs (f.e.  true of IA64 and x86_64) then this
technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM.
FLATMEM needs to read the contents of the mem_map variable to get the start of
the memmap and then add the offset to the required entry.  vmemmap is a
constant to which we can simply add the offset.

This patch has the potential to allow us to make SPARSMEM the default (and
even the only) option for most systems.  It should be optimal on UP, SMP and
NUMA on most platforms.  Then we may even be able to remove the other memory
models: FLATMEM, DISCONTIG etc.

[apw@shadowen.org: config cleanups, resplit code etc]
[kamezawa.hiroyu@jp.fujitsu.com: Fix sparsemem_vmemmap init]
[apw@shadowen.org: vmemmap: remove excess debugging]
[apw@shadowen.org: simplify initialisation code and reduce duplication]
[apw@shadowen.org: pull out the vmemmap code into its own file]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Andy Whitcroft 540557b943 sparsemem: record when a section has a valid mem_map
We have flags to indicate whether a section actually has a valid mem_map
associated with it.  This is never set and we rely solely on the present bit
to indicate a section is valid.  By definition a section is not valid if it
has no mem_map and there is a window during init where the present bit is set
but there is no mem_map, during which pfn_valid() will return true
incorrectly.

Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid
mem_map.  Switch valid_section{,_nr} and pfn_valid() to this bit.  Add a new
present_section{,_nr} and pfn_present() interfaces for those users who care to
know that a section is going to be valid.

[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Christoph Hellwig 74a0b57627 x86: optimize page faults like all other achitectures and kill notifier cruft
x86(-64) are the last architectures still using the page fault notifier
cruft for the kprobes page fault hook.  This patch converts them to the
proper direct calls, and removes the now unused pagefault notifier bits
aswell as the cruft in kprobes.c that was related to this mess.

I know Andi didn't really like this, but all other architecture maintainers
agreed the direct calls are much better and besides the obvious cruft
removal a common way of dealing with kprobes across architectures is
important aswell.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc64]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Mike Travis d5a7430ddc Convert cpu_sibling_map to be a per cpu variable
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable.  This saves sizeof(cpumask_t) * NR unused cpus.  Access is mostly
from startup and CPU HOTPLUG functions.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Mike Travis 0835761129 x86: Convert cpu_core_map to be a per cpu variable
This is from an earlier message from 'Christoph Lameter':

    cpu_core_map is currently an array defined using NR_CPUS. This means that
    we overallocate since we will rarely really use maximum configured cpu.

    If we put the cpu_core_map into the per cpu area then it will be allocated
    for each processor as it comes online.

    This means that the core map cannot be accessed until the per cpu area
    has been allocated. Xen does a weird thing here looping over all processors
    and zeroing the masks that are not yet allocated and that will be zeroed
    when they are allocated. I commented the code out.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Guennadi Liakhovetski b3b708fa27 wake up from a serial port
Enable wakeup from serial ports, make it run-time configurable over sysfs,
e.g.,

echo enabled > /sys/devices/platform/serial8250.0/tty/ttyS0/power/wakeup

Requires

# CONFIG_SYSFS_DEPRECATED is not set

Following suggestions from Alan and Russell moved the may_wake_up checks
to serial_core.c. This time actually tested - it does even work. Could
someone, please, verify, that put_device after device_find_child is
correct?

Also would be nice to test with a Natsemi UART, that can wake up the system,
if such systems exist.

For this you just have to apply the patch below, issue the above "echo"
command to one of your Natsemi port, suspend and resume your system, and
verify that your Natsemi port still works.  If you are actually capable of
waking up the system from that port, would be nice to test that as well.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Guennadi Liakhovetski aa5346a212 provide stubs for enable_irq_wake() and disable_irq_wake()
Provide {enable,disable}_irq_wakeup dummies for undefined
cross-compilers for platforms without CONFIG_GENERIC_IRQ.

Needed by wake-up-from-a-serial-port.patch

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Alan Cox bf0df636e5 8250_pci: Autodetect mainpine cards
Add support for a whole range of boards. Some are partly autodetected but
not fully correctly others (PCI Express notably) not at all. Stick all
the right entries in.

Thanks to Mainpine for information and testing.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
James Bottomley 43d9f7fda1 pcmcia: use DMA_MASK_NONE for the default for all pcmcia devices
Most non cardbus devices can't do dma, so flag them as such in the device
creation routine.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Natalie Protasevich <protasnb@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
James Bottomley 32e8f70230 introduce DMA_MASK_NONE as a signal for unable to do DMA
Some devices are incapable of DMA and need to be recognised as such.
Introduce a NONE dma mask to facilitate this plus an inline function:
is_device_dma_capable() to check this.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Natalie Protasevich <protasnb@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Yoichi Yuasa b5446b514c move a few definitions to au1000_xxs1500.c
Only a few definitions is in xxs1500.h .
They can be move to au1000_xxs1500.c .

[m.kozlowski@tuxland.pl: fix unbalanced parenthesis]
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:50 -07:00
Ralf Baechle 0322a2b840 Add assembler equivalents to __init{,date}_refok
I need __INIT_REFOK to fix a MODPOST warning for a few MIPS configs which
have to call init code from .text very early in the game due to bootloader
issues.  __INITDATA_REFOK is just for consistency.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:49 -07:00
Randy Dunlap bfe8df3d31 slow down printk during boot
Optionally add a boot delay after each kernel printk() call, crudely
measured in milliseconds, with a maximum delay of 10 seconds per printk.

Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.

It has been useful in cases like "during boot, my machine just reboots or the
screen goes black" by slowing down printk, (and adding initcall_debug), we can
usually see the last thing that happened before the lights went out which is
usually a valuable clue.

[akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
[akpm@linux-foundation.org: fix lots of stuff]
[bunk@stusta.de: kernel/printk.c: make 2 variables static]
[heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:49 -07:00
Jaroslav Kysela 24837e6f24 [ALSA] version 1.0.15
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
2007-10-16 16:57:46 +02:00
Krzysztof Helt ca2df45a07 [ALSA] This patch removes open_mutex from the ad1848-lib as
open and close operations are called only from pcm layer
and mutexed there with pcm->open_mutex.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
2007-10-16 16:51:26 +02:00
Jaroslav Kysela c1017a4cdb [ALSA] Changed Jaroslav Kysela's e-mail from perex@suse.cz to perex@perex.cz
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
2007-10-16 16:51:18 +02:00
Clemens Ladisch c1099fcb74 [ALSA] mpu-401: remove MPU401_INFO_UART_ONLY flag
Since the last patch made the ENTER_UART command optional, the
enter_uart option and its corresponding flag have become superfluous.
The uart_enter option remains for backward compatibility but just prints
a warning when used.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 16:51:14 +02:00
Keita Maehara 13043984e7 [ALSA] ac97: YMF743 missing controls support (1/2)
These patches enable some YMF743 controls (Tone/3D/IEC958) that won't
be detected with the current version of ALSA.
The first one contains only cosmetic changes to share a few
YMF753-specific symbols with YMF743.

Signed-off-by: Keita Maehara <maehara@debian.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 16:50:56 +02:00
Matthias Kaehlcke c2d7051ed1 [ALSA] Routines for effect processor FX8010: Use list_for_each_entry
Routines for effect processor FX8010: Use list_for_each_entry instead
of list_for_each

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 16:50:44 +02:00
Takashi Iwai 503fc85a3b [ALSA] Kill useless volatile in pcm.h
The volatile prefix is just useless there.  Let's kill them, and then
gcc will be happier, too.
   sound/acore/pcm.c:867: warning: passing argument 1 of ‘__constant_c_and_count_memset’ discards qualifiers from pointer target type

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 16:49:28 +02:00
Takashi Iwai b9f09a4859 [ALSA] Fix 'discards qualifiers' compile warnings in pcm.h
Fixed cast messes in pcm.h.
    include/sound/pcm.h: In function ‘hw_param_interval_c’:
    include/sound/pcm.h:800: warning: passing argument 1 of ‘hw_param_interval’ discards qualifiers from pointer target type
Simply redefine the inline functions again for const pointers.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 16:49:27 +02:00
Rene Herman 3304cd3610 [ALSA] ad1848: fix AD1848P macro
Consistent variable naming is a good thing, but let's be a little less
sneaky about enforcing it... ;-/

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 16:49:22 +02:00
Krzysztof Helt f545714ece [ALSA] cs4231 header split
This patch splits the cs4231.h file into two parts:
- cs4231-regs.h which contain register constants and macros
- cs4231.h which includes the above and contain rest of the definitions
This will allow to share register definitions between x86 ISA cs4231
and SPARC cs4231.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:59:56 +02:00
Clemens Ladisch 918f3a0e8c [ALSA] pcm: add snd_pcm_rate_to_rate_bit() helper
Add a snd_pcm_rate_to_rate_bit() function to factor out common code used
by several drivers.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:54 +02:00
Clemens Ladisch 7653d55760 [ALSA] pcm: merge rates[] from pcm_misc.c and pcm_native.c
Merge the rates[] arrays from pcm_misc.c and pcm_native.c because they
are both the same.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:53 +02:00
Timur Tabi b0c813ceee [ALSA] ASoC CS4270 codec device driver
This patch adds ALSA SoC support for the Cirrus Logic CS4270 codec.  The
following features are suppored:
1) Stand-alone and software mode
2) Software mode via I2C only
3) Master mode, not Slave
4) No power management

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:19 +02:00
Takashi Iwai 2807314d46 [ALSA] hda-intel - Add hwdep interface
Added a hwdep interface for each codec (enabled per kconfig).
This interface can be used for reading/writing HD-audio verbs
and other purposes as future extensions.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:10 +02:00
Takashi Iwai ef5fa1a49f [ALSA] hdspm - Coding style fixes
Fix codes to follow more to the standard kernel coding style.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:10 +02:00
Paul Vojta e2340465ec [ALSA] Fix bugs in mode change/recalibration for opl3sa2 driver
The mode change / recalibration doesn't work always with opl3sa2 devices,
e.g. the first time it's played back.  The patch fixes the problem.

Signed-off-by: Paul Vojta <vojta@math.berkeley.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:08 +02:00
James Courtier-Dutton f93abe51e8 [ALSA] snd-emu10k1:Implement SPDIF/ADAT status.
Signed-off-by: James Courtier-Dutton <James@superbug.co.uk>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:58:03 +02:00
James Courtier-Dutton 42f5322695 [ALSA] snd-emu10k1:Improves firmware loading for E-Mu cards.
Details:
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=8176

Signed-off-by: James Courtier-Dutton <James@superbug.co.uk>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:57:51 +02:00
Hans-Christian Egtvedt eafe570847 [ALSA] ALSA sound driver for the AT73C213 DAC using Atmel SSC driver
This patch adds support for the AT73C213 DAC using the misc Atmel SSC driver in
I2S mode. The driver also requires a SPI to setup the registers and control
volume.
It has been tested with an AT32AP7000 on the ATSTK1000 development board. The
driver should also work with any Atmel device with an SSC module supported by
the Atmel SSC driver (atmel-ssc).
The atmel-ssc driver is just submitted to the Linux kernel. Please see mail
thread http://lkml.org/lkml/2007/7/16/32

Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:57:50 +02:00
Takashi Iwai a5ce88909d [ALSA] Clean up with common snd_ctl_boolean_*_info callbacks
Clean up codes using the new common snd_ctl_boolean_*_info() callbacks.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:57:45 +02:00
Takashi Iwai b9ed4f2b68 [ALSA] Add helper functions for frequently used callbacks
Added helper functions for frequenty used callbacks:
  snd_ctl_boolean_mono_info() and snd_ctl_boolean_stereo_info()

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:57:44 +02:00
James Courtier-Dutton 90fd5ce5f6 [ALSA] snd-emu10k1: Add support for E-Mu 1616 PCI, 1616M PCI, 0404 PCI, E-Mu
Notebook.
Description: The .device=0x0008 chips have new, but different EMU32 in/out
channels. Driver updated to make use of these EMU32 channels.

Signed-off-by: James Courtier-Dutton <James@superbug.co.uk>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
2007-10-16 15:57:43 +02:00
Jens Axboe 2c941a2040 SPARC64: sg chaining support
This updates the sparc64 iommu/pci dma mappers to sg chaining.

Acked-by: David S. Miller <davem@davemloft.net>

Later updated to newer kernel with unified sparc64 iommu sg handling.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:27:32 +02:00
Jens Axboe 0912a5db0e SPARC: sg chaining support
This updates the sparc iommu/pci dma mappers to sg chaining.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:27:32 +02:00
Jens Axboe 78bdc3106a PPC: sg chaining support
This updates the ppc iommu/pci dma mappers to sg chaining. Includes
further fixes from FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:27:32 +02:00
Jens Axboe d1ed455e30 PS3: sg chaining support
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:27:31 +02:00
Jens Axboe 9b6eccfccb IA64: sg chaining support
This updates the ia64 iommu/pci dma mappers to sg chaining.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:27:26 +02:00
Jens Axboe 46856afa01 x86-64: enable sg chaining
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:26:02 +02:00
Jens Axboe 38d375561f i386: enable sg chaining
We don't need to do more on x86, there's no iommu to be worried about.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:26:01 +02:00
Jens Axboe a17b490420 i386 dma_map_sg: convert to using sg helpers
The dma mapping helpers need to be converted to using
sg helpers as well, so they will work with a chained
sglist setup.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:26:01 +02:00
FUJITA Tomonori 2a7c59e79c remove sglist_len
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:24:44 +02:00
FUJITA Tomonori 9cb83c7529 [SCSI] add use_sg_chaining option to scsi_host_template
This option is true if a low-level driver can support sg
chaining. This will be removed eventually when all the drivers are
converted to support sg chaining. q->max_phys_segments is set to
SCSI_MAX_SG_SEGMENTS if false.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:24:32 +02:00
Jens Axboe 55c16a7004 IDE: sg chaining support
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:21:00 +02:00
Jens Axboe ba2da2f8d6 i2o: sg chaining support
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:21:00 +02:00
Jens Axboe 8726021626 libata: convert to using sg helpers
This converts libata to using the sg helpers for looking up sg
elements, instead of doing it manually.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:12 +02:00
Jens Axboe a8474ce23a SCSI: support for allocating large scatterlists
This is what enables large commands. If we need to allocate an
sgtable that doesn't fit in a single page, allocate several
SCSI_MAX_SG_SEGMENTS sized tables and chain them together.

SCSI defaults to large chained sg tables, if the arch supports it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:12:53 +02:00
Jens Axboe 0cde8d9510 scsi: simplify scsi_free_sgtable()
Just pass in the command, no point in passing in the scatterlist
and scatterlist pool index seperately.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:12:37 +02:00
Jens Axboe 70eb8040dc Add chained sg support to linux/scatterlist.h
The core of the patch - allow the last sg element in a scatterlist
table to point to the start of a new table. We overload the LSB of
the page pointer to indicate whether this is a valid sg entry, or
merely a link to the next list.

Includes a fix from Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
correcting the ifdef ARCH_HAS_SG_CHAIN guarding sg_last().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:08:51 +02:00
Jens Axboe c6132da170 scsi: convert to using sg helpers
This converts the SCSI mid layer to using the sg helpers for looking up
sg elements, instead of doing it manually.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:08:49 +02:00
Jens Axboe 96b418c960 Add sg helpers for iterating over a scatterlist table
First step to being able to change the scatterlist setup without
having to modify drivers (a lot :-)

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:07:10 +02:00
Adrian Bunk bb879463b5 remove ide_get_error_location()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:05:06 +02:00
Jens Axboe fd5d806266 block: convert blkdev_issue_flush() to use empty barriers
Then we can get rid of ->issue_flush_fn() and all the driver private
implementations of that.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:05:02 +02:00
Jens Axboe bf2de6f5a4 block: Initial support for data-less (or empty) barrier support
This implements functionality to pass down or insert a barrier
in a queue, without having data attached to it. The ->prepare_flush_fn()
infrastructure from data barriers are reused to provide this
functionality.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:56 +02:00
Jens Axboe a0cd128542 block: add end_queued_request() and end_dequeued_request() helpers
We can use this helper in the elevator core for BLKPREP_KILL, and it'll
also be useful for the empty barrier patch.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:53 +02:00
Randy Dunlap e6716b87d5 docbook: fix filesystems content
Fix filesystems docbook warnings.

Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name'
Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode'
Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent'
Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value'
Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15 17:56:36 -07:00
Randy Dunlap fd39c86b3d docbook: fix usb content
Fix USB docbook warnings.

Warning(linux-2.6.23-git8//include/linux/usb/gadget.h:487): No description found for parameter 'g'
Warning(linux-2.6.23-git8//include/linux/usb/gadget.h:506): No description found for parameter 'g'

Warning(linux-2.6.23-git8//drivers/usb/core/hub.c:1416): No description found for parameter 'usb_dev'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15 17:56:36 -07:00
Linus Torvalds 65a6ec0d72 Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (95 commits)
  [ARM] 4578/1: CM-x270: PCMCIA support
  [ARM] 4577/1: ITE 8152 PCI bridge support
  [ARM] 4576/1: CM-X270 machine support
  [ARM] pxa: Avoid pxa_gpio_mode() in gpio_direction_{in,out}put()
  [ARM] pxa: move pxa_set_mode() from pxa2xx_mainstone.c to mainstone.c
  [ARM] pxa: move pxa_set_mode() from pxa2xx_lubbock.c to lubbock.c
  [ARM] pxa: Make cpu_is_pxaXXX dependent on configuration symbols
  [ARM] pxa: PXA3xx base support
  [NET] smc91x: fix PXA DMA support code
  [SERIAL] Fix console initialisation ordering
  [ARM] pxa: tidy up arch/arm/mach-pxa/Makefile
  [ARM] Update arch/arm/Kconfig for drivers/Kconfig changes
  [ARM] 4600/1: fix kernel build failure with build-id-supporting binutils
  [ARM] 4599/1: Preserve ATAG list for use with kexec (2.6.23)
  [ARM] Rename consistent_sync() as dma_cache_maint()
  [ARM] 4572/1: ep93xx: add cirrus logic edb9307 support
  [ARM] 4596/1: S3C2412: Correct IRQs for SDI+CF and add decoding support
  [ARM] 4595/1: ns9xxx: define registers as void __iomem * instead of volatile u32
  [ARM] 4594/1: ns9xxx: use the new gpio functions
  [ARM] 4593/1: ns9xxx: implement generic clockevents
  ...
2007-10-15 16:08:50 -07:00
Linus Torvalds 541010e4b8 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
2007-10-15 16:07:40 -07:00
Linus Torvalds e457f790d8 Merge branch 'release' of ssh://master.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of ssh://master.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] build fix for scatterlist
2007-10-15 15:32:57 -07:00
Linus Torvalds a52cefc80f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  [IPV6]: Consolidate the ip6_pol_route_(input|output) pair
  [TCP]: Make snd_cwnd_cnt 32-bit
  [TCP]: Update the /proc/net/tcp documentation
  [NETNS]: Don't panic on creating the namespace's loopback
  [NEIGH]: Ensure that pneigh_lookup is protected with RTNL
  [INET]: kmalloc+memset -> kzalloc in frag_alloc_queue
  [ISDN]: Fix compile with CONFIG_ISDN_X25 disabled.
  [IPV6]: Replace sk_buff ** with sk_buff * in input handlers
  [SELINUX]: Update for netfilter ->hook() arg changes.
  [INET]: Consolidate the xxx_put
  [INET]: Small cleanup for xxx_put after evictor consolidation
  [INET]: Consolidate the xxx_evictor
  [INET]: Consolidate the xxx_frag_destroy
  [INET]: Consolidate xxx_the secret_rebuild
  [INET]: Consolidate the xxx_frag_kill
  [INET]: Collect common frag sysctl variables together
  [INET]: Collect frag queues management objects together
  [INET]: Move common fields from frag_queues in one place.
  [TG3]: Fix performance regression on 5705.
  [ISDN]: Remove local copy of device name to make sure renames work.
  ...
2007-10-15 14:06:58 -07:00
Tony Luck 0df333ce01 [IA64] build fix for scatterlist
include/scsi/scsi_eh.h:79: error: field `sense_sgl' has incomplete type

x86 resolves this by including scatterlist.h from dma-mapping.h which
seems as good a place as any.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-10-15 13:49:43 -07:00
Linus Torvalds f2e1d89f9b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits)
  Input: use full RCU API
  Input: remove tsdev interface
  Input: add support for Blackfin BF54x Keypad controller
  Input: appletouch - another fix for idle reset logic
  HWMON: hdaps - switch to using input-polldev
  Input: add support for SEGA Dreamcast keyboard
  Input: omap-keyboard - don't pretend we support changing keymap
  Input: lifebook - fix X and Y axis range
  Input: usbtouchscreen - add support for GeneralTouch devices
  Input: fix open count handling in input interfaces
  Input: keyboard - add CapsShift lock
  Input: adbhid - produce all CapsLock key events
  Input: ALPS - add signature for ThinkPad R61
  Input: jornada720_kbd - send MSC_SCAN events
  Input: add support for the HP Jornada 7xx (710/720/728) touchscreen
  Input: add support for HP Jornada 7xx onboard keyboard
  Input: add support for HP Jornada onboard keyboard (HP6XX)
  Input: ucb1400_ts - use schedule_timeout_uninterruptible
  Input: xpad - fix dependancy on LEDS class
  Input: auto-select INPUT for MAC_EMUMOUSEBTN option
  ...

Resolved conflicts manually in drivers/hwmon/applesmc.c: converting from
a class device to a device and converting to use input-polldev created a
few apparently trivial clashes..
2007-10-15 13:41:39 -07:00
Ilpo Järvinen f78a1b3892 [TCP]: Make snd_cwnd_cnt 32-bit
Very little point of having 32-bit snd_cnwd if this is not
32-bit as well, as a number of snd_cwnd incrementation formulas
assume that snd_cwnd_cnt can be at least as large as snd_cwnd.

Whether 32-bit is useful was discussed when e0ef57cc56
was made:
  http://marc.info/?l=linux-netdev&m=117218144409825&w=2

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:59:43 -07:00
Herbert Xu e5bbef20e0 [IPV6]: Replace sk_buff ** with sk_buff * in input handlers
With all the users of the double pointers removed from the IPv6 input path,
this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input
handlers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:50:28 -07:00
Pavel Emelyanov 762cc40801 [INET]: Consolidate the xxx_put
These ones use the generic data types too, so move
them in one place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:43 -07:00
Pavel Emelyanov 8e7999c44e [INET]: Consolidate the xxx_evictor
The evictors collect some statistics for ipv4 and ipv6,
so make it return the number of evicted queues and account
them all at once in the caller.

The XXX_ADD_STATS_BH() macros are just for this case,
but maybe there are places in code, that can make use of
them as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:42 -07:00
Pavel Emelyanov 1e4b82873a [INET]: Consolidate the xxx_frag_destroy
To make in possible we need to know the exact frag queue
size for inet_frags->mem management and two callbacks:

 * to destoy the skb (optional, used in conntracks only)
 * to free the queue itself (mandatory, but later I plan to
   move the allocation and the destruction of frag_queues
   into the common place, so this callback will most likely
   be optional too).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:42 -07:00
Pavel Emelyanov 321a3a99e4 [INET]: Consolidate xxx_the secret_rebuild
This code works with the generic data types as well, so
move this into inet_fragment.c

This move makes it possible to hide the secret_timer
management and the secret_rebuild routine completely in
the inet_fragment.c

Introduce the ->hashfn() callback in inet_frags() to get
the hashfun for a given inet_frag_queue() object.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:41 -07:00
Pavel Emelyanov 277e650ddf [INET]: Consolidate the xxx_frag_kill
Since now all the xxx_frag_kill functions now work
with the generic inet_frag_queue data type, this can
be moved into a common place.

The xxx_unlink() code is moved as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:41 -07:00
Pavel Emelyanov 04128f233f [INET]: Collect common frag sysctl variables together
Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.

I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:

 1. to keep them in the __read_mostly section;
 2. not to export the whole inet_frags objects outside.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:40 -07:00
Pavel Emelyanov 7eb95156d9 [INET]: Collect frag queues management objects together
There are some objects that are common in all the places
which are used to keep track of frag queues, they are:

 * hash table
 * LRU list
 * rw lock
 * rnd number for hash function
 * the number of queues
 * the amount of memory occupied by queues
 * secret timer

Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like

-    write_lock(&ipfrag_lock);
+    write_lock(&ip4_frags.lock);

To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:39 -07:00
Pavel Emelyanov 5ab11c98d3 [INET]: Move common fields from frag_queues in one place.
Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:

 * struct ipq in ipv4/ip_fragment.c
 * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
 * struct frag_queue in ipv6/reassembly.c

After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like

-    atomic_dec(&fq->refcnt);
+    atomic_dec(&fq->q.refcnt);

(these occupy most of the patch)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:38 -07:00
Karsten Keil faca94ffae [ISDN]: Remove local copy of device name to make sure renames work.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:37 -07:00
Herbert Xu 3db05fea51 [NETFILTER]: Replace sk_buff ** with sk_buff *
With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:29 -07:00
Herbert Xu af1e1cf073 [IPVS]: Replace local version of skb_make_writable
This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:28 -07:00
Herbert Xu 37d4187922 [NETFILTER]: Do not copy skb in skb_make_writable
Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:27 -07:00
Herbert Xu 776c729e8d [IPV4]: Change ip_defrag to return an integer
Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead.  This patch does
that and updates all its callers accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:25 -07:00
Herbert Xu e0053ec07e [SKBUFF]: Add skb_morph
This patch creates a new function skb_morph that's just like skb_clone
except that it lets user provide the spare skb that will be overwritten
by the one that's to be cloned.

This will be used by IP fragment reassembly so that we get back the same
skb that went in last (rather than the head skb that we get now which
requires us to carry around double pointers all over the place).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:24 -07:00
Yoichi Yuasa 25b31cb118 add new prom.h for AU1x00
Add new prom.h for AU1x00.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:38:25 -04:00
Michael Ellerman cdbd3865ac Use dcr_host_t.base in dcr_unmap()
With the base stored in dcr_host_t, there's no need for callers to pass
the dcr_n into dcr_unmap(). In fact this removes the possibility of them
passing the incorrect value, which would then be iounmap()'ed.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:29:49 -04:00
Michael Ellerman 83f34df4e7 Add dcr_host_t.base in dcr_read()/dcr_write()
Now that all users of dcr_read()/dcr_write() add the dcr_host_t.base, we
can save them the trouble and do it in dcr_read()/dcr_write().

As some background to why we just went through all this jiggery-pokery,
benh sayeth:

 Initially the goal of the dcr_read/dcr_write routines was to operate like
 mfdcr/mtdcr which take absolute DCR numbers. The reason is that on 4xx
 hardware, indirect DCR access is a pain (goes through a table of
 instructions) and it's useful to have the compiler resolve an absolute DCR
 inline.

 We decided that wasn't worth the API bastardisation since most places
 where absolute DCR values are used are low level 4xx-only code which may
 as well continue using mfdcr/mtdcr, while the new API is designed for
 device "instances" that can exist on 4xx and Axon type platforms and may
 be located at variable DCR offsets.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:29:49 -04:00
Brice Goglin eabd7e35c0 Add skb_is_gso_v6
Add skb_is_gso_v6().

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:24:07 -04:00
Russell King 0181b61a98 Merge branch 'pxa' into devel 2007-10-15 18:56:02 +01:00
Mike Rapoport a8fc078955 [ARM] 4577/1: ITE 8152 PCI bridge support
This patch provides driver for ITE 8152 PCI bridge.

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-15 18:53:59 +01:00
Mike Rapoport 3696a8a426 [ARM] 4576/1: CM-X270 machine support
This patch provides core support for CM-X270 platform.

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-15 18:53:57 +01:00
Russell King 3e0cc7ee04 [ARM] pxa: Avoid pxa_gpio_mode() in gpio_direction_{in,out}put()
pxa_gpio_mode() is a universal call that fiddles with the GAFR
(gpio alternate function register.)  GAFR does not exist on PXA3
CPUs, but instead the alternate functions are controlled via the
MFP support code.

Platforms are expected to configure the MFP according to their
needs in their platform support code rather than drivers.  We
extend this idea to the GAFR, and make the gpio_direction_*()
functions purely operate on the GPIO level.

This means platform support code is entirely responsible for
configuring the GPIOs alternate functions on all PXA CPU types.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-15 18:53:55 +01:00
Russell King 36d8b17b43 [ARM] pxa: Make cpu_is_pxaXXX dependent on configuration symbols
Make the cpu_is_pxaXXX() macros define to zero when support for a
particular CPU is disabled.  This allows us to eliminate code for
CPUs which aren't enabled.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-15 18:53:45 +01:00
eric miao 2c8086a5d0 [ARM] pxa: PXA3xx base support
Signed-off-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-15 18:53:43 +01:00
Linus Torvalds f4921aff5b Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
2007-10-15 10:47:35 -07:00
Linus Torvalds 419217cb1d Merge branch 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep
* 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
  lockdep: annotate dir vs file i_mutex
  lockdep: per filesystem inode lock class
  lockdep: annotate kprobes irq fiddling
  lockdep: annotate rcu_read_{,un}lock{,_bh}
  lockdep: annotate journal_start()
  lockdep: s390: connect the sysexit hook
  lockdep: x86_64: connect the sysexit hook
  lockdep: i386: connect the sysexit hook
  lockdep: syscall exit check
  lockdep: fixup mutex annotations
  lockdep: fix mismatched lockdep_depth/curr_chain_hash
  lockdep: Avoid /proc/lockdep & lock_stat infinite output
  lockdep: maintainers
2007-10-15 10:40:41 -07:00
Linus Torvalds 4937ce8795 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] update sn2_defconfig
  [IA64] Fix kernel hangup in kdump on INIT
  [IA64] Fix kernel panic in kdump on INIT
  [IA64] Remove vector from ia64_machine_kexec()
  [IA64] Fix race when multiple cpus go through MCA
  [IA64] Remove needless delay in MCA rendezvous
  [IA64] add driver for ACPI methods to call native firmware
  [IA64] abstract SAL_CALL wrapper to allow other firmware entry points
  [IA64] perfmon: Remove exit_pfm_fs()
  [IA64] tree-wide: Misc __cpu{initdata, init, exit} annotations
2007-10-15 09:57:54 -07:00
Linus Torvalds b5869ce7f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits)
  sched: sync wakeups preempt too
  sched: affine sync wakeups
  sched: guest CPU accounting: maintain guest state in KVM
  sched: guest CPU accounting: maintain stats in account_system_time()
  sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
  sched: guest CPU accounting: add guest-CPU /proc/stat field
  sched: domain sysctl fixes: add terminator comment
  sched: domain sysctl fixes: do not crash on allocation failure
  sched: domain sysctl fixes: unregister the sysctl table before domains
  sched: domain sysctl fixes: use for_each_online_cpu()
  sched: domain sysctl fixes: use kcalloc()
  Make scheduler debug file operations const
  sched: enable wake-idle on CONFIG_SCHED_MC=y
  sched: reintroduce topology.h tunings
  sched: allow the immediate migration of cache-cold tasks
  sched: debug, improve migration statistics
  sched: debug: increase width of debug line
  sched: activate task_hot() only on fair-scheduled tasks
  sched: reintroduce cache-hot affinity
  sched: speed up context-switches a bit
  ...
2007-10-15 08:22:16 -07:00
Linus Torvalds df3d80f5a5 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
  [SCSI] gdth: fix CONFIG_ISA build failure
  [SCSI] esp_scsi: remove __dev{init,exit}
  [SCSI] gdth: !use_sg cleanup and use of scsi accessors
  [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
  [SCSI] gdth: Setup proper per-command private data
  [SCSI] gdth: Remove gdth_ctr_tab[]
  [SCSI] gdth: switch to modern scsi host registration
  [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
  [SCSI] gdth: clean up host private data
  [SCSI] gdth: Remove virt hosts
  [SCSI] gdth: Reorder scsi_host_template intitializers
  [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
  [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
  [SCSI] gdth: split out pci probing
  [SCSI] gdth: split out eisa probing
  [SCSI] gdth: split out isa probing
  gdth: Make one abuse of scsi_cmnd less obvious
  [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
  [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
  [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
  ...
2007-10-15 08:19:33 -07:00
Linus Torvalds 37ca506adc Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux
* 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
  knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
  knfsd: nfsv4 delegation recall should take reference on client
  knfsd: don't shutdown callbacks until nfsv4 client is freed
  knfsd: let nfsd manage timing out its own leases
  knfsd: Add source address to sunrpc svc errors
  knfsd: 64 bit ino support for NFS server
  svcgss: move init code into separate function
  knfsd: remove code duplication in nfsd4_setclientid()
  nfsd warning fix
  knfsd: fix callback rpc cred
  knfsd: move nfsv4 slab creation/destruction to module init/exit
  knfsd: spawn kernel thread to probe callback channel
  knfsd: nfs4 name->id mapping not correctly parsing negative downcall
  knfsd: demote some printk()s to dprintk()s
  knfsd: cleanup of nfsd4 cmp_* functions
  knfsd: delete code made redundant by map_new_errors
  nfsd: fix horrible indentation in nfsd_setattr
  nfsd: remove unused cache_for_each macro
  nfsd: tone down inaccurate dprintk
2007-10-15 08:16:53 -07:00
Jiri Kosina 57d292bd7e HID: fix HIDIOCGRDESC memory access in hidraw
Fix bogus copying of data into userspace when HIDIOCGRDESC is issued.
HID-transport layer makes sure that dev->hid->rdesc is not larger than
HID_MAX_DESCRIPTOR_SIZE.

Noticed-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15 08:12:00 -07:00
Laurent Vivier 94886b84b1 sched: guest CPU accounting: maintain stats in account_system_time()
modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Laurent Vivier 9ac52315d4 sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.

Modify /proc/<pid>/stat to display these new fields.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Laurent Vivier 5e84cfde51 sched: guest CPU accounting: add guest-CPU /proc/stat field
as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run virtual CPU. Modify /proc/stat to display this
new field.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Ingo Molnar 7a6c6bcee0 sched: enable wake-idle on CONFIG_SCHED_MC=y
most multicore CPUs today have shared L2 caches, so tune things so
that the spreading amongst cores is more aggressive.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Ingo Molnar 95dbb421d1 sched: reintroduce topology.h tunings
reintroduce the 2.6.22 topology.h tunings again - they result in
slightly better balancing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Ingo Molnar cc367732ff sched: debug, improve migration statistics
add new migration statistics when SCHED_DEBUG and SCHEDSTATS
is enabled. Available in /proc/<PID>/sched.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Ingo Molnar da84d96176 sched: reintroduce cache-hot affinity
reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Mike Galbraith 95938a35c5 sched: prevent wakeup over-scheduling
Prevent wakeup over-scheduling.  Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept.  Instead, the
task is marked for preemption at the next tick.  Tasks of higher
priority still preempt immediately.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Dhaval Giani 5cb350baf5 sched: group scheduling, sysfs tunables
Add tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

	/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
	# cd /sys/kernel/uids/
	# cat 512/cpu_share
	1024
	# echo 2048 > 512/cpu_share
	# cat 512/cpu_share
	2048
	#

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Ingo Molnar 4cf86d77f5 sched: cleanup: rename task_grp to task_group
cleanup: rename task_grp to task_group. No need to save two characters
and 'grp' is annoying to read.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Mike Galbraith af92723262 sched: cleanup, remove the TASK_NONINTERACTIVE flag
Here's another piece of low hanging obsolete fruit.

Remove obsolete TASK_NONINTERACTIVE.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:13 +02:00
Ingo Molnar 5522d5d5f7 sched: mark scheduling classes as const
mark scheduling classes as const. The speeds up the code
a bit and shrinks it:

   text    data     bss     dec     hex filename
  40027    4018     292   44337    ad31 sched.o.before
  40190    3842     292   44324    ad24 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:12 +02:00
Peter Zijlstra 5f6d858ecc sched: speed up and simplify vslice calculations
speed up and simplify vslice calculations.

[ From: Mike Galbraith <efault@gmx.de>: build fix ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:12 +02:00
Ingo Molnar 2d72376b3a sched: clean up schedstats, cnt -> count
rename all 'cnt' fields and variables to the less yucky 'count' name.

yuckage noticed by Andrew Morton.

no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:

   text    data     bss     dec     hex filename
  38236    3506      24   41766    a326 sched.o.before
  38240    3506      24   41770    a32a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:12 +02:00
Ingo Molnar 94359f05cb sched: undo some of the recent changes
undo some of the recent changes that are not needed after all,
such as last_min_vruntime.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 17:00:11 +02:00
Peter Zijlstra 67e9fb2a39 sched: add vslice
add vslice: the load-dependent "virtual slice" a task should
run ideally, so that the observed latency stays within the
sched_latency window.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:10 +02:00
Ingo Molnar c18b8a7cbc sched: remove unneeded tunables
remove unneeded tunables.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:10 +02:00
Ingo Molnar b8efb56172 sched debug: BKL usage statistics
add per task and per rq BKL usage statistics.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:10 +02:00
Srivatsa Vaddagiri 24e377a832 sched: add fair-user scheduler
Enable user-id based fair group scheduling. This is useful for anyone
who wants to test the group scheduler w/o having to enable
CONFIG_CGROUPS.

A separate scheduling group (i.e struct task_grp) is automatically created for 
every new user added to the system. Upon uid change for a task, it is made to 
move to the corresponding scheduling group.

A /proc tunable (/proc/root_user_share) is also provided to tune root
user's quota of cpu bandwidth.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri 9b5b77512d sched: clean up code under CONFIG_FAIR_GROUP_SCHED
With the view of supporting user-id based fair scheduling (and not just
container-based fair scheduling), this patch renames several functions
and makes them independent of whether they are being used for container
or user-id based fair scheduling.

Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating
less-sized array for tg->cfs_rq[] and tf->se[]).

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri 83b699ed20 sched: revert recent removal of set_curr_task()
Revert removal of set_curr_task.
Use put_prev_task/set_curr_task when changing groups/policies

Signed-off-by: Srivatsa Vaddagiri < vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 17:00:08 +02:00
Dmitry Adamushko f6b53205e1 sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
rework enqueue/dequeue_entity() to get rid of 
sched_class::set_curr_task(). This simplifies sched_setscheduler(), 
rt_mutex_setprio() and sched_move_tasks().

   text    data     bss     dec     hex filename
  24330    2734      20   27084    69cc sched.o.before
  24233    2730      20   26983    6967 sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:08 +02:00
Dmitry Adamushko 4530d7ab0f sched: simplify sched_class::yield_task()
the 'p' (task_struct) parameter in the sched_class :: yield_task() is
redundant as the caller is always the 'current'. Get rid of it.

   text    data     bss     dec     hex filename
  24341    2734      20   27095    69d7 sched.o.before
  24330    2734      20   27084    69cc sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:08 +02:00
Dmitry Adamushko 30cfdcfc5f sched: do not keep current in the tree and get rid of sched_entity::fair_key
Get rid of 'sched_entity::fair_key'.

As a side effect, 'current' is not kept withing the tree for 
SCHED_NORMAL/BATCH tasks anymore. This simplifies some parts of code 
(e.g. entity_tick() and yield_task_fair()) and also somewhat optimizes 
them (e.g. a single update_curr() now vs. dequeue/enqueue() before in 
entity_tick()).

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:07 +02:00
Ingo Molnar bbdba7c0e1 sched: remove wait_runtime fields and features
remove wait_runtime based fields and features, now that the CFS
math has been changed over to the vruntime metric.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:06 +02:00
Ingo Molnar e22f5bbf86 sched: remove wait_runtime limit
remove the wait_runtime-limit fields and the code depending on it, now
that the math has been changed over to rely on the vruntime metric.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:06 +02:00
Ingo Molnar e9acbff648 sched: introduce se->vruntime
introduce se->vruntime as a sum of weighted delta-exec's, and use that
as the key into the tree.

the idea to use absolute virtual time as the basic metric of scheduling
has been first raised by William Lee Irwin, advanced by Tong Li and first
prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset.

also see:

   http://lkml.org/lkml/2007/9/2/76

for a simpler variant of this patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:04 +02:00
Ingo Molnar 8ebc91d936 sched: remove stat_gran
remove the stat_gran code - it was disabled by default and it causes
unnecessary overhead.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:03 +02:00
Ingo Molnar 2bd8e6d422 sched: use constants if !CONFIG_SCHED_DEBUG
use constants if !CONFIG_SCHED_DEBUG.

this speeds up the code and reduces code-size:

    text    data     bss     dec     hex filename
   27464    3014      16   30494    771e sched.o.before
   26929    3010      20   29959    7507 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:02 +02:00
Ingo Molnar eba1ed4b7e sched: debug: track maximum 'slice'
track the maximum amount of time a task has executed while
the CPU load was at least 2x. (i.e. at least two nice-0
tasks were runnable)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:02 +02:00
Thomas Gleixner 1595f452f3 clockevents: introduce force broadcast notifier
The 64bit SMP bootup is slightly different to the 32bit one. It enables
the boot CPU local APIC timer before all CPUs are brought up. Some AMD C1E
systems have the C1E feature flag only set in the secondary CPU. Due to
the early enable of the boot CPU local APIC timer the APIC timer is
registered as a fully functional device. When we detect the wreckage during
the bringup of the secondary CPU, we need to force the boot CPU into
broadcast mode. 

Add a new notifier reason and implement the force broadcast in the clock
events layer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-14 22:57:45 +02:00
Linus Torvalds 4fa435018d Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6
* 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (53 commits)
  hwmon: (vt8231) fix sparse warning
  hwmon: (sis5595) fix sparse warning
  hwmon: (w83627hf) don't assume bank 0
  hwmon: (w83627hf) Fix setting fan min right after driver load
  hwmon: (w83627hf) De-macro sysfs callback functions
  hwmon: Add new combined driver for FSC chips
  hwmon: (ibmpex) Release IPMI user if hwmon registration fails
  hwmon: (dme1737) Add sch311x support
  hwmon: (dme1737) group functions logically
  hwmon: (dme1737) cleanups
  hwmon: IBM power meter driver
  hwmon: (coretemp) Add support for Celeron 4xx
  hwmon: (lm87) Disable VID when it should be
  hwmon: (w83781d) Add individual alarm and beep files
  hwmon: VRM is not read from registers
  MAINTAINERS: update hwmon subsystem git trees
  hwmon: Fix the code examples in documentation
  hwmon: update sysfs interface document - error handling
  hwmon: (thmc50) Fix a debug message
  hwmon: (thmc50) Don't create temp3 if not enabled
  ...
2007-10-14 12:50:19 -07:00
Al Viro f53f4137ba fix endianness bug in inet_lro
all uses of and almost all assignments to lro_desc->tcp_ack assume that it's
net-endian; one converts net-endian to host-endian and sticks it in
lro_desc->tcp_ack.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro 9df7c98a0f inet_lro: trivial endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro c6b44e50b8 endianness annotations in arm io.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro 5ba253313d more low-hanging fruits - kernel, fs, lib signedness
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro 64b33619a3 long vs. unsigned long - low-hanging fruits in drivers
deal with signedness of the stuff passed to set_bit() et.al.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:51 -07:00
Al Viro 0cc0844bc6 frv: missing casts in cmpxchg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:51 -07:00
Al Viro bda76dd160 endian-clean in_le64/out_le64
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:51 -07:00
Linus Torvalds 52d4e661ac Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (21 commits)
  HID: hidraw_connect() memleak fix
  HID: add hidraw interface
  USB HID: provide hook for hidraw write()
  HID: hiddev: Add 32bit ioctl compatibilty
  HID: Add GeneralTouch touchscreen to the blacklist
  HID: add support for Microsoft Wireless Laser Keyboard 6000
  Input: add KEY_LOGOFF
  USBHID: report descriptor fix for MacBook JIS keyboard
  HID: trivial fixes in hid-debug
  HID: fix input mapping for Microsoft Ergonomic Keyboard
  HID: use hid-plff driver for GreenAsia 0e8f:0003 devices
  USBHID: Add HID_QUIRK_NOGET for ELO Touch Screen 2700 display
  HID: enable hiddev for the SantaRosa MacBookPro IR receiver
  USBHID: add CM109 device to blacklist
  HID: Report usage codes of keys as EV_MSC scancode events
  HID: ignore all non-LED usages in output fields in hid-input
  HID: fix whitespace damage
  HID: add support for Thrustmaster FGT Force Feedback wheel
  HID: minimal autosuspend support for USB HID devices
  HID: add support for Microsoft Natural Ergonomic Keyboard 4000
  ...
2007-10-14 09:03:42 -07:00
Jiri Kosina d057fd4cb8 Merge branch 'hidraw' into for-linus 2007-10-14 14:47:56 +02:00
Jiri Kosina 86166b7bcd HID: add hidraw interface
hidraw is an interface that is going to obsolete hiddev one
day.

Many userland applications are using libusb instead of using
kernel-provided hiddev interface. This is caused by various
reasons - the HID parser in kernel doesn't handle all the
HID hardware on the planet properly, some devices might require
its own specific quirks/drivers, etc.

hiddev interface tries to do its best to parse all the received
reports properly, and presents only parsed usages into userspace.
This is however often not enough, and that's the reason why
many userland applications just don't use hiddev at all, and
rather use libusb to read raw USB events and process them on
their own.

Another drawback of hiddev is that it is USB-specific.

hidraw interface provides userspace readers with really raw HID
reports, no matter what the low-level transport layer is (USB/BT),
and gives the userland applications all the freedom to process
the HID reports in a way they wish to.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2007-10-14 14:47:26 +02:00
Khelben Blackstaff e2bca0749c Input: add KEY_LOGOFF
HUT 1.12 defines Logoff usage 0x19c in Consumer page. There are
keyboards out there emitting this usage code (for example Microsoft
Wireless Laser Keyboard 6000). Add this key so that HID code could
map usages to it.

Signed-off-by: Khelben Blackstaff <eye.of.the.8eholder@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2007-10-14 13:40:02 +02:00
Tomoya Adachi 08f06177f4 USBHID: report descriptor fix for MacBook JIS keyboard
This patch fixes the problem, that Japanese MacBook doesn't recognize some keys
like '\'(yen, or backslash), '|'(pipe), and '_'(underscore).

It is due to that MacBook JIS keyboard (jp106) sends wrong report descriptor.
It saids "logical maximum = 0x65", so Keyboard.0089 is mapped to Key.Unknown,
while it should be accepted as Key.Yen.

Signed-off-by: Tomoya Adachi <adachi@il.is.s.u-tokyo.ac.jp>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2007-10-14 13:40:01 +02:00
Stelian Pop 0ce91cf9ce HID: enable hiddev for the SantaRosa MacBookPro IR receiver
The infrared remote receiver found in the SantaRosa MacBookPro
laptops (MacBookPro3,1) need to be forced to expose a HIDDEV
interface (instead of HIDINPUT) so that lirc can access it using
the 'macmini' driver.

The patch below adds the required quirk for forcing the HIDDEV
interface to be activated (HID_QUIRK_HIDDEV) and introduces a new
quirk which forces the HIDINPUT interface to be ignored
(HID_QUIRK_IGNORE_HIDINPUT).

Note that Apple calls this receiver 'IRController4' (info taken
from Apple's driver Info.plist). Older Mac{Book,Mini,Pro}s seem
to all use the 'IRController1' device (USB id 05ac:8240) which
doesn't need those quirks.

Signed-off-by: Stelian Pop <stelian@popies.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2007-10-14 13:40:01 +02:00
Jiri Kosina 4dc21a8005 Input: add KEY_SPELLCHECK
HUT 1.12 defines Spell Check usage 0x1ab in Consumer page. There are
keyboards out there emitting this usage code (for example Microsoft
Natural Ergonomic Keyboard 4000). Add this key so that HID code could
map usages to it.

Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2007-10-14 13:40:00 +02:00
David S. Miller 256c1df36b [SPARC64]: virt_irq --> bucket mapping no longer necessary
We used to need this to compute virt_irq --> ino, but that
is no longer necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 23:50:38 -07:00
David S. Miller bb74b734a6 [SPARC64]: Kill ugly __irq_ino() macro.
We have a place to stick INO information in the
virt_to_real_irq_table[], which is currently only used for VIRQs.
And that is readily accessible from the one __irq_ino() call site.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 23:27:48 -07:00
David S. Miller eb2d8d6032 [SPARC64]: Access ivector_table[] using physical addresses.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 21:53:15 -07:00
David S. Miller a650d3839e [SPARC64]: Make IVEC pointers 64-bit.
Currently we chain IVEC entries using 32-bit "pointers"
because we know that the ivector_table is in the main
kernel image, thus below 4GB.

This uses proper 64-bit pointers instead.

Whilst this bloats up the kernel image size, this sets
the infrastructure necessary to significantly shrink the
kernel size by using physical addresses and dynamically
allocating the ivector table.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 21:53:15 -07:00
David S. Miller 759f89e03c [SPARC64]: Consolidate MSI support code.
This also makes us use the MSI queues correctly.

Each MSI queue is serviced by a normal sun4u/sun4v INO interrupt
handler.  This handler runs the MSI queue and dispatches the
virtual interrupts indicated by arriving MSIs in that MSI queue.

All of the common logic is placed in pci_msi.c, with callbacks to
handle the PCI controller specific aspects of the operations.

This common infrastructure will make it much easier to add MSG
support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 21:53:13 -07:00
Robert Reif e8dd16129f [SPARC32]: Add irqflags.h to sparc32 and use it from generic code.
Added asm-sparc/irqflags.h and moved irq related code from system.h to it.
Renamed local_irq functions to raw_local_irq in irq.c.
Modified system.h to include linux/irqflags.h which includes asm/irqflags.h.
Added TRACE_IRQFLAGS_SUPPORT to Kconfig.debug.

This is the first step in adding IRQ-flags state tracing as outlined in
Documentation/irqflags-tracing.txt.  These changes should be harmless
because they just move things around and rename them.

The next step is making the lowlevel entry code modifications which
to be honest are beyond my capabilities at this point.

Boot tested on an ss20 running an SMP kernel.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 21:53:11 -07:00
David S. Miller 9bb3c227c4 [SPARC64]: Enable MSI on sun4u Fire PCI-E controllers.
The support code is identical to the hypervisor sun4v stuff,
just replacing the hypervisor calls with register reads and
writes in the Fire controller.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-13 21:53:09 -07:00
Peter Zijlstra 14358e6dda lockdep: annotate dir vs file i_mutex
On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
> The circular lock seems to be this:
> 
> #1:
> 
>   sys_mmap2:              down_write(&mm->mmap_sem);
>   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
> 
> 
> #0:
> 
>   vfs_readdir:     mutex_lock(&inode->i_mutex);
>    - during the readdir (filldir64), we take a user fault (missing page?)
>     and call do_page_fault -
>   do_page_fault:   down_read(&mm->mmap_sem);
> 
> 
> So it does indeed look like a circular locking. Now the question is, "is
> this a bug?".  Looking like the inode of #1 must be a file or something
> else that you can mmap and the inode of #0 seems it must be a directory.
> I would say "no".
> 
> Now if you can readdir on a file or mmap a directory, then this could be
> an issue.
> 
> Otherwise, I'd love to see someone teach lockdep about this issue! ;-)

Make a distinction between file and dir usage of i_mutex.
The inode should be complete and unused at unlock_new_inode(), re-init
i_mutex depending on its type.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-14 01:38:33 +02:00
Peter Zijlstra d475fd428c lockdep: per filesystem inode lock class
Give each filesystem its own inode lock class. The various filesystems have
different locking order wrt the inode locks; esp. the pseudo filesystems differ
from the rest.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 14:51:31 +02:00
David Brownell 6662cbb989 i2c: Rename the PEC functionality bit
Rename I2C_FUNC_SMBUS_HWPEC_CALC as I2C_FUNC_SMBUS_PEC, and list that
functionality as always available through the software implementation.
Update documentation accordingly (and list similar requirements).

The way it's currently packaged doesn't present the capability in a
useful way.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:33 +02:00
David Brownell 08fb68bb4b i2c: Move i2c-dev interfaces to i2c-dev.h
Move the i2c-dev support into <linux/i2c-dev.h> where it should always
have lived.  Now <linux/i2c.h> no longer holds stuff related to the
optional userspace /dev/i2c-X interface.  Improve the descriptions
for these ioctl requests.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:32 +02:00
David Brownell 53be795934 i2c: Remove i2c_algorithm.algo_control()
This removes:

 - An effectively unused hook:  i2c_algorithm.algo_control.

 - The i2c_control() call, used only by i2c-dev to call that
   unused hook or set two barely supported adapter params.

   (That param setting moves into i2c-dev.c ... still iffy
   due to lack of locking, but no other changes.)

As shown by diffstat, this is a net code shrink.  It also reduces the
complexity of the I2C adapter and /dev interfaces.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:32 +02:00
David Brownell a64ec07d3d i2c: Document struct i2c_msg
Clarify use of the I2C_M_* flags by highlighting the fact that
most of them depend on I2C_FUNC_PROTOCOL_MANGLING.

Also provide kerneldoc for i2c_smbus_read_block_data() and also
for "struct i2c_msg".

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:31 +02:00
Vladimir Barinov 95a7f10ead i2c: Add DaVinci I2C controller support
Signed-off-by: Vladimir Barinov <vbarinov@ru.mvista.com>
Acked-by: Trilok Soni <soni.trilok@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:30 +02:00
Adrian Bunk 83eaaed0d0 i2c-core: Make some code static
After the i2c-isa removal some code can become static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:30 +02:00
David Brownell 3bbb835d4c i2c: New-style devices can support driver model wakeup flags
We need to be able to flag I2C devices, such as RTCs, which can issue wake
events (usually through IRQ lines).  This adds an i2c_board_info.flags bit,
and uses it to initialize the i2c device node.  (And shrinks a few lines
that were overly long.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-10-13 23:56:29 +02:00
Jean Delvare cee37ae407 i2c: Kill struct i2c_device_id
I2C devices do not have any form of ID as PCI or USB devices have.
No driver uses "MODULE_DEVICE_TABLE(i2c, ...)" because it doesn't
make sense. So we can get rid of struct i2c_device_id and the
associated support code.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Greg KH <greg@kroah.com>
2007-10-13 23:56:29 +02:00
Linus Torvalds ffb035591f Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] SMP: Fix use of cpumasks.
  [MIPS] Revert "[MIPS] tlbex.c: Cleanup __init usage."
  [MIPS] CFE: Add missing parenthesis.
2007-10-13 10:14:25 -07:00
Linus Torvalds bcd11eaa22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (27 commits)
  alim15x3: remove redundant m5229_revision check
  sc1200: fix ->dma_base equal zero handling
  cs5520: fix ->dma_base equal zero handling
  sgiioc4: add missing ->dma_base check
  cs5535: add missing ->dma_base check
  ide: remove CONFIG_IDEDMA_IVB config option
  ide: change master/slave IDENTIFY order
  ide: move ide_config_drive_speed() calls to upper layers (take 2)
  pdc202xx_new: check ide_config_drive_speed() return value
  cs5535: check ide_config_drive_speed() return value
  amd74xx/via82cxxx: check ide_config_drive_speed() return value
  au1xxx: fix au1xxx_set_pio_mode()
  icside: use ide_tune_dma()
  ide-pmac: fix PIO setup and enable autotune
  ide-pmac: use ide_tune_dma() (take 2)
  ide-pmac: remove pmac_ide_do_setfeature() (take 2)
  ide-pmac: remove nIEN clearing from pmac_ide_do_setfeature()
  ide-pmac: use __ide_wait_stat()
  ide-pmac: remove extra good status wait from pmac_ide_do_setfeature()
  ide: add __ide_wait_stat() helper
  ...
2007-10-13 10:13:27 -07:00
Linus Torvalds c8c55bcb43 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (91 commits)
  [MTD] [NAND] Blackfin on-chip NAND Flash Controller driver
  [MTD] [NOR] fix ctrl-alt-del can't reboot for intel flash bug
  [MTD] [NAND] Fix compiler warning in Alauda driver
  [JFFS2] Remove stray debugging printk
  [JFFS2] Handle dirents on the flash with embedded zero bytes in names.
  [JFFS2] Check for creation of dirents with embedded zero bytes in name.
  [JFFS2] Don't count all 'very dirty' blocks except in debug mode
  [JFFS2] Check whether garbage-collection actually obsoleted its victim.
  [JFFS2] Relax threshold for triggering GC due to dirty blocks.
  [MTD] [OneNAND] Fix typo related with recent commit
  [JFFS2] Trigger garbage collection when very_dirty_list size becomes excessive
  [MTD] [NAND] Avoid deadlock in erase callback; release chip lock first.
  [MTD] [NAND] Resume method for CAFÉ NAND controller
  [MTD] [NAND] Fix PCI ident table for CAFÉ NAND controller.
  [MTD] [NAND] s3c2410: fix arch moves
  [MTD] [OneNAND] fix numerous races
  [MTD] map driver for NOR flash on the Intel Vermilion Range chipset
  [JFFS2] Fix unpoint length
  [MTD] fix CFI point method for discontiguous maps
  [MTD] MAPS: Merge Lubbock and Mainstone drivers into common PXA2xx driver
  ...
2007-10-13 10:12:15 -07:00
Linus Torvalds 3749c66c67 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (106 commits)
  KVM: Replace enum by #define
  KVM: Skip pio instruction when it is emulated, not executed
  KVM: x86 emulator: popf
  KVM: x86 emulator: fix src, dst value initialization
  KVM: x86 emulator: jmp abs
  KVM: x86 emulator: lea
  KVM: X86 emulator: jump conditional short
  KVM: x86 emulator: imlpement jump conditional relative
  KVM: x86 emulator: sort opcodes into ascending order
  KVM: Improve emulation failure reporting
  KVM: x86 emulator: pushf
  KVM: x86 emulator: call near
  KVM: x86 emulator: push imm8
  KVM: VMX: Fix exit qualification width on i386
  KVM: Move main vcpu loop into subarch independent code
  KVM: VMX: Move vm entry failure handling to the exit handler
  KVM: MMU: Don't do GFP_NOWAIT allocations
  KVM: Rename kvm_arch_ops to kvm_x86_ops
  KVM: Simplify memory allocation
  KVM: Hoist SVM's get_cs_db_l_bits into core code.
  ...
2007-10-13 10:02:11 -07:00
Al Viro 23ec23c2d3 fix sparc32 breakage (result of vmlinux.lds.S bug)
In commit 4665079cbb ("[NETNS]: Move some
code into __init section when CONFIG_NET_NS=n") we got a new section -
.exit.text.refok (more of 'let's tell modpost that some bogus calls are
not bogus', a-la text.init.refok).

Unfortunately, the commit in question forgot to add it to TEXT_TEXT,
with rather amusing results.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:58:59 -07:00
Al Viro 13bcd5d0e2 v4l: copy_to_user() is not a good method name
Breaks on any target that has copy_to_user() defined as a non-trivial
macro.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:58:59 -07:00
Al Viro 2b8232ce51 minimal build fixes for uml (fallout from x86 merge)
a) include/asm-um/arch can't just point to include/asm-$(SUBARCH) now
 b) arch/{i386,x86_64}/crypto are merged now
 c) subarch-obj needed changes
 d) cpufeature_64.h should pull "cpufeature_32.h", not <asm/cpufeature_32.h>
    since it can be included from asm-um/cpufeature.h
 e) in case of uml-i386 we need CONFIG_X86_32 for make and gcc, but not
    for Kconfig
 f) sysctl.c shouldn't do vdso_enabled for uml-i386 (actually, that one
    should be registered from corresponding arch/*/kernel/*, with ifdef
    going away; that's a separate patch, though).

With that and with Stephen's patch ("[PATCH net-2.6] uml: hard_header fix")
we have uml allmodconfig building both on i386 and amd64.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:57:15 -07:00
Randy Dunlap c4ea43c552 net core: fix kernel-doc for new function parameters
Fix networking code kernel-doc for newly added parameters.

Warning(linux-2.6.23-git2//net/core/sock.c:879): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:570): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:594): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:617): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:641): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:667): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:722): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:959): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:1195): No description found for parameter 'dev'
Warning(linux-2.6.23-git2//net/core/dev.c:2105): No description found for parameter 'n'
Warning(linux-2.6.23-git2//net/core/dev.c:3272): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:3445): No description found for parameter 'net'
Warning(linux-2.6.23-git2//include/linux/netdevice.h:1301): No description found for parameter 'cpu'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:52:26 -07:00
Linus Torvalds ecaedfa385 Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh64-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh64-2.6:
  sh64: mach-cayman: Build fixes.
  sh64: Symbol export fixups.
  sh64: linker script tidying and alignment fixups.
  sh64: Set KBUILD_IMAGE to make the rpm target happy.
  sh64: Kill off obsolete linux/blk.h reference.
  sh64: cleanup struct irqaction initializers.
  sh64: Kill off dead gdb stub symbol.
  sh64: alphanumeric display only on Cayman.
  sh64: Add defconfigs for mach-sim and mach-harp.
  sh64: update cayman defconfig.
  sh64: Tidy up Kconfig dependencies.
  sh64: Move consistent DMA routines to arch/sh64/mm/.
  sh64: Some symbol exports and build fixes.
  sh64: mach-sim: Build fixes.
  sh64: mach-harp: Build fixes.
  sh64: Kill off duplicate frame pointer option.
  sh64: Kill off dead ROM-RAM and generic boards.
  sh64: Tidy up includes for Cayman board.
  sh64: Move *_p() I/O routine variants to io.h.
2007-10-13 09:50:26 -07:00
Linus Torvalds dcf397f037 Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (124 commits)
  sh: allow building for both r2d boards in same binary.
  sh: fix r2d board detection
  sh: Discard .exit.text/.exit.data at runtime.
  sh: Fix up some section alignments in linker script.
  sh: Fix SH-4 DMAC CHCR masking.
  sh: Rip out left-over nommu cond syscall cruft.
  sh: Make kgdb i-cache flushing less inept.
  sh: kgdb section mismatches and tidying.
  sh: cleanup struct irqaction initializers.
  sh: early_printk tidying.
  video: pvr2fb: Add TV (RGB) support to Dreamcast PVR driver.
  sh: Conditionalize gUSA support.
  sh: Follow gUSA preempt changes in __switch_to().
  sh: Tidy up gUSA preempt handling.
  sh: __copy_user() optimizations for small copies.
  sh: clkfwk: Support multi-level clock propagation.
  sh: Fix URAM start address on SH7785.
  sh: Use boot_cpu_data for CPU probe.
  sh: Support extended mode TLB on SH-X3.
  sh: Bump MAX_ACTIVE_REGIONS for SH7785.
  ...
2007-10-13 09:49:04 -07:00
Matthew Wilcox e92042e5c0 m68k: Export cachectl.h
libffi in GCC 4.2 needs cachectl.h to do its cache flushing.  But we
don't currently export it.  I believe this patch should do the trick.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:41:03 -07:00
Geert Uytterhoeven a55444494d m68k: ignore restart_syscall
m68k: ignore restart_syscall, which is not needed on m68k.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:41:03 -07:00
Bartlomiej Zolnierkiewicz 88b2b32bab ide: move ide_config_drive_speed() calls to upper layers (take 2)
* Convert {ide_hwif_t,ide_pci_device_t}->host_flag to be u16.

* Add IDE_HFLAG_POST_SET_MODE host flag to indicate the need to program 
  the host for the transfer mode after programming the device.  Set it
  in au1xxx-ide, amd74xx, cs5530, cs5535, pdc202xx_new, sc1200, pmac
  and via82cxxx host drivers.

* Add IDE_HFLAG_NO_SET_MODE host flag to indicate the need to completely
  skip programming of host/device for the transfer mode ("smart" hosts).
  Set it in it821x host driver and check it in ide_tune_dma().

* Add ide_set_pio_mode()/ide_set_dma_mode() helpers and convert all
  direct ->set_pio_mode/->speedproc users to use these helpers.

* Move ide_config_drive_speed() calls from ->set_pio_mode/->speedproc
  methods to callers.

* Rename ->speedproc method to ->set_dma_mode, make it void and update
  all implementations accordingly.

* Update ide_set_xfer_rate() comments.

* Unexport ide_config_drive_speed().

v2:
* Fix issues noticed by Sergei:
  - export ide_set_dma_mode() instead of moving ->set_pio_mode abuse wrt
    to setting DMA modes from sc1200_set_pio_mode() to do_special()
  - check IDE_HFLAG_NO_SET_MODE in ide_tune_dma()
  - check for (hwif->set_pio_mode) == NULL in ide_set_pio_mode()
  - check for (hwif->set_dma_mode) == NULL in ide_set_dma_mode()
  - return -1 from ide_set_{pio,dma}_mode() if ->set_{pio,dma}_mode == NULL
  - don't set ->set_{pio,dma}_mode on it821x in "smart" mode
  - fix build problem in pmac.c
  - minor fixes in au1xxx-ide.c/cs5530.c/siimage.c
  - improve patch description

Changes in behavior caused by this patch:
- HDIO_SET_PIO_MODE ioctl would now return -ENOSYS for attempts to change
  PIO mode if it821x controller is in "smart" mode
- removal of two debugging printk-s (from cs5530.c and sc1200.c)
- transfer modes 0x00-0x07 passed from user space may be programmed twice on
  the device (not really an issue since 0x00 is not supported correctly by
  any host driver ATM, 0x01 is not supported at all and 0x02-0x07 are invalid)

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-13 17:47:51 +02:00
Bartlomiej Zolnierkiewicz 75d7d963e3 icside: use ide_tune_dma()
* Add "good DMA drives" hack for icside to ide-dma.c::ide_find_dma_mode()
  (in the long-term it should be either removed or generalized for all hosts).

* Use ide_tune_dma() in icside.c::icside_dma_check().

  This results in the following changes in behavior:
  - pre-EIDE SWDMA modes are now also respected
  - drive->autodma is checked instead of hwif->autodma
    (doesn't really matter as icside sets both to "1")

* Make ide-dma.c::__ide_dma_good_drive() static and drop "__" prefix.

Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-13 17:47:50 +02:00
Bartlomiej Zolnierkiewicz aedea5910c ide-pmac: remove pmac_ide_do_setfeature() (take 2)
Use ide_config_drive_speed() instead of pmac_ide_do_setfeature() and remove
the latter, also  ide-iops.c::__ide_wait_stat() could be static again.

Since for IDE PMAC host driver IDE_CONTROL_REG is always true, device's
->quirk_list is always zero and ->ide_dma_host_{on,off} are nops than
the only changes in behavior are:

* if PIO mode is set then ->dma_off_queitly is called to disable DMA

* if setting transfer mode fails ide_dump_status() is called to dump status

v2:
* IDE PMAC controllers allow separate PIO and DMA timings and PPC userland
  depends on this fact, and calls "hdparm -p" without calling "hdparm -d".

  Therefore to compensate for DMA being disabled by ide_config_drive_speed()
  for PIO modes:

  - add IDE_HFLAG_SET_PIO_MODE_KEEP_DMA flag and set it in PMAC host driver

  - add handling of the new flag to ide-io.c::do_special()

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-13 17:47:50 +02:00
Bartlomiej Zolnierkiewicz ddf151026a ide-pmac: use __ide_wait_stat()
* Use __ide_wait_stat() instead of wait_for_ready() in pmac_ide_do_setfeature().

While at it do following changes to match __ide_wait_stat() call in
ide_config_drive_speed():

* Wait WAIT_CMD time (20 sec) instead of 2 sec for device to clear BUSY_STAT.

* Check DRQ_STAT bit (shouldn't be set for good device status).

Also remove no longer needed wait_for_ready() from ide-iops.c.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-13 17:47:49 +02:00
Bartlomiej Zolnierkiewicz 74af21cf4d ide: add __ide_wait_stat() helper
* Split off checking of the status register from ide_wait_stat() to
  __ide_wait_stat() helper.

* Use the new helper in ide_config_drive_speed().  The only change in the
  functionality is that the function now fails if after 20 sec (WAIT_CMD)
  device is still busy (BUSY_STAT bit is set) while previously instead of
  failing the function continued with checking for the correct device status
  (which would give the device additional 10 usec to clear BUSY_STAT bit).

* Remove stale comment for ide_config_drive_speed().

* Remove duplicate comment for ide_wait_stat() from <linux/ide.h>.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-13 17:47:49 +02:00
David Woodhouse ebf8889bd1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-10-13 14:58:23 +01:00
David Woodhouse b160292cc2 Merge Linux 2.6.23 2007-10-13 14:43:54 +01:00
Bryan Wu b37bde1478 [MTD] [NAND] Blackfin on-chip NAND Flash Controller driver
This is the driver for latest Blackfin on-chip nand flash controller

 - use nand_chip and mtd_info common nand driver interface
 - provide both PIO and dma operation
 - compiled with ezkit bf548 configuration
 - use hardware 1-bit ECC
 - tested with YAFFS2 and can mount YAFFS2 filesystem as rootfs

ChangeLog from try#1
 - use hweight32() instead of count_bits()
 - replace bf54x with bf5xx and BF54X with BF5XX
 - compare against plat->page_size in 2 cases when enable hardware ECC

ChangeLog from try#2
 - passed nand_test suites
 - use cpu_relax() instead of busy wait loop
 - some coding style issue pointed out by Andrew

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 14:36:49 +01:00
Kevin Hao c4a9f88daf [MTD] [NOR] fix ctrl-alt-del can't reboot for intel flash bug
When we press ctrl-alt-del,kernel_restart_prepare will invoke 
cfi_intelext_reboot which will set flash to read array mode, but later 
when device_shutdown is invoked which may put current work queue to 
sleep and other process may be scheduled to running and programming 
flash in not FL_READY mode again. So we can't boot up if this flash is 
used for bootloader.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 14:36:18 +01:00
Avi Kivity 8a45450d0a KVM: Replace enum by #define
Easier for existence test (#ifdef) in userspace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:29 +02:00
Eddie Dong 96ad2cc613 KVM: in-kernel LAPIC save and restore support
This patch adds a new vcpu-based IOCTL to save and restore the local
apic registers for a single vcpu. The kernel only copies the apic page as
a whole, extraction of registers is left to userspace side. On restore, the
APIC timer is restarted from the initial count, this introduces a little
delay, but works fine.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
He, Qing 6bf9e962d1 KVM: in-kernel IOAPIC save and restore support
This patch adds support for in-kernel ioapic save and restore (to
and from userspace). It uses the same get/set_irqchip ioctl as
in-kernel PIC.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
He, Qing 6ceb9d791e KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support
This patch adds two new ioctls to dump and write kernel irqchips for
save/restore and live migration. PIC s/r and l/m is implemented in this
patch.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
Eddie Dong b6958ce44a KVM: Emulate hlt in the kernel
By sleeping in the kernel when hlt is executed, we simplify the in-kernel
guest interrupt path considerably.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
Eddie Dong 97222cc831 KVM: Emulate local APIC in kernel
Because lightweight exits (exits which don't involve userspace) are many
times faster than heavyweight exits, it makes sense to emulate high usage
devices in the kernel.  The local APIC is one such device, especially for
Windows and for SMP, so we add an APIC model to kvm.

It also allows in-kernel host-side drivers to inject interrupts without
going through userspace.

[compile fix on i386 from Jindrich Makovicka]

Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
Eddie Dong 85f455f7dd KVM: Add support for in-kernel PIC emulation
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:24 +02:00
Yang, Sheng 253abdee5e KVM: Communicate cr8 changes to userspace
This allows running 64-bit Windows.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:23 +02:00
Jeff Dike 519ef35341 KVM: add hypercall nr to kvm_run
Add the hypercall number to kvm_run and initialize it.  This changes the ABI,
but as this particular ABI was unusable before this no users are affected.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Rusty Russell 7075bc816c KVM: Use standard CR8 flags, and fix TPR definition
Intel manual (and KVM definition) say the TPR is 4 bits wide.  Also fix
CR8_RESEVED_BITS typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Rusty Russell 9eb829ced8 KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed header
Creating one's own BITMAP macro seems suboptimal: if we use manual
arithmetic in the one place exposed to userspace, we can use standard
macros elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell dea8caee7b KVM: Trivial: /dev/kvm interface is no longer experimental.
KVM interface is no longer experimental.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Avi Kivity 24cbc7e9cb KVM: Future-proof the exit information union ABI
Note that as the size of struct kvm_run is not part of the ABI, we can add
things at the end.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Avi Kivity 81fe96bde7 i386: Expose IOAPIC register definitions even if CONFIG_X86_IO_APIC is not set
KVM reuses the IOAPIC register definitions, and needs them even if the
host is not compiled with IOAPIC support.  Move the #ifdef below so that only
the IOAPIC variables and functions are protected, and the register definitions
are available to all.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Michael Hennerich 8f740ef391 Input: add support for Blackfin BF54x Keypad controller
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2007-10-13 00:36:46 -04:00
Nick Piggin b6c7347fff x86: optimise barriers
According to latest memory ordering specification documents from Intel
and AMD, both manufacturers are committed to in-order loads from
cacheable memory for the x86 architecture.  Hence, smp_rmb() may be a
simple barrier.

Also according to those documents, and according to existing practice in
Linux (eg.  spin_unlock doesn't enforce ordering), stores to cacheable
memory are visible in program order too.  Special string stores are safe
-- their constituent stores may be out of order, but they must complete
in order WRT surrounding stores.  Nontemporal stores to WB memory can go
out of order, and so they should be fenced explicitly to make them
appear in-order WRT other stores.  Hence, smp_wmb() may be a simple
barrier.

    http://developer.intel.com/products/processor/manuals/318147.pdf
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

In userspace microbenchmarks on a core2 system, fence instructions range
anywhere from around 15 cycles to 50, which may not be totally
insignificant in performance critical paths (code size will go down
too).

However the primary motivation for this is to have the canonical barrier
implementation for x86 architecture.

smp_rmb on buggy pentium pros remains a locked op, which is apparently
required.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 18:41:21 -07:00
Nick Piggin 4071c71855 x86: fix IO write barrier
wmb() on x86 must always include a barrier, because stores can go out of
order in many cases when dealing with devices (eg. WC memory).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 18:41:21 -07:00
Dmitry Torokhov b981d8b3f5 Merge master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/macintosh/adbhid.c
2007-10-12 21:27:47 -04:00
Ralf Baechle 57f17e8e15 [MIPS] CFE: Add missing parenthesis.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-13 00:53:00 +01:00
Linus Torvalds ab9c232286 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (119 commits)
  [libata] struct pci_dev related cleanups
  libata: use ata_exec_internal() for PMP register access
  libata: implement ATA_PFLAG_RESETTING
  libata: add @timeout to ata_exec_internal[_sg]()
  ahci: fix notification handling
  ahci: clean up PORT_IRQ_BAD_PMP enabling
  ahci: kill leftover from enabling NCQ over PMP
  libata: wrap schedule_timeout_uninterruptible() in loop
  libata: skip suppress reporting if ATA_EHI_QUIET
  libata: clear ehi description after initial host report
  pata_jmicron: match vendor and class code only
  libata: add ST9160821AS / 3.ALD to NCQ blacklist
  pata_acpi: ACPI driver support
  libata-core: Expose gtm methods for driver use
  libata: add HDT722516DLA380 to NCQ blacklist
  libata: blacklist NCQ on Seagate Barracuda ST380817AS
  [libata] Turn on ACPI by default
  libata_scsi: Fix ATAPI transfer lengths
  libata: correct handling of SRST reset sequences
  libata: Integrate ACPI-based PATA/SATA hotplug - version 5
  ...
2007-10-12 16:16:41 -07:00
Linus Torvalds 6a84258e5f Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (37 commits)
  PCI: merge almost all of pci_32.h and pci_64.h together
  PCI: X86: Introduce and enable PCI domain support
  PCI: Add 'nodomains' boot option, and pci_domains_supported global
  PCI: modify PCI bridge control ISA flag for clarity
  PCI: use _CRS for PCI resource allocation
  PCI: avoid P2P prefetch window for expansion ROMs
  PCI: skip ISA ioresource alignment on some systems
  PCI: remove transparent bridge sizing
  pci: write file size to inode on proc bus file write
  pci: use size stored in proc_dir_entry for proc bus files
  pci: implement "pci=noaer"
  PCI: fix IDE legacy mode resources
  MSI: Use correct data offset for 32-bit MSI in read_msi_msg()
  PCI: Fix incorrect argument order to list_add_tail() in PCI dynamic ID code
  PCI: i386: Compaq EVO N800c needs PCI bus renumbering
  PCI: Remove no longer correct documentation regarding MSI vector assignment
  PCI: re-enable onboard sound on "MSI K8T Neo2-FIR"
  PCI: quirk_vt82c586_acpi: Omit reading PCI revision ID
  PCI: quirk amd_8131_mmrbc: Omit reading pci revision ID
  cpqphp: Use PCI_CLASS_REVISION instead of PCI_REVISION_ID for read
  ...
2007-10-12 15:50:23 -07:00
Linus Torvalds efefc6eb38 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
  PM: merge device power-management source files
  sysfs: add copyrights
  kobject: update the copyrights
  kset: add some kerneldoc to help describe what these strange things are
  Driver core: rename ktype_edd and ktype_efivar
  Driver core: rename ktype_driver
  Driver core: rename ktype_device
  Driver core: rename ktype_class
  driver core: remove subsystem_init()
  sysfs: move sysfs file poll implementation to sysfs_open_dirent
  sysfs: implement sysfs_open_dirent
  sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
  sysfs: make sysfs_root a regular directory dirent
  sysfs: open code sysfs_attach_dentry()
  sysfs: make s_elem an anonymous union
  sysfs: make bin attr open get active reference of parent too
  sysfs: kill unnecessary NULL pointer check in sysfs_release()
  sysfs: kill unnecessary sysfs_get() in open paths
  sysfs: reposition sysfs_dirent->s_mode.
  sysfs: kill sysfs_update_file()
  ...
2007-10-12 15:49:37 -07:00
Linus Torvalds 117494a1b6 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: (142 commits)
  USB: fix race in autosuspend reschedule
  atmel_usba_udc: Keep track of the device status
  USB: Nikon D40X unusual_devs entry
  USB: serial core should respect driver requirements
  USB: documentation for USB power management
  USB: skip autosuspended devices during system resume
  USB: mutual exclusion for EHCI init and port resets
  USB: allow usbstorage to have LUNS greater than 2Tb
  USB: Adding support for SHARP WS011SH to ipaq.c
  USB: add atmel_usba_udc driver
  USB: ohci SSB bus glue
  USB: ehci build fixes on au1xxx, ppc-soc
  USB: add runtime frame_no quirk for big-endian OHCI
  USB: funsoft: Fix termios
  USB: visor: termios bits
  USB: unusual_devs entry for Nikon DSC D2Xs
  USB: re-remove <linux/usb_sl811.h>
  USB: move <linux/usb_gadget.h> to <linux/usb/gadget.h>
  USB: Export URB statistics for powertop
  USB: serial gadget: Disable endpoints on unload
  ...
2007-10-12 15:49:10 -07:00
Russell King 92633b72d1 Merge branches 'omap1-upstream' and 'omap2-upstream' into devel 2007-10-12 23:44:35 +01:00
Mike Westerhof 033b8ffe3f [ARM] 4599/1: Preserve ATAG list for use with kexec (2.6.23)
This patch resolves a kexec boot failure that can occur because
no ATAGs are passed in to the kexec'd kernel. Currently the
newly-kexec'd kernel may fail if it requires specific ATAGs, or
it may fail because the fixed memory location at which it expects
to find the ATAGs may contain random data instead of ATAGs.

The patch ensures that any ATAGs passed to the current kernel
at boot time are copied to a static buffer, and are copied back
when kexec copies the new kernel into place. Thus the new
kernel sees the same ATAGs from kexec and the boot loader.

The boot parameters are copied without regard to type, content,
or length -- this patch's scope is limited soley to saving and
restoring a fixed-size block of memory containing the kernel's
boot parameters. Additional functionality to examine, alter, or
replace the ATAGs (using kexec, for example) can be implemented
by manipulating the static buffer containing the preserved ATAGs.

Note: the size of the buffer (1.5KB) is selected to comfortably
hold one of each ATAG type, including a maximum-length command
line and the maximum number of ATAG_MEM structures currently
supported by the kernel. Should an ATAG list exceed that limit,
the list will be silently truncated to that limit (to do other-
wise at that point in the boot process would make a simple
problem exceedingly complicated).

[Note: this is the same patch as 4579, modified to accomodate
the ATAG changes introduced in 2.6.23]

Signed-off-by: Mike Westerhof <mwester at dls.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-12 23:43:48 +01:00
Russell King 84aa462e2c [ARM] Rename consistent_sync() as dma_cache_maint()
consistent_sync() is used to handle the cache maintainence issues with
DMA operations.  Since we've now removed the misuse of this function
from the two MTD drivers, rename it to prevent future mis-use.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-12 23:43:45 +01:00
Ben Dooks f3fb5a556c [ARM] 4596/1: S3C2412: Correct IRQs for SDI+CF and add decoding support
Fix the IRQ numbers of the CF and SDI interface on the S3C2412
and S3C2413. Add support to handle these IRQs properly and
ensure that the SDI controller platform device is correctly
renumbered.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-10-12 23:43:42 +01:00