3190 Commits

Author SHA1 Message Date
Nathan Scott f5faad7994 [XFS] Fix remount vs no/barrier options by ensuring we clear unwanted
flags from iclog buffers before submitting them for writing.

SGI-PV: 954772
SGI-Modid: xfs-linux-melb:xfs-kern:26605a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-07-28 17:04:44 +10:00
Christoph Hellwig 2a293b7d5a [XFS] All xfs_disk_dquot_t values are (as the name says) disk endian.
Before putting them into struct statfs they should be endian-swapped.

SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:26550a

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-07-28 17:04:26 +10:00
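XFS keeps these on-disk quota fields big-endian, so each value has to be converted to host order before it is copied into the user-visible structure. Below is a minimal user-space sketch of that conversion. It assumes a raw 8-byte big-endian field, and `be64_decode` is a hypothetical stand-in for kernel byte-order helpers such as be64_to_cpu(), not the code from this patch:

```c
#include <stdint.h>

/* Hypothetical helper: decode a big-endian (on-disk) 64-bit field into
 * host byte order. Shifting in one byte at a time gives the same result
 * on little- and big-endian hosts. */
uint64_t be64_decode(const unsigned char b[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}
```

For example, the bytes `00 00 00 00 00 00 01 02` decode to 258 on every host. Copying them straight into a `uint64_t` on a little-endian machine would produce a very different number, which is the class of bug the commit fixes.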
Dave Kleikamp 115ff50bad JFS: Quota support broken, no quota_read and quota_write
jfs_quota_read/write are very near duplicates of ext2_quota_read/write.

Cleaned up jfs_get_block as long as I had to change it to be non-static.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2006-07-26 14:52:13 -05:00
Linus Torvalds b20e481ab5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: commit_mutex cleanups
2006-07-15 14:43:30 -07:00
Linus Torvalds 6d76fa58b0 Don't allow chmod() on the /proc/<pid>/ files
This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.

The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..

This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.

Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-15 12:26:45 -07:00
Linus Torvalds 92d032855e Mark /proc MS_NOSUID and MS_NOEXEC
Not that we really need this any more, but at the same time there's no
reason not to do this.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-15 12:20:05 -07:00
Shailabh Nagar 2589045466 [PATCH] per-task-delay-accounting: /proc export of aggregated block I/O delays
Export I/O delays seen by a task through /proc/<tgid>/stats for use in top
etc.

Note that delays for I/O done for swapping in pages (swapin I/O) are clubbed
together with all other I/O here (this is not the case in the netlink
interface, where swapin I/O is kept distinct).

[akpm@osdl.org: printk warning fix]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:57 -07:00
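proc(5) documents this aggregated block I/O delay as field 42 (`delayacct_blkio_ticks`) of `/proc/<pid>/stat`. The sketch below shows how a top-like tool might read it. `parse_blkio_ticks` is a hypothetical helper. It skips past the *last* `)` because field 2 (the command name) may itself contain spaces or parentheses:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract field 42 (delayacct_blkio_ticks) from one
 * line of /proc/<pid>/stat. Returns 0 on success, -1 if the line is
 * malformed or has too few fields. */
int parse_blkio_ticks(const char *line, unsigned long long *out)
{
    /* comm (field 2) is wrapped in parentheses and may contain spaces,
     * so start scanning after the last ')'. */
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return -1;
    p++;
    /* Field 3 (state) is the first token after ')'. */
    for (int field = 3; ; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;
        if (field == 42) {
            char *end;

            *out = strtoull(p, &end, 10);
            return end == p ? -1 : 0;
        }
        while (*p != ' ' && *p != '\0')
            p++;
    }
}
```

In practice the line would come from reading `/proc/<pid>/stat` with `fgets()`. The values are in clock ticks.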
Rolf Eike Beer d247e2c661 [PATCH] add function documentation for register_chrdev()
Documentation for register_chrdev() was missing completely.

[akpm@osdl.org: kerneldocification]
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:54 -07:00
Jeff Mahoney 6fbe82a952 [PATCH] reiserfs: fix handling of device names with /'s in them
On systems with block devices containing a slash (virtual dasd, cciss,
etc), reiserfs will fail to initialize /proc/fs/reiserfs/<dev> due to it
being interpreted as a subdirectory.  The generic block device code changes
the / to !  for use in the sysfs tree.  This patch uses that convention.

Tested by making dm devices use dm/<number> rather than dm-<number>

[akpm@osdl.org: name variables consistently]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:54 -07:00
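The renaming convention described above (`/` becomes `!`, as in the sysfs tree) can be sketched as an in-place rewrite. `proc_safe_devname` is a hypothetical name, and this is a user-space illustration rather than the reiserfs code:

```c
/* Hypothetical helper: make a block device name usable as a single /proc
 * path component by replacing every '/' with '!' in place. Returns the
 * same buffer for convenience. */
char *proc_safe_devname(char *name)
{
    for (char *p = name; *p != '\0'; p++)
        if (*p == '/')
            *p = '!';
    return name;
}
```

With this rule, a device such as `cciss/c0d0p1` would appear as `/proc/fs/reiserfs/cciss!c0d0p1` instead of being treated as a subdirectory.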
Kirill Korotaev de45921535 [PATCH] struct file leakage
2.6.16 leaks like hell. While testing, I found massive leakage
(reproduced in openvz) in:

*filp
*size-4096

And 1 object leaks in
*size-32
*size-64
*size-128

It is the fix for the first one.  filp leaks in the bowels of namei.c.

It seems size-4096 is the file table leaking in expand_fdtables.

I have no idea what the rest are, or why they show up only alongside
other leaks.  Some debugging structs?

[akpm@osdl.org, Trond: remove the IS_ERR() check]
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Linus Torvalds 9ee8ab9fbf Relax /proc fix a bit
Clearing all of i_mode was a bit draconian. We only really care about
S_ISUID/S_ISGID, after all.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:48:03 -07:00
Linus Torvalds 18b0bbd8ca Fix nasty /proc vulnerability
We have a bad interaction with both the kernel and user space being able
to change some of the /proc file status.  This fixes the most obvious
part of it, but I expect we'll also make it harder for users to modify
even their "own" files in /proc.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 16:51:34 -07:00
Linus Torvalds 0d10e47f96 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] CIFS_DEBUG2 depends on CIFS
2006-07-13 16:38:58 -07:00
Andrew Morton a29b0b74e7 [PATCH] alloc_fdtable() expansion fix
We're supposed to go to the next power of two if nfds==nr.

Of `nr', not of `nfds'.

Spotted by Rene Scharfe <rene.scharfe@lsrfire.ath.cx>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-12 12:52:55 -07:00
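The intended sizing rule can be sketched in user space. In the sketch, `roundup_pow2` is a hypothetical stand-in for the kernel's roundup_pow_of_two(), and `fdtable_new_size` is an illustrative helper, not the real alloc_fdtable(), which also applies a minimum size and an upper cap. The key detail is rounding `nr + 1` rather than `nr`. When `nr` is already a power of two (for example, equal to the current table size), the table still grows:

```c
/* Hypothetical stand-in for roundup_pow_of_two(): smallest power of two
 * that is >= n (n must not exceed the largest representable power of
 * two, otherwise r would overflow to 0). */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long r = 1;

    while (r < n)
        r <<= 1;
    return r;
}

/* Hypothetical sizing helper: number of slots needed so that index `nr`
 * fits. Rounding nr + 1 guarantees the result is strictly greater than
 * nr, so a full power-of-two table doubles instead of staying put. */
unsigned long fdtable_new_size(unsigned long nr)
{
    return roundup_pow2(nr + 1);
}
```

For instance, `fdtable_new_size(64)` yields 128, whereas rounding `nr` itself would have returned 64 and left no room for descriptor 64.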
Adam B. Jerome 0635170b54 [PATCH] /fs/proc/: 'larger than buffer size' memory accessed by clear_user()
Address a potential 'larger than buffer size' memory access by
clear_user().  Without this patch, this call to clear_user() can attempt to
clear too many (tsz) bytes resulting in a wrong (-EFAULT) return code by
read_kcore().

Signed-off-by: Adam B. Jerome <abj@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-12 12:52:55 -07:00
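The underlying rule is that the number of bytes cleared must never exceed the space left in the caller's buffer. The sketch below models one plausible shape of such a clamp in user space. `zero_fill_clamped` is hypothetical, `memset` stands in for clear_user(), and the exact change made to read_kcore() is not reproduced here:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: zero at most `buflen` bytes of `buf` even if the
 * current chunk size `tsz` is larger. Returns the number of bytes
 * actually cleared. */
size_t zero_fill_clamped(char *buf, size_t buflen, size_t tsz)
{
    size_t n = tsz < buflen ? tsz : buflen;

    memset(buf, 0, n);
    return n;
}
```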
Arjan van de Ven 232ba9dbd6 [PATCH] lockdep: annotate the sysfs i_mutex to be a separate class
sysfs has different lock order behavior for i_mutex than the
other filesystems; sysfs i_mutex is taken in many places with subsystem
locks held.  At the same time, many of the VFS locking rules do not apply
to sysfs at all (cross directory rename for example).  To untangle this
mess (which gives false positives in lockdep), we're giving sysfs inodes
their own class for i_mutex.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-12 12:52:54 -07:00
Kirill Korotaev d579091b43 [PATCH] fix fdset leakage
When found, it is obvious.  The nfds calculated when allocating fdsets is
overwritten by the calculation of the fdtable size, and when we are unlucky, we
try to free fdsets of the wrong size.

Found due to OpenVZ resource management (User Beancounters).

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-12 12:52:54 -07:00
Linus Torvalds 826adfe49a Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: fix problems with sys_tee()
2006-07-12 08:14:48 -07:00
Shankar Anand e2b209509c [PATCH] knfsd: nfsd4: add per-operation server stats
Add an nfs4 operations count array to nfsd_stats structure.  The count is
incremented in nfsd4_proc_compound() where all the operations are handled
by the nfsv4 server.  This count of individual nfsv4 operations is also
entered into /proc filesystem.

Signed-off-by: Shankar Anand <shanand@novell.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:27 -07:00
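The counting scheme amounts to a per-operation array that is bumped once for each operation dispatched from a COMPOUND request. Below is a minimal sketch. The struct, the bound `LAST_OP_SKETCH` and the function are hypothetical, not the fields actually added to nfsd_stats:

```c
#include <stddef.h>

#define LAST_OP_SKETCH 39   /* hypothetical highest operation number tracked */

/* Hypothetical stand-in for the stats structure: one counter per op. */
struct nfsd4_stats_sketch {
    unsigned int opcount[LAST_OP_SKETCH + 1];
};

/* Count each operation of one COMPOUND as it is processed. Operation
 * numbers outside the array are ignored rather than overflowing it. */
void count_compound_ops(struct nfsd4_stats_sketch *st,
                        const unsigned int *ops, size_t nops)
{
    for (size_t i = 0; i < nops; i++)
        if (ops[i] <= LAST_OP_SKETCH)
            st->opcount[ops[i]]++;
}
```

Exporting the array through /proc then reduces to printing `opcount[0..LAST_OP_SKETCH]`.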
Andreas Gruenbacher 36cf96f5e7 [PATCH] Remove leftover ext3 acl declarations
These functions no longer exist; remove their declarations.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:26 -07:00
Andrew Morton 92eb7a2f28 [PATCH] fix weird logic in alloc_fdtable()
There's a fairly obvious infinite loop in there.

Also, use roundup_pow_of_two() rather than open-coding stuff.

Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:25 -07:00
David Howells 6d8c4e3b01 [PATCH] FDPIC: Add coredump capability for the ELF-FDPIC binfmt
Add coredump capability for the ELF-FDPIC binfmt.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:22 -07:00
David Howells b4cac1a022 [PATCH] FDPIC: Move roundup() into linux/kernel.h
Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.

[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:22 -07:00
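For reference, `roundup()` rounds its first argument up to the next multiple of the second. The definition below matches the usual form of this macro in linux/kernel.h. It is shown as a stand-alone sketch, and `stack_bytes_needed` is a hypothetical usage helper:

```c
/* Round x up to the next multiple of y (y > 0). y is evaluated more than
 * once, so it should not have side effects. */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical usage: pad a byte count to a 16-byte boundary. */
unsigned long stack_bytes_needed(unsigned long n)
{
    return roundup(n, 16UL);
}
```

Unlike a power-of-two mask, this works for any positive `y`, which is why it is generally useful outside binfmt_elf.c.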
David Howells 8a2ab7f5df [PATCH] FDPIC: Adjust the ELF-FDPIC driver to conform more to the CodingStyle
Adjust the ELF-FDPIC binfmt driver to conform much more to the CodingStyle,
silly though it may be.

Further changes:

 (*) Drop the casts to long for addresses in kdebug() statements (they're
     unsigned long already).

 (*) Use extra variables to avoid expressions longer than 80 chars by splitting
     the statement into multiple statements and letting the compiler optimise
     them back together.

 (*) Eliminate duplicate call of ksize() when working out how much space was
     actually allocated for the stack.

 (*) Discard the commented-out load_shlib prototype and op pointer as this will
     not be supported in ELF-FDPIC for the foreseeable future.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:21 -07:00
David Howells 21ff821630 [PATCH] NOMMU: Fix execution off of ramfs with mmap()
Fix execution through the FDPIC binfmt of programs stored on ramfs by
preventing the ramfs mmap() returning successfully on a private mapping of
a ramfs file.  This causes NOMMU mmap to make a copy of the mapped portion
of the file and map that instead.

This could be improved by granting direct mapping access to read-only
private mappings for which the data is stored on a contiguous run of pages.
 However, this is only likely to be the case if the file was extended with
truncate before being written.

ramfs is left to map the file directly for shared mappings so that SYSV IPC
and POSIX shared memory both still work.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:21 -07:00
David Howells 1aeb21d626 [PATCH] FDPIC: Fix FDPIC compile errors
Fix FDPIC compile errors.

(akpm: we suspect it fixes a warning)

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:21 -07:00
Zhang, Yanmin b6174df5ee [PATCH] mmap zero-length hugetlb file with PROT_NONE to protect a hugetlb virtual area
Sometimes, applications need the call below to succeed even though
"/mnt/hugepages/file1" doesn't exist.

fd = open("/mnt/hugepages/file1", O_CREAT|O_RDWR, 0755);
*addr = mmap(NULL, 0x1024*1024*256, PROT_NONE, 0, fd, 0);

For regular pages (or files), the above call works, but for huge
pages it fails, because hugetlbfs_file_mmap fails if
(!(vma->vm_flags & VM_WRITE) && len > inode->i_size).

This capability on huge page is useful on ia64 when the process wants to
protect one area on region 4, so other threads couldn't read/write this
area.  A famous JVM (Java Virtual Machine) implementation on IA64 needs the
capability.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hugh@veritas.com>
[ Expand-on-mmap semantics again... this time matching normal fs's. wli ]
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:21 -07:00
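The regular-file behaviour the commit compares against can be demonstrated with an ordinary temporary file. The sketch below does not touch hugetlbfs. It makes two assumptions: a writable `/tmp`, and MAP_PRIVATE in place of the quoted snippet's flags value of 0, because Linux mmap() rejects flags that specify neither MAP_SHARED nor MAP_PRIVATE with EINVAL. `reserve_with_empty_file` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `len` bytes of address space with
 * PROT_NONE, backed by a zero-length regular file. Returns the mapping,
 * or MAP_FAILED on error. Any access to the area faults. */
void *reserve_with_empty_file(size_t len)
{
    char path[] = "/tmp/reserve-XXXXXX";
    int fd = mkstemp(path);          /* creates an empty file */
    void *addr;

    if (fd < 0)
        return MAP_FAILED;
    unlink(path);                    /* no name needed after open */
    addr = mmap(NULL, len, PROT_NONE, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping keeps its own reference */
    return addr;
}
```

On a regular filesystem this succeeds even though `len` exceeds the file size of zero. Before this change, the equivalent call on a hugetlbfs file failed the `len > inode->i_size` check.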
Adrian Bunk 69c3a5b8fd [PATCH] fs/read_write.c: EXPORT_UNUSED_SYMBOL
This patch marks an unused export as EXPORT_UNUSED_SYMBOL.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:18 -07:00
Peter Oberparleiter 25e206b54b [PATCH] partitions: let partitions inherit policy from disk
Change the partition code in fs/partitions/check.c to initialize a newly
detected partition's policy field with that of the containing block device
(see patch below).

My reasoning is that function set_disk_ro() in block/genhd.c modifies the
policy field (read-only indicator) of a disk and all contained partitions.
When a partition is detected after the call to set_disk_ro(), the policy
field of this partition will currently not inherit the disk's policy field.
This behavior poses a problem in cases where a block device can be
'logically de- and reactivated', as with the s390 DASD driver, because
partition detection may run after the policy field has been modified.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Makes-sense-to: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:14 -07:00
Hisashi Hifumi 73ce5934e2 [PATCH] reiserfs: fix journaling issue regarding fsync()
When write() extends a file (i_size is increased) and fsync() is called,
the change to the inode must be written to the journal area through fsync().
But currently, i_trans_id is not correctly updated when i_size is
increased, so fsync() does not kick the journal writer.

Reiserfs_file_write() already updates the transaction when blocks are
allocated, but the case when i_size increases and new blocks are not added
is not correctly treated.

The following patch fixes this bug.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Cc: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:13 -07:00
Jens Axboe aadd06e5c5 [PATCH] splice: fix problems with sys_tee()
Several issues noticed/fixed:

- We cannot reliably block in link_pipe() while holding both input and output
  mutexes. So do preparatory checks before locking down both mutexes and doing
  the link.

- The ipipe->nrbufs vs i check was bad, because we could have dropped the
  ipipe lock in-between. This causes us to potentially look at unknown
  buffers if we were racing with someone else reading this pipe.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-10 11:00:01 +02:00
Steve French 8ba10ab128 [CIFS] CIFS_DEBUG2 depends on CIFS
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-07-08 02:17:40 +00:00
Trond Myklebust 72dbac37e3 Merge branch 'locks' 2006-07-05 13:19:25 -04:00
Trond Myklebust 4e0641a7ad NFS: Optimise away an excessive GETATTR call when a file is symlinked
In the case when compiling via a symlink tree, we want to ensure that the
close-to-open GETATTR call is applied only to the final file, and not to
the symlink.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:17:13 -04:00
Trond Myklebust 83715ad54f NFS: Fix NFS page_state usage
The introduction of the FLUSH_INVALIDATE argument to nfs_sync_inode_wait()
does not clear the nr_unstable page state counter for pages that are being
released.

Also fix a longstanding similar bug when nfs_commit_list() fails.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:17:12 -04:00
Trond Myklebust 01c3b861cd NLM,NFSv4: Wait on local locks before we put RPC calls on the wire
Use the FL_ACCESS flag to test and/or wait for local locks before we try
requesting a lock from the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:13:18 -04:00
Trond Myklebust f07f18dd6f VFS: Add support for the FL_ACCESS flag to flock_lock_file()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:13:18 -04:00
Trond Myklebust 42a2d13eee NFSv4: Ensure nfs4_lock_expired() caches delegated locks
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:13:18 -04:00
Trond Myklebust 9b07357490 NLM,NFSv4: Don't put UNLOCK requests on the wire unless we hold a lock
Use the new behaviour of {flock,posix}_file_lock(F_UNLCK) to determine if
we held a lock, and only send the RPC request to the server if this was the
case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:13:17 -04:00
Trond Myklebust f475ae957d VFS: Allow caller to determine if BSD or posix locks were actually freed
Change posix_lock_file_conf(), and flock_lock_file() so that if called
with an F_UNLCK argument, and the FL_EXISTS flag they will indicate
whether or not any locks were actually freed by returning 0 or -ENOENT.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-07-05 13:13:17 -04:00
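Together with the NLM/NFSv4 change above it, this gives a simple client-side rule: release the lock locally first, and only put an UNLOCK on the wire if the local release reports that a lock was actually held. The toy model below captures that rule. The lock table, `local_unlock()` and `client_unlock()` are all hypothetical. Only the 0 / -ENOENT convention comes from the commit message:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy lock table: does this process hold a lock on the file? */
struct local_locks {
    bool held;
};

/* Models the F_UNLCK + FL_EXISTS behaviour described above: returns 0 if
 * a lock was actually released, -ENOENT if there was nothing to release. */
int local_unlock(struct local_locks *ll)
{
    if (!ll->held)
        return -ENOENT;
    ll->held = false;
    return 0;
}

/* Hypothetical client unlock path: only "send" an UNLOCK RPC (counted in
 * *rpcs_sent) when the local release proved that a lock was held. */
int client_unlock(struct local_locks *ll, unsigned int *rpcs_sent)
{
    int status = local_unlock(ll);

    if (status == -ENOENT)
        return 0;          /* no lock held: nothing to tell the server */
    (*rpcs_sent)++;        /* stand-in for putting UNLOCK on the wire */
    return status;
}
```

A second unlock of the same file therefore costs no round trip to the server.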
Trond Myklebust 5e66dd6d66 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2006-07-05 13:13:03 -04:00
Greg Ungerer 31304c909e [PATCH] uclinux: fix proc_task()/get_proc-task() naming
Fix changed name of proc_task() to get_proc_task().

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 22:37:13 -07:00
Linus Torvalds 0d1782144e Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [JFFS2][XATTR] Fix memory leak in POSIX-ACL support
  fs/jffs2/: make 2 functions static
  [MTD] NAND: Fix broken sharpsl driver
  [JFFS2][XATTR] Fix xd->refcnt race condition
  MTD: kernel-doc fixes + additions
  MTD: fix all kernel-doc warnings
  [MTD] DOC: Fixup read functions and do a little cleanup
2006-07-03 21:29:08 -07:00
Ingo Molnar 36c8b58689 [PATCH] sched: cleanup, remove task_t, convert to struct task_struct
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.

Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:11 -07:00
Ingo Molnar 663d440eaa [PATCH] lockdep: annotate blkdev nesting
Teach special (recursive) locking code to the lock validator.

Effects on non-lockdep kernels:

- the introduction of the following function variants:

  extern struct block_device *open_partition_by_devnum(dev_t, unsigned);

  extern int blkdev_put_partition(struct block_device *);

  static int
  blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);

 which on non-lockdep are the same as open_by_devnum(), blkdev_put()
 and blkdev_get().

- a subclass parameter to do_open(). [unused on non-lockdep]

- a subclass parameter to __blkdev_put(), which is a new internal
  function for the main blkdev_put*() functions. [parameter unused
  on non-lockdep kernels, except for two sanity check WARN_ON()s]

These functions carry no semantic difference; they only express
object dependencies to the lockdep subsystem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:10 -07:00
Arjan van de Ven 897c6ff956 [PATCH] lockdep: annotate sb ->s_umount
The s_umount rwsem needs to be classified as per-superblock, since it's
perfectly legitimate to hold several of those recursively under the VFS
locking rules.

Has no effect on non-lockdep kernels.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:09 -07:00
Ingo Molnar cf51624999 [PATCH] lockdep: annotate ->s_lock
Teach special (per-filesystem) locking code to the lock validator.

Minimal effect on non-lockdep kernels: one extra parameter to alloc_super().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:09 -07:00
Arjan van de Ven 5c81a4197d [PATCH] lockdep: annotate the quota code
The quota code plays interesting games with the lock ordering; to quote Jan:

| i_mutex of inode containing quota file is acquired after all other
| quota locks. i_mutex of all other inodes is acquired before quota
| locks. Quota code makes sure (by resetting inode operations and
| setting special flag on inode) that no one tries to enter quota code
| while holding i_mutex on a quota file...

The good news is that all of this special case i_mutex grabbing happens in the
(per filesystem) low level quota write function.  For this special case we
need a new I_MUTEX_* nesting level, since this is entirely outside any of
the regular VFS locking rules for i_mutex.  I take Jan at his word that
this is not ever going to deadlock; and based on that, the patch below is what
it takes to inform lockdep of these very interesting new locking rules.

The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
deepest possible level of nesting for i_mutex, and that this only should be
used in quota write (and possibly read) function of filesystems.  This makes
the lock ordering of the I_MUTEX_* levels:

I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA

Has no effect on non-lockdep kernels.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:08 -07:00
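The ordering rule above can be illustrated with a toy checker. This is far simpler than lockdep, which builds a dependency graph between lock classes at runtime. The enum names are hypothetical, and their values are chosen to follow PARENT -> CHILD -> NORMAL -> QUOTA. They are not the kernel's I_MUTEX_* values:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical nesting levels, numbered in the order given above
 * (outermost first). Not the kernel's I_MUTEX_* enum. */
enum imutex_level { LVL_PARENT, LVL_CHILD, LVL_NORMAL, LVL_QUOTA };

/* Given the levels of i_mutexes taken in sequence (all still held),
 * report whether each new lock is strictly deeper than the previous one.
 * Because LVL_QUOTA is the deepest level, it can only be innermost. */
bool nesting_order_ok(const enum imutex_level *seq, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (seq[i] <= seq[i - 1])
            return false;
    return true;
}
```

A quota write that takes the quota file's i_mutex while holding a NORMAL i_mutex passes. Any attempt to take another i_mutex while holding a QUOTA-level one is rejected.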
Ingo Molnar 5934537474 [PATCH] lockdep: annotate NTFS locking rules
NTFS uses lots of type-opaque objects which acquire their true identity
at runtime, so the lock validator needs to be helped in a couple of places to
figure out object types.

Many thanks to Anton Altaparmakov for giving lots of explanations about NTFS
locking rules.

Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:08 -07:00
Ingo Molnar f2eace23e9 [PATCH] lockdep: annotate i_mutex
Teach special (recursive) locking code to the lock validator.  Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:06 -07:00
Ingo Molnar a90b9c05df [PATCH] lockdep: annotate dcache
Teach special (recursive) locking code to the lock validator.  Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:06 -07:00
Ingo Molnar d8aa905b42 [PATCH] lockdep: annotate direct io
Teach special (rwsem-in-irq) locking code to the lock validator.  Has no
effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:06 -07:00
Ingo Molnar e4d9191885 [PATCH] lockdep: locking init debugging improvement
Locking init improvement:

 - introduce and use __SPIN_LOCK_UNLOCKED for array initializations,
   to pass in the name string of locks, used by debugging

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:02 -07:00
Chuck Ebbert ce51059be5 [PATCH] binfmt_elf: fix checks for bad address
Fix check for bad address; use macro instead of open-coding two checks.

Taken from RHEL4 kernel update.

From: Ernie Petrides <petrides@redhat.com>

  For background, the BAD_ADDR() macro should return TRUE if the address is
  TASK_SIZE, because that's the lowest address that is *not* valid for
  user-space mappings.  The macro was correct in binfmt_aout.c but was wrong
  for the "equal to" case in binfmt_elf.c.  There were two in-line validations
  of user-space addresses in binfmt_elf.c, which have been appropriately
  converted to use the corrected BAD_ADDR() macro in the patch you posted
  yesterday.  Note that the size checks against TASK_SIZE are okay as coded.

  The additional changes that I propose are below.  These are in the error
  paths for bad ELF entry addresses once load_elf_binary() has already
  committed to exec'ing the new image (following the tearing down of the
  task's original address space).

  The 1st hunk deals with the interp-side of the outer "if".  There were two
  problems here.  The printk() should be removed because this path can be
  triggered at will by a bogus interpreter image created and used by a
  malicious user.  Further, the error code should not be ENOEXEC, because that
  causes the loop in search_binary_handler() to continue trying other exec
  handlers (twice, in fact).  But it's too late for this to work correctly,
  because the user address space has already been torn down, and an exec()
  failure cannot be returned to the user code because the code no longer
  exists.  The only recovery is to force a SIGSEGV, but it's best to terminate
  the search loop immediately.  I somewhat arbitrarily chose EINVAL as a
  fallback error code, but any error returned by load_elf_interp() will
  override that (but this value will never be seen by user-space).

  The 2nd hunk deals with the non-interp-side of the outer "if".  There were
  two problems here as well.  The SIGSEGV needs to be forced, because a prior
  sigaction() syscall might have set the associated disposition to SIG_IGN.
  And the ENOEXEC should be changed to EINVAL as described above.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:26:59 -07:00
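The corrected check treats TASK_SIZE itself as invalid, because it is the lowest address user space cannot map. It also reports EINVAL rather than ENOEXEC, so search_binary_handler() stops trying other handlers. The sketch below illustrates both points. `SKETCH_TASK_SIZE` is a hypothetical value (the 3 GiB user/kernel split of 32-bit x86), and `check_entry` is a hypothetical helper, not load_elf_binary():

```c
#include <errno.h>

/* Hypothetical TASK_SIZE for illustration only. */
#define SKETCH_TASK_SIZE 0xC0000000UL

/* True for any address user space cannot map, including TASK_SIZE
 * itself (the "equal to" case that was previously missed). */
#define BAD_ADDR(x) ((unsigned long)(x) >= SKETCH_TASK_SIZE)

/* Hypothetical check on an ELF entry point after the old address space
 * is gone: fail with -EINVAL so the handler search loop ends. */
int check_entry(unsigned long entry)
{
    return BAD_ADDR(entry) ? -EINVAL : 0;
}
```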
Trond Myklebust 026477c114 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2006-07-03 13:49:45 -04:00
Dominik Hackl 4ebd9ab387 [PATCH] nfs: non-procfs build fix
This fixes a bug in fs/nfs which makes it impossible to build nfs
without having procfs enabled.

Signed-off-by: Dominik Hackl <dominik@hackl.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 15:10:20 -07:00
KaiGai Kohei c7afb0f977 [JFFS2][XATTR] Fix memory leak in POSIX-ACL support
jffs2_clear_acl(), which releases acl caches allocated by kmalloc(),
was defined but never called, so those caches could leak memory.

This patch plugs jffs2_clear_acl() into jffs2_do_clear_inode(),
ensuring that the acl cache is released when the inode is cleared.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-07-02 15:13:46 +01:00
Vladimir Saveliev dd535a5965 [PATCH] reiserfs: update ctime and mtime on expanding truncate
Reiserfs does not update ctime and mtime on expanding truncate via
truncate().  This patch fixes it.

Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Cc: Hans Reiser <reiser@namesys.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:04 -07:00
Evgeniy Dushistov 10e5dce07e [PATCH] ufs: truncate should allocate block for last byte
This patch fixes buggy behaviour of UFS
in the following scenario:
open(, O_TRUNC...)
ftruncate(, 1024)
ftruncate(, 0)

Such a scenario causes ufs_panic and a read-only remount.  This happens
because, according to the specification, UFS should always allocate a block
for the last byte, and many parts of our implementation rely on this, but
`ufs_truncate' doesn't take care of it.

To make it possible to return an error code and to know the old size, this
patch removes `truncate' from ufs inode_operations and uses the `setattr'
method to call ufs_truncate.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:03 -07:00
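The abbreviated scenario in the message can be written out as a small user-space reproducer. The helper name `replay_ufs_truncate` is hypothetical. On most filesystems it simply succeeds. According to the commit, on a UFS mount without this fix the same sequence triggered ufs_panic and a read-only remount:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical reproducer: open with O_TRUNC, extend to 1024 bytes with
 * ftruncate(), then shrink back to 0, checking the size after each step.
 * Returns 0 if every step behaved as expected, -1 otherwise. */
int replay_ufs_truncate(const char *path)
{
    struct stat st;
    int rc = 0;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, 1024) != 0 || fstat(fd, &st) != 0 || st.st_size != 1024)
        rc = -1;
    else if (ftruncate(fd, 0) != 0 || fstat(fd, &st) != 0 || st.st_size != 0)
        rc = -1;
    close(fd);
    return rc;
}
```

Pointing `path` at a file on a UFS mount would exercise the code path this patch changes.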
Linus Torvalds 22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
J. Bruce Fields 5c04c46aec [PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy case
Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
case so that the wrapping code will get copies of the read data instead of
real page cache pages.  This makes life simpler when we encrypt the response.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:41 -07:00
J. Bruce Fields 9ecb6a08d8 [PATCH] knfsd: nfsd4: fix open flag passing
Since nfsv4 actually keeps around the file descriptors it gets from open
(instead of just using them for a single read or write operation), we need to
make sure that we can do RDWR opens and not just RDONLY/WRONLY.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
J. Bruce Fields ba5a6a19d8 [PATCH] knfsd: nfsd4: fix some open argument tests
These tests always returned true; clearly that wasn't what was intended.

In keeping with kernel style, make them functions instead of macros while
we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
David M. Richter 270d56e536 [PATCH] knfsd: nfsd: fix misplaced fh_unlock() in nfsd_link()
In the event that lookup_one_len() fails in nfsd_link(), fh_unlock() is
skipped and locks are held overlong.

Patch was tested on 2.6.17-rc2 by causing lookup_one_len() to fail and
verifying that fh_unlock() gets called appropriately.

Signed-off-by: David M. Richter <richterd@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
J. Bruce Fields 6e46d8a9cc [PATCH] knfsd: nfsd4: remove superfluous grace period checks
We're checking nfs_in_grace here a few times when there isn't really any
reason to--bad_stateid is probably the more sensible return value anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
J. Bruce Fields 7fc90ec93a [PATCH] knfsd: nfsd: call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem
In the typical v2/v3 case the only new filehandles used as arguments to
operations are filehandles taken directly off the wire, which don't get
dentries until fh_verify() is called.

But in v4 the filehandles that are arguments to operations were often created
by previous operations (putrootfh, lookup, etc.) using fh_compose, which sets
the dentry in the filehandle without calling nfsd_setuser().

This also means that, for example, if filesystem B is mounted on filesystem A,
and filesystem A is exported without root-squashing, then a client can bypass
the rootsquashing on B using a compound that starts at a filehandle in A,
crosses into B using lookups, and then does stuff in B.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
J. Bruce Fields a8cddc5dfc [PATCH] knfsd: nfsd4: fix open_confirm locking
Fix an improper unlock in an error path.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
NeilBrown 7e4053645a [PATCH] knfsd: ignore ref_fh when crossing a mountpoint
nfsd tries to return to a client the same sort of filehandle as was used by
the client.  This removes some filehandle aliasing issues and means that a
server upgrade followed by a downgrade will not confuse clients that were not
restarted during that time.

However when crossing a mountpoint, the filehandle used for one filesystem
doesn't provide any useful information on what sort of filehandle should be
used on the other, and can provide misleading information.  So if the
reference filehandle is on a different filesystem to the one being generated,
ignore it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
NeilBrown 4c9608b2f2 [PATCH] knfsd: remove noise about filehandle being uptodate
There is a perfectly valid situation where fh_update gets called on an already
uptodate filehandle - in nfsd_create_v3 where a CREATE_UNCHECKED finds an
existing file and wants to just set the size.

We could possibly optimise out the call in that case, but the only harm
involved is that fh_update prints a warning, so it is easier to remove the
warning.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
Frank Filz 4bdff8c095 [PATCH] knfsd: fixing missing 'expkey' support for fsid type 3
Type '3' is used for the fsid in filehandles when the device number of the
device holding the filesystem has more than 8 bits in either major or minor.
Unfortunately expkey_parse doesn't recognise type 3.  Fix this.

(Slightly modified from Frank's original)

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:39 -07:00
NeilBrown a56f39375a [PATCH] knfsd: improve the test for cross-device-rename in nfsd
Just testing the i_sb isn't really enough, at least the vfsmnt must be the
same.  Thanks Al.

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:39 -07:00
David Quigley a1836a42da [PATCH] SELinux: Add security hook definition for getioprio and insert hooks
Add a new security hook definition for the sys_ioprio_get operation.  At
present, the SELinux hook function implementation for this hook is
identical to the getscheduler implementation but a separate hook is
introduced to allow this check to be specialized in the future if
necessary.

This patch also creates a helper function get_task_ioprio which handles the
access check in addition to retrieving the ioprio value for the task.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:37 -07:00
Christoph Lameter f8891e5e1f [PATCH] Light weight event counters
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare; we may only lose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic on i386 and x86_64,
so even the rare race with an interrupt handler is avoided on both
architectures in most cases.

The patchset also adds an off switch for embedded systems that allows
building Linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single increment instruction being emitted
(i386, x86_64) or in the increment being hidden through instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Christoph Lameter d2c5e30c9a [PATCH] zoned vm counters: conversion of nr_bounce to per zone counter
Conversion of nr_bounce to a per zone counter

nr_bounce is only used for proc output.  So it could be left as an event
counter.  However, the event counters may not be accurate and nr_bounce is
categorizing types of pages in a zone.  So we really need this to also be a
per zone counter.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Christoph Lameter fd39fc8561 [PATCH] zoned vm counters: conversion of nr_unstable to per zone counter
Conversion of nr_unstable to a per zone counter

We need to do some special modifications to the nfs code since there are
multiple cases of disposition and we need to have a page ref for proper
accounting.

This converts the last critical page state of the VM and therefore we need to
remove several functions that were depending on GET_PAGE_STATE_LAST in order
to make the kernel compile again.  We are only left with event type counters
in page state.

[akpm@osdl.org: bugfixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Christoph Lameter ce866b34ae [PATCH] zoned vm counters: conversion of nr_writeback to per zone counter
Conversion of nr_writeback to per zone counter.

This removes the last page_state counter from arch/i386/mm/pgtable.c so we
drop the page_state from there.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:35 -07:00
Christoph Lameter b1e7a8fd85 [PATCH] zoned vm counters: conversion of nr_dirty to per zone counter
This makes nr_dirty a per zone counter.  Looping over all processors is
avoided during writeback state determination.

The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones.  Someone more familiar with
NFS should probably review what I have done.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:35 -07:00
Christoph Lameter df849a1529 [PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter
Conversion of nr_page_table_pages to a per zone counter

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:35 -07:00
Christoph Lameter 9a865ffa34 [PATCH] zoned vm counters: conversion of nr_slab to per zone counter
- Allows reclaim to access counter without looping over processor counts.

- Allows accurate statistics on how many pages are used in a zone by
  the slab. This may become useful to balance slab allocations over
  various zones.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:35 -07:00
Christoph Lameter f3dbd34460 [PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED
The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
calculation as the number of mapped pagecache pages.  However, that is not
true.  NR_FILE_MAPPED includes the mapped anonymous pages.  This patch
separates those and therefore allows an accurate tracking of the anonymous
pages per zone.

It then becomes possible to determine the number of unmapped pages per zone
and we can avoid scanning for unmapped pages if there are none.

Also it may now be possible to determine the mapped/unmapped ratio in
get_dirty_limit.  Isn't the number of anonymous pages irrelevant in that
calculation?

Note that this will change the meaning of the number of mapped pages reported
in /proc/vmstat, /proc/meminfo and the per node statistics.  This may affect
user space tools that monitor these counters!  NR_FILE_MAPPED works like
NR_FILE_DIRTY.  It is only valid for pagecache pages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:35 -07:00
Christoph Lameter 347ce434d5 [PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter
Currently a single atomic variable is used to establish the size of the page
cache in the whole machine.  The zoned VM counters have the same method of
implementation as the nr_pagecache code but also allow the determination of
the pagecache size per zone.

Remove the special implementation for nr_pagecache and make it a zoned counter
named NR_FILE_PAGES.

Updates of the page cache counters are always performed with interrupts off.
We can therefore use the __ variant here.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:34 -07:00
Christoph Lameter 65ba55f500 [PATCH] zoned vm counters: convert nr_mapped to per zone counter
nr_mapped is important because it allows a determination of how many pages of
a zone are not mapped, which would allow a more efficient means of determining
when we need to reclaim memory in a zone.

We take the nr_mapped field out of the page state structure and define a new
per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
from NR_MAPPED in the next patch).

We replace the use of nr_mapped in various kernel locations.  This avoids the
looping over all processors in try_to_free_pages(), writeback, reclaim (swap +
zone reclaim).

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:34 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Paul Collins 779cbf0bbc v9fs: do not include linux/version.h
I noticed that part of v9fs was being rebuilt when version.h changed.

Signed-off-by: Paul Collins <paul@ondioline.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 18:50:03 +02:00
Adrian Bunk 0418726bb5 typo fixes: aquire -> acquire
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-06-30 18:23:04 +02:00
Linus Torvalds 501b7c77de Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
  ocfs2: clean up some osb fields
  ocfs2: fix init of uuid_net_key
  ocfs2: silence a debug print
  ocfs2: silence ENOENT during lookup of broken links
  ocfs2: Cleanup message prints
  ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
  [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
  ocfs2: warn the user on a dead timeout mismatch
  ocfs2: OCFS2_FS must depend on SYSFS
  ocfs2: Compile-time disabling of ocfs2 debugging output.
  configfs: Clear up a few extra spaces where there should be TABs.
  configfs: Release memory in configfs_example.
2006-06-29 17:44:21 -07:00
Florin Malita 184d7d20d3 ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 16:13:35 -07:00
Mark Fasheh 784270435b ocfs2: clean up some osb fields
Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were
unused, or could easily be removed. As a result, we also no longer need
MAX_OSB_ID or ocfs2_globals_lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 16:10:13 -07:00
Mark Fasheh a75a6e4c3a ocfs2: fix init of uuid_net_key
ocfs2_initialize_super() should be copying from the beginning of the uuid.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 16:06:43 -07:00
Mark Fasheh e7607ab3da ocfs2: silence a debug print
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 16:03:17 -07:00
Sunil Mushran d426721cf1 ocfs2: silence ENOENT during lookup of broken links
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:59:52 -07:00
Sunil Mushran 781ee3e2b1 ocfs2: Cleanup message prints
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:56:26 -07:00
Joel Becker a43db30c7c ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:52:56 -07:00
Adrian Bunk 8169cae5a1 [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
dlm_lockres_master_requery() became global without any external usage.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:49:29 -07:00
Mark Fasheh 0db638f44e ocfs2: warn the user on a dead timeout mismatch
Print a warning to the user when a node with a different dead count joins
the region.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:45:35 -07:00
Adrian Bunk c05d52c748 fs/jffs2/: make 2 functions static
This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-29 23:08:49 +01:00
Adrian Bunk 4ba63adce0 ocfs2: OCFS2_FS must depend on SYSFS
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 14:56:12 -07:00
Joel Becker 2b388c6790 ocfs2: Compile-time disabling of ocfs2 debugging output.
Give gcc the chance to compile out the debug logging code in ocfs2.
This saves some size at the expense of being able to debug the code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 14:48:30 -07:00
Joel Becker e7515d065d configfs: Clear up a few extra spaces where there should be TABs.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 14:43:01 -07:00
Linus Torvalds 602cada851 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits)
  [PATCH] devfs: Remove it from the feature_removal.txt file
  [PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
  [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
  [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
  [PATCH] devfs: Remove devfs_remove() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
  [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree
  [PATCH] devfs: Remove devfs support from the sound subsystem
  [PATCH] devfs: Remove devfs support from the ide subsystem.
  [PATCH] devfs: Remove devfs support from the serial subsystem
  [PATCH] devfs: Remove devfs from the init code
  [PATCH] devfs: Remove devfs from the partition code
  ...
2006-06-29 14:19:21 -07:00