threads, the incore superblock lock becomes the limiting factor for
buffered write throughput. Make the contended fields in the incore
superblock use per-cpu counters so that there is no global lock to limit
scalability.
SGI-PV: 946630
SGI-Modid: xfs-linux-melb:xfs-kern:25106a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
actually use it. Kill this dead code. Signed-off-by: Christoph Hellwig
<hch@lst.de>
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25086a
Signed-off-by: Nathan Scott <nathans@sgi.com>
is provided by a vector through the superblock export operations when the
filesystem is exported by NFS. The fix is to call that vector instead of
using the exported symbol directly.
SGI-PV: 948858
SGI-Modid: xfs-linux-melb:xfs-kern:25062a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
One can do "chattr +j" on a file to change its journalling mode. Fix
writeback mode with "nobh" handling for it.
Even though, we mount ext3 filesystem in writeback mode with "nobh" option,
some one can do "chattr +j" on a single file to force it to do journalled
mode. In order to do journaling, ext3_block_truncate_page() need to
fallback to default case of creating buffers and adding them to transaction
etc.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
ext3_symlink(). Such allocation may re-enter ext3 code from
try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal
handle in task_struct and, hence, is not reentrable.
This bug led to "Assertion failure in journal_dirty_metadata()" messages.
http://bugzilla.openvz.org/show_bug.cgi?id=115
Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix some bugs in mtd/jffs2 on 64bit platform.
The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h.
And some variables in jffs2 are declared as uint32_t but used to hold
size_t values.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent change to compat. dev_ifconf() in fs/compat_ioctl.c
causes ifconf data to be truncated 1 entry too early when copying it
to userspace. The correct amount of data (length) is returned,
but the final entry is empty (zero, not filled in).
The for-loop 'i' check should use <= to allow the final struct
ifreq32 to be copied. I also used the ifconf-corruption program
in kernel bugzilla #4746 to make sure that this change does not
re-introduce the corruption.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miscellaneous fixes related to accessing uninitialized variables or memory
that was already freed.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
DASD allows to open a device as soon as gendisk is registered, which means the
device is a fake device (capacity=0) and we do know nothing about blocksize
and partitions at that point of time. In case the device is opened by
someone, the bdev and inode creation is done with the fake device info and the
following partition detection code is just using the wrong data.
To avoid this modify the DASD state machine to make sure that the open is
rejected until the device analysis is either finished or an unformatted device
was detected.
Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench. Tested on both x86_64 and powerpc.
The way we do file struct accounting is not very suitable for batched
freeing. For scalability reasons, file accounting was
constructor/destructor based. This meant that nr_files was decremented
only when the object was removed from the slab cache. This is susceptible
to slab fragmentation. With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -
llm22:~ # cat /proc/sys/fs/file-nr
587730 0 758844
At the same time, I see only a 2000+ objects in filp cache. The following
patch I fixes this problem.
This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api. In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.
Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug in udf where it would write uid/gid = 0 to the disk for files
owned by the id given with the uid=/gid= mount options. It also adds 4 new
mount options: uid/gid=forget and uid/gid=ignore. Without any options the
id in core and on disk always match. Giving uid/gid=nnn specifies a
default ID to be used in core when the on disk ID is -1. uid/gid=ignore
forces the in core ID to allways be used no matter what the on disk ID is.
uid/gid=forget forces the on disk ID to always be written out as -1.
The use of these options allows you to override ownerships on a disk or
disable ownwership information from being written, allowing the media to be
used portably between different computers and possibly different users
without permissions issues that would require root to correct.
Signed-off-by: Phillip Susi <psusi@cfl.rr.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't do interruptible waits for the pipe mutex anywhere else any
more either, so don't do it in fifo_open() either.
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The point of the smaps "shared" is to count the number of pages that are
mapped by more than one process, according to Mauricio Lin. However, smaps
uses page_count for this, so it will return a false positive for every page
that is mapped by just that one process, which is also in pagecache or
swapcache. There are false positive situations for anonymous pages not in
swapcache as well: - page reclaim, migration - get_user_pages (eg.
direct-io, ptrace)
Use page_mapcount instead, to count the number of mappings to the page.
Use vm_normal_page so that weird things like /dev/mem aren't counted either.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ramfs neglects to update the directory mtime and ctime fields when creating
a new symbolic link. Ramfs was modified in 2.6.15 to update these fields
when other types of entries are created. The symlink support is separate
from that other support, so that change did not cover quite all of the
possibilities.
All of the directory content manipulation entry points now seem to be
covered with respect to these time field updates.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix handling of cramfs images created by util-linux containing empty
regular files. Images created by cramfstools 1.x were ok.
Fill out inode contents in cramfs_iget5_set() instead of get_cramfs_inode()
to prevent issues if cramfs_iget5_test() is called with I_LOCK|I_NEW still
set.
Signed-off-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Cc: Olaf Hering <olh@suse.de>
Cc: Chris Mason <mason@suse.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
session when multiply mounted.
Fixes slow response when cifs client is mounted to shares on multiple
servers and oplock break occurs (usually due to attempt to multiply open a
file). When treeids on mutiple mounted shares match and we find the wrong
match first, we searched for the wrong cached files to send oplock break
response for which usually meant that no matching file was found and thus
the server would have to timeout the notification. Oplock break timeout is
about 20 seconds on some servers so this could cause significantly slower
performance on file open calls in a few cases (in particular when multiple
shares are mounted from multiple servers, tree ids match, and we have a
cached file which is later opened multiple times). This was the most
important of the bugs that was found and fixed at Connectathon
(interoperability testing event) this week.
Acked-by: Shaggy (shaggy@austin.ibm.com)
Signed-off-by: Steve French (sfrench@us.ibm.com)
The bitmaps associated with generation numbers for directory entries
are declared as an array of ints. On some platforms, this causes alignment
exceptions.
The following patch uses the standard bitmap declaration macros to
declare the bitmaps, fixing the problem.
Originally from Takashi Iwai.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes bugs in reiserfs where unsigned integers were checked
whether they are less then 0.
Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
v9fs has been plagued by an over-complicated approach trying to map Linux
dentry semantics to Plan 9 fid semantics. Our previous approach called for
aggressive flushing of the dcache resulting in several problems (including
wierd cwd behavior when running /bin/pwd).
This patch dramatically simplifies our handling of this fid management. Fids
will not be clunked as promptly, but the new approach is more functionally
correct. We now clunk un-open fids only when their dentry ref_count reaches 0
(and d_delete is called).
Another simplification is we no longer seek to match fids to the process-id or
uid of the action initiator. The uid-matching will need to be revisited when
we fix the security model.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lucho's atomic create+open fix had a bug in the super block initialization
causing all mounts to fail. He was freeing an fcall too early. This patch
fixes that oversight.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In order to assure atomic create+open v9fs stores the open fid produced by
v9fs_vfs_create in the dentry, from where v9fs_file_open retrieves it and
associates it with the open file.
This patch modifies v9fs to use nameidata.intent.open values to do the atomic
create+open.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Switch from list_head to hlist_head. Make the size of the hash dependent
upon the allocated area, rather than a constant.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
to prevent confusion when a virtual ip is created on the same interface
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The extent map code has long noticed when the on-disk extent information
is corrupt. However, so far it has only returned an error. We should
take the filesystem read-only, as it is corrupt.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Orphan dir recovery can deadlock with another process in
ocfs2_delete_inode() in some corner cases. Fix this by tracking recovery
state more closely and allowing it to handle inode wipes which might
deadlock.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch finishes cleaning up the node manager allocations if it fails
to initialize.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The check to determine which format string is appopriate for u64 and
friends works in most cases, but UML on x86_64 doesn't define CONFIG_X86_64,
so it results in screen fulls of compile-time warnings.
This patch fixes it to handle that case.
fs/ocfs2/cluster/masklog.h | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch adds mm->task_size to keep track of the task size of a given mm
and uses that to fix the powerpc vdso so that it uses the mm task size to
decide what pages to fault in instead of the current thread flags (which
broke when ptracing).
(akpm: I expect that mm_struct.task_size will become the way in which we
finally sort out the confusion between 32-bit processes and 32-bit mm's. It
may need tweaks, but at this stage this patch is powerpc-only.)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If negative entries (nodeid == 0) were sent in reply to LOOKUP requests,
two bugs could be triggered:
- looking up a negative entry would return -EIO,
- revaildate on an entry which turned negative would send a FORGET
request with zero nodeid, which would cause an abort() in the
library.
The above would only happen if the 'negative_timeout=N' option was used,
otherwise lookups reply -ENOENT, which worked correctly.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
obscure corruption case
SGI-PV: 942658
SGI-Modid: xfs-linux-melb:xfs-kern:207119a
Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
regressed recently via the fix for inherited quota inode attributes.
SGI-PV: 947312
SGI-Modid: xfs-linux-melb:xfs-kern:25318a
Signed-off-by: Nathan Scott <nathans@sgi.com>
RTC_IRQP_SET/RTC_EPOCH_SET don't take a pointer to an argument, but the
argument itself. This actually simplifies the code and makes it work.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes a local DOS on Intel systems that lead to an endless
recursive fault. AMD machines don't seem to be affected.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Phil Marek <philipp.marek@bmlv.gv.at> points out that ramfs forgets to update
a directory's mtime and ctime when it is modified.
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm currently at the POSIX meeting and one thing covered was the
incompatibility of Linux's link() with the POSIX definition. The name.
Linux does not follow symlinks, POSIX requires it does.
Even if somebody thinks this is a good default behavior we cannot change this
because it would break the ABI. But the fact remains that some application
might want this behavior.
We have one chance to help implementing this without breaking the behavior.
For this we could use the new linkat interface which would need a new
flags parameter. If the new parameter is AT_SYMLINK_FOLLOW the new
behavior could be invoked.
I do not want to introduce such a patch now. But we could add the
parameter now, just don't use it. The patch below would do this. Can we
get this late patch applied before the release more or less fixes the
syscall API?
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Windows copes with this and even chkdsk does not detect or fix this
so we have to cope with it, too. Thanks to Pawel Kot for reporting
the problem.
- Miscellaneous updates to layout.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>