Fix sysfs printk format warning:
fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ocfs2-specific ip_attr. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
use __set_current_state(TASK_*) instead of current->state = TASK_*, in
fs/ocfs2
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
OCFS2_I(inode)->ip_alloc_sem is a read-write semaphore protecting
local concurrent access of ocfs2 inodes. However, ocfs2 directories were
not taking the semaphore while they accessed or modified the allocation
tree.
ocfs2_extend_dir() needs to take the semaphore in a write mode when it
adds to the allocation. All other directory users get there via
ocfs2_bread(), which takes the semaphore in read mode.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch makes the following needlessly global functions static:
- aops.c: ocfs2_write_data_page()
- dlmglue.c: ocfs2_dump_meta_lvb_info()
- file.c: ocfs2_set_inode_size()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
vfat implements compat handlers for these ioctls, but when they
were executed on other file systems the kernel would still complain
about an unknown compat ioctl. Just declare them as compatible
and let them be rejected when not needed by the normal path.
This makes wine runs a lot quieter
Signed-off-by: Andi Kleen <ak@suse.de>
Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when
it is not implemented.
This is the same as COMPATIBLE_IOCTL internally, but better self documentng.
Valid reasons to use this:
- It is implemented with ->compat_ioctl on some device, but programs
call it on others too.
- The ioctl is not implemented in the native kernel, but programs
call it commonly anyways.
Most other reasons are not valid.
Signed-off-by: Andi Kleen <ak@suse.de>
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.
It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove some stale comments about hard limits which went away in 2.5.
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The ACL that the server sets may not be exactly the one we set--for
example, it may silently turn off bits that it does not support. So we
should remove any cached ACL so that any subsequent request for the ACL
will go to the server.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that the patch from -mm has gone upstream, we can uncomment the code
in GFS2 which uses sprintf_symbol.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert Peterson <rpeterso@redhat.com>
Replace some printk with log_print, and fix some simple cases of lines
over 80. Also, return -ENOTCONN if lowcomms_start fails due to no local
IP address being available.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
alpha:
fs/gfs2/dir.c: In function 'gfs2_dir_read_leaf':
fs/gfs2/dir.c:1322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t'
fs/gfs2/dir.c: In function 'gfs2_dir_read':
fs/gfs2/dir.c:1455: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64'
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
If a stuffed file is mmaped and a page fault is generated at some offset
above the initial page, we need to create a zero page to hang the buffer
heads off before we can unstuff the file. This is a fix for bz #236087
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch converts the mount option parsing to use the kernels lib/parser stuff
like all of the other filesystems. I tested this and it works well. Thank you,
Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a few range & initialization bugs in lowcomms.
- max_nodeid is really the highest nodeid encountered, so all loops must include
it in their iterations.
- clean dlm_local_count & connection_idr so we can do a clean restart.
- Remove a spurious BUG_ON
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When you attempt to release a lockspace in DLM, it will hang trying to down a
semaphore that has already been downed. The attached patch fixes the problem.
Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Patrick Caulfield <pcaulfie@redhat.com>
There are flags to enable two specialized features in the dlm:
1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
changing the granted mode of locks to NL.
2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
or CW to grant them if the normal requested mode can't be granted.
GFS direct i/o exercises both of these features, especially when mixed
with buffered i/o. The dlm has problems with them.
The first problem is on the master node. If it demotes a lock as a part of
converting it, the actual step of converting the lock isn't being done
after the demotion, the lock is just left sitting on the granted queue
with a granted mode of NL. I think the mistaken assumption was that the
call to grant_pending_locks() would grant it, but that function naturally
doesn't look at locks on the granted queue.
The second problem is on the process node. If the master either demotes
or gives an altmode, the munging of the gr/rq modes is never done in the
process copy of the lock, leaving the master/process copies out of sync.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The patch below consists of the following changes (in code order):
1. I fixed a minor compiler warning regarding the printing of
a kernel symbol address.
2. I implemented a suggestion from Dave Teigland that moves
the debugfs information for gfs2 into a subdirectory so
we can easily expand our use of debugfs in the future.
The current code keeps the glock information in:
/debug/gfs2/<fs>
With the patch, the new code keeps the glock information in:
/debug/gfs2/<fs>/glock
That will allow us to create more debugfs files in the future.
3. This fixes a bug whereby a failed mount attempt causes the
debugfs file to not be deleted. Failed mount attempts should
always clean up after themselves, including deleting the
debugfs file and/or directory.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch detects when the number of entries in a leaf block or inode
block (in the case of stuffed directories) is corrupt and informs the
user. It prevents us from running off the end of the array thats been
allocated for the sorting in this case,
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
(lock dump) seen at the "gfs2 summit". This also fixes the bug that caused
garbage to be printed by the "initialized at" field. I apologize for the
kludge, but that code will all be ripped out anyway when the official
sprint_symbol function becomes available in the Linux kernel. I also
changed some formatting so that spaces are replaced by proper tabs.
Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Every file should include the headers containing the prototypes for
it's global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch consolidates the TCP & SCTP protocols for the DLM into a single file
and makes it switchable at run-time (well, at least before the DLM actually
starts up!)
For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
socket API but that has already been twice ACKed so it should be OK.
The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
& lowcomms-tcp.c files.
Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes a redundant (and incorrect) assignment from compat_output
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Ths following patch makes GFS2 use the rgrp flags properly. Although
there are also separate flags for both data and metadata as well, I've
not implemented these as there seems little use for them. On the
otherhand, the "noalloc" flag is generally useful for future changes we
might which to make, so this ensures that we interpret it correctly.
In addition I fixed the comment above the function which was incorrect.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
A lock id is a uint32 and is used as an opaque reference to the lock. For
userland apps, the lkid is passed up, through libdlm, as the return value
from a write() on the dlm device. This created a problem when the high
bit was 1, making the lkid look like an error. This is fixed by changing
how the lkid is composed. The low 16 bits identified the hash bucket for
the lock and the high 16 bits were a per-bucket counter (which eventually
hit 0x8000 causing the problem). These are simply swapped around; the
number of hash table buckets is far below 0x8000, making all lkid's
positive when viewed as signed.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Add code to accept purge commands from userland.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Add code for purging orphan locks. A process can also purge all of its
own non-orphan locks by passing a pid of zero. Code already exists for
processes to create persistent locks that become orphans when the process
exits, but the complimentary capability for another process to then purge
these orphans has been missing.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This splits the current create_message() function into two parts so that
later patches can call the new lower-level _create_message() function when
they don't have an rsb struct. No functional change in this patch.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We always want to see the details of the error returned to gfs, but
log_debug is often turned off, so use log_error (printk).
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Full cancel and force-unlock support. In the past, cancel and force-unlock
wouldn't work if there was another operation in progress on the lock. Now,
both cancel and unlock-force can overlap an operation on a lock, meaning there
may be 2 or 3 operations in progress on a lock in parallel. This support is
important not only because cancel and force-unlock are explicit operations
that an app can use, but both are used implicitly when a process exits while
holding locks.
Summary of changes:
- add-to and remove-from waiters functions were rewritten to handle situations
with more than one remote operation outstanding on a lock
- validate_unlock_args detects when an overlapping cancel/unlock-force
can be sent and when it needs to be delayed until a request/lookup
reply is received
- processing request/lookup replies detects when cancel/unlock-force
occured during the op, and carries out the delayed cancel/unlock-force
- manipulation of the "waiters" (remote operation) state of a lock moved under
the standard rsb mutex that protects all the other lock state
- the two recovery routines related to locks on the waiters list changed
according to the way lkb's are now locked before accessing waiters state
- waiters recovery detects when lkb's being recovered have overlapping
cancel/unlock-force, and may not recover such locks
- revert_lock (cancel) returns a value to distinguish cases where it did
nothing vs cases where it actually did a cancel; the cancel completion ast
should only be done when cancel did something
- orphaned locks put on new list so they can be found later for purging
- cancel must be called on a lock when making it an orphan
- flag user locks (ENDOFLIFE) at the end of their useful life (to the
application) so we can return an error for any further cancel/unlock-force
- we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
either a completion or blocking ast
- clear an unread bast on a lock that's become unlocked
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Replacement patch to remove redundant code rather than moving it around.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In Testing the previously posted and accepted patch for
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
I uncovered some gfs2 badness. It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures. Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.
This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.
I wrote this patch that avoids saving process pointers
and instead saves off the process pid. Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.
This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is a fix for bz #208514. When GFS2 frees up space, the freed blocks
aren't available for reuse until the resource group is successfully written
to the ondisk journal. So in rare cases, GFS2 operations will fail, saying
that the filesystem is out of space, when in reality, you are just waiting for
a log flush. For instance, on a 1Gig filesystem, if I continually write 10 Mb
to a file, and then truncate it, after a hundred interations, the write will
fail with -ENOSPC, even though the filesystem is just 1% full.
The attached patch calls a log flush in these cases. I tested this patch
fairly heavily to check if there were any locking issues that I missed, and
it seems to work just fine. Also, this patch only does the log flush if
get_local_rgrp makes a complete loop of resource groups without skipping
any do to locking issues. The code would be slightly simpler if it just always
did the log flush after the first failed pass, and you could only ever have
to go through the loop twice, instead of up to three times. However, I guessed
that failing to find a rg simply do to locking issues would be common enough
to skip the log flush in that case, but I'm not certain that this is the right
way to go. Either way, I don't suppose this code will be hit all that often.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When glock_lo_add and rg_lo_add attempt to add an element to the log, they
check to see if has already been added before locking the log. If another
process adds that element to the log in this window between the check and
locking the log, the element will be added to the list twice. This causes
the log element list to become corrupted in such a way that the log element
can never be successfully removed from the list. This patch pulls the
list_empty() check inside the log lock, to remove this window.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The following patch speeds up lock_dlm's locking by moving the sprintf
out from the lock acquisition path and into the lock creation path. This
reduces the amount of CPU time used in acquiring locks by a fair amount.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: David Teigland <teigland@redhat.com>
Currently if the lockspace removal fails the misc device associated with a
lockspace is left deleted. After that there is no way to access the orphaned
lockspace from userland.
This patch recreates the misc device if th dlm_release_lockspace fails. I
believe this is better than attempting to remove the lockspace first because
that leaves an unattached device lying around. The potential gap in which there
is no access to the lockspace between removing the misc device and recreating it
is acceptable ... after all the application is trying to remove it, and only new
users of the lockspace will be affected.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since gcc didn't evaluate the last two terms of the expression in
glock.c:1881 as a constant expression, it resulted in an error on
i386 due to the lack of a 64bit divide instruction. This adds some
brackets to fix the problem.
This was reported by Andrew Morton.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
This patch prevents the printing of a warning message in cases where
the fs is functioning normally by handing off responsibility for
unlinked, but still open inodes, to another node for eventual deallocation.
Also, there is now an improved system for ensuring that such requests
to other nodes do not get lost. The callback on the iopen lock is
only ever called when i_nlink == 0 and when a node is unable to deallocate
it due to it still being in use on another node. When a node receives
the callback therefore, it knows that i_nlink must be zero, so we mark
it as such (in gfs2_drop_inode) in order that it will then attempt
deallocation of the inode itself.
As an additional benefit, queuing a demote request no longer requires
a memory allocation. This simplifies the code for dealing with gfs2_holders
as it removes one special case.
There are two new fields in struct gfs2_glock. gl_demote_state is the
state which the remote node has requested and gl_demote_time is the
time when the request came in. Both fields are only valid when the
GLF_DEMOTE flag is set in gl_flags.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
If we are writing a file, and in the middle of writing the file
another node attempts to get a shared lock on that file (by doing a du for
example) the process doing the writing will hang waiting on lock_page. The
reason for this is because when we have waiters on a exclusive glock, we will go
through and flush out all dirty pages associated with that inode and release the
lock. The problem is that when we flush the dirty pages, we could hit a page
that we have locked durring the generic_file_buffered_write part of this
operation. This patch unlocks the page before we go to dequeue the lock and
locks it immediatly afterwards, since generic_file_buffered_write needs the page
locked when the commit_write is completed. This patch resolves the problem,
however if somebody sees a better way to do this please don't hesistate to yell.
Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The length of the second element of the kvec array was not initialised before
being added to the first one. This could cause invalid lengths to be passed to
kernel_recvmsg
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
If you specify an invalid mount option when trying to mount a gfs2 filesystem,
gfs2 will oops. The attached patch resolves this problem.
Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The attached patch resolves bz 228540. This adds the capability
for gfs2 to dump gfs2 locks through the debugfs file system.
This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
gfs2 because all the ioctls were stripped out. Please see the bugzilla
for more history about the fix. This patch is also attached to the bugzilla
record.
The patch is against Steve Whitehouse's latest nmw git tree kernel
(2.6.21-rc1) and has been tested on system trin-10.
Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Try running this script in an NFS mounted directory (Client relatively
recent - 2.6.18 has the problem as does 2.6.20).
------------------------------------------------------
#!/bin/bash
#
# This script will produce the following errormessage from tar:
#
# tar: newdir/innerdir/innerfile: file changed as we read it
# create dirs
rm -rf nfstest
mkdir -p nfstest/dir/innerdir
# create files (should not be empty)
echo "Hello World!" >nfstest/dir/file
echo "Hello World!" >nfstest/dir/innerdir/innerfile
# problem only happens if we sleep before chmod
sleep 1
# change file modes
chmod -R a+r nfstest
# rename dir
mv nfstest/dir nfstest/newdir
# tar it
tar -cf nfstest/nfstest.tar -C nfstest newdir
# restore old dir name
mv nfstest/newdir nfstest/dir
--------------------------------------------------------
What happens:
The 'chmod -R' does a readdir_plus in each directory and the results
get cached in the page cache. It then updates the ctime on each file
by one second. When this happens, the post-op attributes are used to
update the ctime stored on the client to match the value in the kernel.
The 'mv' calls shrink_dcache_parent on the directory tree which
flushes all the dentries (so a new lookup will be required) but
doesn't flush the inodes or pagecache.
The 'tar' does a readdir on each directory, but (in the case of
'innerdir' at least) satisfies it from the pagecache and uses the
READDIRPLUS data to update all the inodes. In the case of
'innerdir/innerfile', the ctime is out of date.
'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime.
It then opens the file (triggering a GETATTR), reads the content, and
then calls fstat to see if anything has changed. It finds that ctime
has changed and so complains.
The problem seems to be that the cache readdirplus info is kept around
for too long.
My patch below discards pagecache data for directories when
dentry_iput is called on them. This effectively removes the symptom
which convinces me that I correctly understand the problem. However
I'm not convinced that is a proper solution, as there could easily be
other races that trigger the same problem without being affected by
this 'fix'.
One possibility would be to require that readdirplus pagecache data be
only used *once* to instantiate an inode. Somehow it should then be
invalidated so that if the dentry subsequently disappears, it will
cause a new request to the server to fill in the stat data.
Another possibility is to compare the cache_change_attribute on the
inode with something similar for the readdirplus info and reject the
info from readdirplus if it is too old.
I haven't tried to implement these and would value other opinions
before I do.
Thanks,
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Don't use uninitialsed value for fattr->time_start in readdirplus results.
The 'fattr' structure filled in by nfs3_decode_direct does not get a
value for ->time_start set.
Thus if an entry is for an inode that we already have in cache,
when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode
and may update the inode with out-of-date information.
Directories are read a page at a time, so each page could have a
different timestamp that "should" be used to set the time_start for
the fattr for info in that page. However storing the timestamp per
page is awkward. (We could stick in the first 4 bytes and only read 4092
bytes, but that is a bigger code change than I am interested it).
This patch ignores the readdir_plus attributes if a readdir finds the
information already in cache, and otherwise sets ->time_start to the time
the readdir request was sent to the server.
It might be nice to store - in the directory inode - the time stamp for
the earliest readdir request that is still in the page cache, so that we
don't ignore attribute data that we don't have to. This patch doesn't do
that.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
READDIRPLUS can be a performance hindrance when the client is working with
large directories. In addition, some servers still have bugs in their
implementations (e.g. Tru64 returns wrong values for the fsid).
Add a mount flag to enable users to turn it off at mount time following the
implementation in Apple's NFS client.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is arguable whether NFSROOT will support IPv6, and thus whether
rpcb_getport_external needs to support rpcbind versions greater than 2.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.
To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.
Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two. That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.
And, we can finally be rid of RPC_SLACK_SPACE!
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NLM version 4 requests estimate the call and reply header sizes rather
conservatively, using the very maximum size allowed in the protocol even
though Linux always uses only a small fraction of the allowable space.
Reduce the size of caller and lock arguments to conserve RPC buffer space
while XDR encoding NLM4 arguments. Add compile-time checks to ensure the
hostname string won't overflow NLM protocol maximums.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently we do write coalescing in a very inefficient manner: one pass in
generic_writepages() in order to lock the pages for writing, then one pass
in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
the locked pages for coalescing into RPC requests of size "wsize".
In fact, it turns out there is actually a deadlock possible here since we
only start I/O on the second pass. If the user signals the process while
we're in nfs_sync_mapping_wait(), for instance, then we may exit before
starting I/O on all the requests that have been queued up.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Do the coalescing of read requests into block sized requests at start of
I/O as we scan through the pages instead of going through a second pass.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is redundant, and will interfere with the call to
balance_dirty_pages_ratelimited_nr in generic_file_write().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs statfs function returns a success code on error, and fills the
output buffer with invalid values. The attached patch makes it return a
correct error code instead.
Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(Modified patch to reinstate the dprintk())
We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix.
The xattr_sem can never be taken in the manner described. Internal inodes
are protected by I_PRIVATE. Add the appropriate annotation.
Cc: <stable@kernel.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CIFS Unix Extensions are negotiated we get the Unix uid and gid
owners of the file from the server (on the Unix Query Path Info
levels), but if the server's uids don't match the client uid's users
were having to disable the Unix Extensions (which turned off features
they still wanted). The changeset patch allows users to override uid
and/or gid for file/directory owner with a default uid and/or gid
specified at mount (as is often done when mounting from Linux cifs
client to Windows server). This changeset also displays the uid
and gid used by default in /proc/mounts (if applicable).
Also cleans up code by adding some of the missing spaces after
"if" keywords per-kernel style guidelines (as suggested by Randy Dunlap
when he reviewed the patch).
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
[PATCH] elevator: elv_list_lock does not need irq disabling
[BLOCK] Don't pin lots of memory in mempools
cfq-iosched: speedup cic rb lookup
ll_rw_blk: add io_context private pointer
cfq-iosched: get rid of cfqq hash
cfq-iosched: tighten queue request overlap condition
cfq-iosched: improve sync vs async workloads
cfq-iosched: never allow an async queue idling
cfq-iosched: get rid of ->dispatch_slice
cfq-iosched: don't pass unused preemption variable around
cfq-iosched: get rid of ->cur_rr and ->cfq_list
cfq-iosched: slice offset should take ioprio into account
[PATCH] cfq-iosched: style cleanups and comments
cfq-iosched: sort IDLE queues into the rbtree
cfq-iosched: sort RT queues into the rbtree
[PATCH] cfq-iosched: speed up rbtree handling
cfq-iosched: rework the whole round-robin list concept
cfq-iosched: minor updates
cfq-iosched: development update
cfq-iosched: improve preemption for cooperating tasks
Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.
There's really no point in "optimizing" this OOM path, we just need
enough preallocated to make progress. A single unit is enough, lets
scale it down to 2 just to be on the safe side.
This patch saves ~150kb of pinned kernel memory on a 32-bit box.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.infradead.org/mtd-2.6: (46 commits)
[MTD] [MAPS] drivers/mtd/maps/ck804xrom.c: convert pci_module_init()
[MTD] [NAND] CM-x270 MTD driver
[MTD] [NAND] Wrong calculation of page number in nand_block_bad()
[MTD] [MAPS] fix plat-ram printk format
[JFFS2] Fix compr_rubin.c build after include file elimination.
[JFFS2] Handle inodes with only a single metadata node with non-zero isize
[JFFS2] Tidy up licensing/copyright boilerplate.
[MTD] [OneNAND] Exit loop only when column start with 0
[MTD] [OneNAND] Fix access the past of the real oobfree array
[MTD] [OneNAND] Update Samsung OneNAND official URL
[JFFS2] Better fix for all-zero node headers
[JFFS2] Improve read_inode memory usage, v2.
[JFFS2] Improve failure mode if inode checking leaves unchecked space.
[JFFS2] Fix cross-endian build.
[MTD] Finish conversion mtd_blkdevs to use the kthread API
[JFFS2] Obsolete dirent nodes immediately on unlink, where possible.
Use menuconfig objects: MTD
[MTD] mtd_blkdevs: Convert to use the kthread API
[MTD] Fix fwh_lock locking
[JFFS2] Speed up mount for directly-mapped NOR flash
...
Fixes for various arch compilation problems:
(*) Missing module exports.
(*) Variable name collision when rxkad and af_rxrpc both built in
(rxrpc_debug).
(*) Large constant representation problem (AFS_UUID_TO_UNIX_TIME).
(*) Configuration dependencies.
(*) printk() format warnings.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the wakeup transitions after a VLocation record update completes
one way or another. This builds on Dave Miller's partial fix.
Also move wakeups outside the spinlocked sections.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits)
dev_dbg: check dev_dbg() arguments
drivers/base/attribute_container.c: use mutex instead of binary semaphore
mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs
s2ram: add arch irq disable/enable hooks
define platform wakeup hook, use in pci_enable_wake()
security: prevent permission checking of file removal via sysfs_remove_group()
device_schedule_callback() needs a module reference
s390: cio: Delay uevents for subchannels
sysfs: bin.c printk fix
Driver core: use mutex instead of semaphore in DMA pool handler
driver core: bus_add_driver should return an error if no bus
debugfs: Add debugfs_create_u64()
the overdue removal of the mount/umount uevents
kobject: Comment and warning fixes to kobject.c
Driver core: warn when userspace writes to the uevent file in a non-supported way
Driver core: make uevent-environment available in uevent-file
kobject core: remove rwsem from struct subsystem
qeth: Remove usage of subsys.rwsem
PHY: remove rwsem use from phy core
IEEE1394: remove rwsem use from ieee1394 core
...
Prevent permission checking from being performed when the kernel wants to
unconditionally remove a sysfs group, by introducing an kernel-only variant
of lookup_one_len(), lookup_one_len_kern().
Additionally, as sysfs_remove_group() does not check the return value of
the lookup before using it, a BUG_ON has been added to pinpoint the cause
of any problems potentially caused by this (and as a form of annotation).
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Nagendra Singh Tomar <nagendra_tomar@adaptec.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as896b) fixes an oversight in the design of
device_schedule_callback(). It is necessary to acquire a reference to the
module owning the callback routine, to prevent the module from being
unloaded before the callback can run.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fs/sysfs/bin.c: In function 'read':
fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I went to use this the other day, only to find it didn't exist.
It's a straight copy of the debugfs u32 code, then s/u32/u64/. A quick
test shows it seems to be working.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch contains the overdue removal of the mount/umount uevents.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.infradead.org/ubi-2.6:
UBI: remove unused variable
UBI: add me to MAINTAINERS
JFFS2: add UBI support
UBI: Unsorted Block Images
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits)
ocfs2: Cache extent records
ocfs2: Remember rw lock level during direct io
ocfs2: Fix up i_blocks calculation to know about holes
ocfs2: Fix extent lookup to return true size of holes
ocfs2: Read from an unwritten extent returns zeros
ocfs2: make room for unwritten extents flag
ocfs2: Use own splice write actor
ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()
[PATCH] Turn do_sync_file_range() into do_sync_mapping_range()
ocfs2: zero tail of sparse files on truncate
ocfs2: Teach ocfs2_get_block() about holes
ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()
ocfs2: teach ocfs2_file_aio_write() about sparse files
ocfs2: Turn off shared writeable mmap for local files systems with holes.
ocfs2: abstract out allocation locking
ocfs2: teach extend/truncate about sparse files
ocfs2: temporarily remove extent map caching
ocfs2: sparse b-tree support
ocfs2: small cleanup of ocfs2_request_delete()
ocfs2: remove unused code
...
This patch make JFFS2 able to work with UBI volumes via the emulated MTD
devices which are directly mapped to these volumes.
Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
cmpxchg() is not available on every processor so can't
be used in generic code.
Replace with spinlock protection on the ->state changes,
wakeups, and wait loops.
Add what appears to be a missing wakeup on transition
to AFS_VL_VALID state in afs_vlocation_updater().
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the create, link, symlink, unlink, mkdir, rmdir and
rename VFS operations to the in-kernel AFS filesystem.
Also:
(1) Fix dentry and inode revalidation. d_revalidate should only look at
state of the dentry. Revalidation of the contents of an inode pointed to
by a dentry is now separate.
(2) Fix afs_lookup() to hash negative dentries as well as positive ones.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the CB.InitCallBackState3 operation for the fileserver to
call. This reduces the amount of network traffic because if this op
is aborted, the fileserver will then attempt an CB.InitCallBackState
operation.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the CB.GetCapabilities operation with which the fileserver can
ask the client for the following information:
(1) The list of network interfaces it has available as IPv4 address + netmask
plus the MTUs.
(2) The client's UUID.
(3) The extended capabilities of the client, for which the only current one
is unified error mapping (abort code interpretation).
To support this, the patch adds the following routines to AFS:
(1) A function to iterate through all the network interfaces using RTNETLINK
to extract IPv4 addresses and MTUs.
(2) A function to iterate through all the network interfaces using RTNETLINK
to pull out the MAC address of the lowest index interface to use in UUID
construction.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add security support to the AFS filesystem. Kerberos IV tickets are added as
RxRPC keys are added to the session keyring with the klog program. open() and
other VFS operations then find this ticket with request_key() and either use
it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle multiple mounts of an AFS superblock correctly, checking to see
whether the superblock is already initialised after calling sget()
rather than just unconditionally stamping all over it.
Also delete the "silent" parameter to afs_fill_super() as it's not
used and can, in any case, be obtained from sb->s_flags.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete the old RxRPC code as it's now no longer used.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up the AFS sources.
Also remove references to AFS keys. RxRPC keys are used instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.
Our old extent map caching was designed back when meta data block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no un-necessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cluster locking might have been redone because a direct write won't
complete, so this needs to be reflected in the iocb.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>