Commit Graph

65 Commits

Author SHA1 Message Date
Chuck Lever 91d5b47023 NFS: add I/O performance counters
Invoke the byte and event counter macros where we want to count bytes and
events.

Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.

Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:14 -05:00
Chuck Lever d9ef5a8c26 NFS: introduce mechanism for tracking NFS client metrics
Add a per-superblock performance counter facility to the NFS client.  This
facility mimics the counters available for block devices and for
networking.  Expose these new counters via the new /proc/self/mountstats
interface.

Thanks to Andrew Morton and Trond Myklebust for their review and comments.

Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:13 -05:00
Chuck Lever c8bded96aa NFS: clean up some mount options
Get rid of "lock" and "posix", and spell out "vers=".

Test plan:
None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:13 -05:00
Chuck Lever 7a480e250c NFS: show retransmit settings when displaying mount options
Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point.  Add this facility to the NFS
client's show_options method.

Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:12 -05:00
Eric Sesterhenn bd6475454c NFS: kzalloc conversion in fs/nfs
this converts fs/nfs to kzalloc() usage.
compile tested with make allyesconfig

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:10 -05:00
Trond Myklebust 967b928136 NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()
The reason is that the idmapper cleanup may call flush_workqueue() on
rpciod_workqueue.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:09 -05:00
Trond Myklebust fb374d24f2 NFS: reduce the number of false cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:08 -05:00
Trond Myklebust ca62b9c3f7 NFSv4: Don't invalidate cached attributes if change attribute is unchanged
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:07 -05:00
Trond Myklebust 755c1e20cd NFS: writes should not clobber utimes() calls
Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:06 -05:00
Trond Myklebust b92dccf65b NFS: Fix a busy inodes issue...
The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.

Make a couple of functions static while we're at it...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:03 -05:00
Christoph Hellwig fc33a7bb9c [PATCH] per-mountpoint noatime/nodiratime
Turn noatime and nodiratime into per-mount instead of per-sb flags.

After all the preparations this is a rather trivial patch.  The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore.  Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.

While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:34 -08:00
OGAWA Hirofumi 28fd129827 [PATCH] Fix and add EXPORT_SYMBOL(filemap_write_and_wait)
This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.

See mm/filemap.c:

And changes the filemap_write_and_wait() and filemap_write_and_wait_range().

Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
returns error.  However, even if filemap_fdatawrite() returned an
error, it may have submitted the partially data pages to the device.
(e.g. in the case of -ENOSPC)

<quotation>
Andrew Morton writes,

If filemap_fdatawrite() returns an error, this might be due to some
I/O problem: dead disk, unplugged cable, etc.  Given the generally
crappy quality of the kernel's handling of such exceptions, there's a
good chance that the filemap_fdatawait() will get stuck in D state
forever.
</quotation>

So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.

Trond, could you please review the nfs part?  Especially I'm not sure,
nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.

Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:47 -08:00
Trond Myklebust 58df095b73 NFSv4: Allow entries in the idmap cache to expire
If someone changes the uid/gid mapping in userland, then we do eventually
 want those changes to be propagated to the kernel. Currently the kernel
 assumes that it may cache entries forever.

 Add an expiration time + garbage collector for idmap entries.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:58 -05:00
Chuck Lever f518e35aec SUNRPC: get rid of cl_chatty
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM
 client, sets cl_chatty.  There's no reason why NLM shouldn't set it, so
 just get rid of cl_chatty and always be verbose.

 Test-plan:
 Compile with CONFIG_NFS enabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:56 -05:00
Trond Myklebust a72b44222d NFSv4: Allow user to set the port used by the NFSv4 callback channel
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:52 -05:00
Trond Myklebust a895b4a198 NFS: Clean up weak cache consistency code
...and ensure that nfs_update_inode() respects wcc

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:52 -05:00
Trond Myklebust 70b9ecbdb9 NFS: Make stat() return updated mtimes after a write()
The SuS states that a call to write() will cause mtime to be updated on
 the file. In order to satisfy that requirement, we need to flush out
 any cached writes in nfs_getattr().
 Speed things up slightly by not committing the writes.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Chuck Lever 40859d7ee6 NFS: support large reads and writes on the wire
Most NFS server implementations allow up to 64KB reads and writes on the
 wire.  The Solaris NFS server allows up to a megabyte, for instance.

 Now the Linux NFS client supports transfer sizes up to 1MB, too.  This will
 help reduce protocol and context switch overhead on read/write intensive NFS
 workloads, and support larger atomic read and write operations on servers
 that support them.

 Test-plan:
 Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
 Tests with NFS over UDP to verify the maximum RPC payload size cap.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:49 -05:00
Chuck Lever 325cfed9ae NFS: make "inode number mismatch" message more useful
To help NFS users and server developers, make the "inode number mismatch"
 message display more useful information.

 Test-plan:
 None.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:49 -05:00
Chuck Lever dc20f80390 NFS: get rid of useless kernel log message
nfs_statfs() generates a log message when GETATTR returns an error.  This
 is usually a useless message.  Make it a dprintk.

 Test plan:
 None

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:48 -05:00
Chuck Lever 6b59a75460 NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs()
Red Hat found a problem in the error recovery logic in __init_nfs.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:48 -05:00
Trond Myklebust 286d7d6a0c NFSv4: Remove requirement for machine creds for the "setclientid" operation
Use a cred from the nfs4_client->cl_state_owners list.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust 29884df0d8 NFS: Fix another O_DIRECT race
Ensure we call unmap_mapping_range() and sync dirty pages to disk before
 doing an NFS direct write.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19 23:12:09 -05:00
Trond Myklebust 24aa1fe677 NFS: Fix a few further cache consistency regressions
Steve Dickson writes:
 Doing the following:
 1. On server:
 $ mkdir ~/t
 $ echo Hello > ~/t/tmp

 2. On client, wait for a string to appear in this file:
 $ until grep -q foo t/tmp ; do echo -n . ; sleep 1 ; done

 3. On server, create a *new* file with the same name containing that
 string:
 $ mv ~/t/tmp ~/t/tmp.old; echo foo > ~/t/tmp

 will show how the client will never (and I mean never ;-) ) see
 the updated file.

 The problem is that we do not update nfsi->cache_change_attribute when the
 file changes on the server (we only update it when our client makes the
 changes). This again means that functions like nfs_check_verifier() will
 fail to register when the parent directory has changed and should trigger
 a dentry lookup revalidation.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:07 -05:00
Steve Dickson 223db122bf NFS: Fix cache consistency regression
Make sure cache_change_attribute is initialized to jiffies
 so when the mtime changes on directory, the directory
 will be refreshed.

 Signed-off by: Steve Dickson <steved@redhat.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:03 -05:00
Trond Myklebust b37b03b705 NFS: Fix a spinlock recursion inside nfs_update_inode()
In cases where the server has gone insane, nfs_update_inode() may end
 up calling nfs_invalidate_inode(), which again calls stuff that takes
 the inode->i_lock that we're already holding.

 In addition, given the sort of things we have in NFS these days that
 need to be cleaned up on inode release, I'm not sure we should ever
 be calling make_bad_inode().

 Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing
 the caches, and marking the inode as being stale.

 Thanks to Steve Dickson <SteveD@redhat.com> for spotting this.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-25 17:11:29 -05:00
Jesper Juhl f99d49adf5 [PATCH] kfree cleanup: fs
This is the fs/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in fs/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:54:06 -08:00
Trond Myklebust d530838bfa NFSv4: Fix problem with OPEN_DOWNGRADE
RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny
 bits specified must be exactly equal to the union of the share_access and
 share_deny bits specified for some subset of the OPENs in effect for
 current openowner on the current file.

 Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that
 it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to
 OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file
 with O_WRONLY access mode.

 Fix the problem by replacing nfs4_find_state() with a modified version of
 nfs_find_open_context().

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04 15:33:38 -05:00
Trond Myklebust d3f8cf4899 [PATCH] NFS: Remove unbalanced spin_unlock() calls from nfs_refresh_inode()
Doh!

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 14:46:47 -08:00
Trond Myklebust bec273b491 NFS: Allow files that are open for write to invalidate caches
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:45 -04:00
Trond Myklebust decf491f30 NFS: Don't let nfs_end_data_update() clobber attribute update information
Since we almost always call nfs_end_data_update() after we called
 nfs_refresh_inode(), we now end up marking the inode metadata
 as needing revalidation immediately after having updated it.

 This patch rearranges things so that we mark the inode as needing
 revalidation _before_ we call nfs_refresh_inode() on those operations
 that need it.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:39 -04:00
Trond Myklebust 33801147a8 NFS: Optimise inode attribute cache updates
Allow nfs_refresh_inode() also to update attributes on the inode if the
 RPC call was sent after the last call to nfs_update_inode().

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:39 -04:00
Trond Myklebust 913a70fc17 NFS: Convert cache_change_attribute into a jiffy-based value
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:38 -04:00
Trond Myklebust 642ac54923 NFSv4: Return delegations in case we're changing ACLs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:19 -07:00
Trond Myklebust cae7a073a4 NFSv4: Return delegation upon rename or removal of file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:19 -07:00
Trond Myklebust cff6bf9709 Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6 2005-10-18 13:50:52 -07:00
Trond Myklebust 6ce969171d [PATCH] NFS: Fix Oopsable/unnecessary i_count manipulations in nfs_wait_on_inode()
Oopsable since nfs_wait_on_inode() can get called as part of iput_final().

Unnecessary since the caller had better be damned sure that the inode won't
disappear from underneath it anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 14:47:16 -07:00
Trond Myklebust b3c52da33c [PATCH] NFS: Fix cache consistency races
If the data cache has been marked as potentially invalid by nfs_refresh_inode,
we should invalidate it rather than assume that changes are due to our own
activity.

Also ensure that we always start with a valid cache before declaring it
to be protected by a delegation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 14:47:16 -07:00
Trond Myklebust 3063d8a166 NFS: Make /proc/mounts display the protocol used by NFSv4
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:59 -04:00
Chuck Lever 03bf4b707e [PATCH] RPC: parametrize various transport connect timeouts
Each transport implementation can now set unique bind, connect,
 reestablishment, and idle timeout values.  These are variables,
 allowing the values to be modified dynamically.  This permits
 exponential backoff of any of these values, for instance.

 As an example, we implement exponential backoff for the connection
 reestablishment timeout.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:53 -04:00
Chuck Lever eab5c084b8 [PATCH] NFS: use a constant value for TCP retransmit timeouts
Implement a best practice: don't use exponential backoff when computing
 retransmit timeout values on TCP connections, but simply retransmit
 at regular intervals.

 This also fixes a bug introduced when xprt_reset_majortimeo() was added.

 Test-plan:
 Enable RPC debugging and watch timeout behavior on a NFS/TCP mount.

 Version: Thu, 11 Aug 2005 16:02:19 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:06 -04:00
Mark Fasheh fef266580e [PATCH] update filesystems for new delete_inode behavior
Update the file systems in fs/ implementing a delete_inode() callback to
call truncate_inode_pages().  One implementation note: In developing this
patch I put the calls to truncate_inode_pages() at the very top of those
filesystems delete_inode() callbacks in order to retain the previous
behavior.  I'm guessing that some of those could probably be optimized.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:27 -07:00
Chuck Lever dc59250c6e [PATCH] NFS: Introduce the use of inode->i_lock to protect fields in nfsi
Down the road we want to eliminate the use of the global kernel lock entirely
from the NFS client.  To do this, we need to protect the fields in the
nfs_inode structure adequately.  Start by serializing updates to the
"cache_validity" field.

Note this change addresses an SMP hang found by njw@osdl.org, where processes
deadlock because nfs_end_data_update and nfs_revalidate_mapping update the
"cache_validity" field without proper serialization.

Test plan:
 Millions of fsx ops on SMP clients.  Run Nick Wilson's breaknfs program on
 large SMP clients.

Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-18 12:53:57 -07:00
Chuck Lever 412d582ec1 [PATCH] NFS: use atomic bitops to manipulate flags in nfsi->flags
Introduce atomic bitops to manipulate the bits in the nfs_inode structure's
"flags" field.

Using bitops means we can use a generic wait_on_bit call instead of an ad hoc
locking scheme in fs/nfs/inode.c, so we can remove the "nfs_i_wait" field from
nfs_inode at the same time.

The other new flags field will continue to use bitmask and logic AND and OR.
This permits several flags to be set at the same time efficiently.  The
following patch adds a spin lock to protect these flags, and this spin lock
will later cover other fields in the nfs_inode structure, amortizing the cost
of using this type of serialization.

Test plan:
 Millions of fsx ops on SMP clients.

Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-18 12:53:56 -07:00
Chuck Lever 5529680981 [PATCH] NFS: split nfsi->flags into two fields
Certain bits in nfsi->flags can be manipulated with atomic bitops, and some
are better manipulated via logical bitmask operations.

This patch splits the flags field into two.  The next patch introduces atomic
bitops for one of the fields.

Test plan:
 Millions of fsx ops on SMP clients.

Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-18 12:53:56 -07:00
Trond Myklebust 65e4308d25 [PATCH] NFS: Ensure we always update inode->i_mode when doing O_EXCL creates
When the client performs an exclusive create and opens the file for writing,
a Netapp filer will first create the file using the mode 01777. It does this
since an NFSv3/v4 exclusive create cannot immediately set the mode bits.
The 01777 mode then gets put into the inode->i_mode. After the file creation
is successful, we then do a setattr to change the mode to the correct value
(as per the NFS spec).

The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so
the latter retains the 01777 mode. A bit later, the VFS notices this, and calls
remove_suid(). This of course now resets the file mode to inode->i_mode & 0777.
Hey presto, the file mode on the server is now magically changed to 0777. Duh...

Fixes http://bugzilla.linux-nfs.org/show_bug.cgi?id=32

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-16 09:30:58 -07:00
Trond Myklebust 3da28eb1c6 [PATCH] NFS: Replace nfs_page insertion sort with a radix sort
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:39 -04:00
Trond Myklebust fe51beecc5 [PATCH] NFS: Ensure that fstat() always returns the correct mtime
Even if the file is open for writes.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:37 -04:00
Trond Myklebust 7d52e86274 [PATCH] NFS: Cleanup of caching code, and slight optimization of writes.
Unless we're doing O_APPEND writes, we really don't care about revalidating
 the file length. Just make sure that we catch any page cache invalidations.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:37 -04:00
Trond Myklebust 951a143b3f [PATCH] NFS: Fix the file size revalidation
Instead of looking at whether or not the file is open for writes before
 we accept to update the length using the server value, we should rather
 be looking at whether or not we are currently caching any writes.

 Failure to do so means in particular that we're not updating the file
 length correctly after obtaining a POSIX or BSD lock.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:36 -04:00