Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon"). The latter includes tmpfs.
The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.
This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists. The big
policy changes are in separate patches.
[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cache_change_attribute is used to decide whether or not a directory has
changed, in which case we may need to look it up again. Again, the use of
'jiffies' leads to an issue of resolution.
Once again, the fix is to change nfs_inode->cache_change_attribute, and
just make it a simple counter.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It appears that 'jiffies' timestamps do not have high enough resolution for
nfs_inode_attrs_need_update(). One problem is that a GETATTR can be
launched within < 1 jiffy of the last operation that updated the attribute.
Another problem is that RPC calls can take < 1 jiffy to execute.
We can fix this by switching the variables to use a simple global counter
that gets incremented every time we start another GETATTR call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add the flag NFS_MOUNT_LOOKUP_CACHE_NONEG to turn off the caching of
negative dentries. In reality what we do is to force
nfs_lookup_revalidate() to always discard negative dentries.
Add the flag NFS_MOUNT_LOOKUP_CACHE_NONE for enforcing stricter
revalidation of dentries. It forces the revalidate code to always do a
lookup instead of just checking the cached mtime of the parent directory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
MAY_... found in mask.
The obvious next target in that direction is permission(9)
folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Page accesses are serialised using the page locks, whereas all attribute
updates are serialised using the inode->i_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Page cache accesses are serialised using page locks, whereas attribute
updates are serialised using inode->i_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Attribute updates are safe, and dentry operations are protected using VFS
level locks. Defer removing the BKL from sillyrename until a separate
patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All dentry-related operations are already BKL-safe, since they are
protected by the VFS locking. No extra locks should be needed in the NFS
code.
In the case of nfs_revalidate_inode(), we're only doing an attribute
update (protected by the inode->i_lock).
In the case of nfs_lookup(), we're instantiating a new dentry, so there
should be no contention possible until after we call d_materialise_unique.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_instantiate() does not require the BKL, neither do the attribute
updates or the RPC code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All the NFSv4 stateful operations are already protected by other locks (in
particular by the rpc_sequence locks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: some fops use NFSDBG_FILE, some use NFSDBG_VFS. Let's use
NFSDBG_FILE for all fops, and consistently report file names instead
of inode numbers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Report the same debugging info and count function calls the
same for files and directories in nfs_opendir() and nfs_file_open().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Report the same debugging info in nfs_llseek_dir() and
nfs_llseek_file().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Report the same debugging info, count function calls the same,
and use similar function naming in nfs_fsync_dir() and nfs_fsync().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.
Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes). We should therefore remove the condition that we only do this on
page->index==0.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.
This patches uses __mnt_want_write(). It does not guarantee that the mount
will stay writeable after the check. But, this is OK for one of the checks
because it is just for a printk().
The other two are probably unnecessary and duplicate existing checks in the
VFS. This won't make them better checks than before, but it will make them
detect r/o mounts.
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The NFSv4 protocol allows clients to negotiate security protocols on the
fly in the case where an administrator on the server changes the export
settings and/or in the case where we may have a filesystem migration event.
Instead of having the NFS client code cache credentials that are tied to a
particular AUTH method it is therefore preferable to have a generic credential
that can be converted into whatever AUTH is in use by the RPC client when
the read/write/sillyrename/... is put on the wire.
We do this by means of the new "generic" credential, which basically just
caches the minimal information that is needed to look up an RPCSEC_GSS,
AUTH_SYS, or AUTH_NULL credential.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The error field in nfs_readdir_descriptor_t is never used outside of the
function in which it is set. Remove the field and change the place that
does use it to use an existing local variable.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Otherwise, there is a potential deadlock if the last dput() from an NFSv4
close() or other asynchronous operation leads to nfs_clear_inode calling
the synchronous delegreturn.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server returns an ENOENT error, we still need to do a d_delete() in
order to ensure that the dentry is deleted.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that readdir revalidates its data cache after blocking on
sillyrename.
Also fix a typo in nfs_do_call_unlink(): swap the ^= for an |=. The result
is the same, since we've already checked that the flag is unset, but it
makes the code more readable.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
lookup() and sillyrename() can race one another because the sillyrename()
completion cannot take the parent directory's inode->i_mutex since the
latter may be held by whoever is calling dput().
We therefore have little option but to add extra locking to ensure that
nfs_lookup() and nfs_atomic_open() do not race with the sillyrename
completion.
If somebody has looked up the sillyrenamed file in the meantime, we just
transfer the sillydelete information to the new dentry.
Please refer to the bug-report at
http://bugzilla.linux-nfs.org/show_bug.cgi?id=150
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This boot parameter will allow legacy 32-bit applications which call stat()
to continue to function even if the NFSv3/v4 server uses 64-bit inode
numbers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
- NFS_READTIME, NFS_CHANGE_ATTR are completely unused.
- Inline the few remaining uses of NFS_ATTRTIMEO, and remove.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We don't need to call nfs_revalidate_inode() on the directory if we already
know that the verifiers don't match.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The lower level routines in fs/nfs/proc.c, fs/nfs/nfs3proc.c and
fs/nfs/nfs4proc.c should already be dealing with the revalidation issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The fact that we're in the process of modifying the inode does not mean
that we should not invalidate the attribute and data caches. The defensive
thing is to always invalidate when we're confronted with inode
mtime/ctime or change_attribute updates that we do not immediately
recognise.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the ->lookup() call causes the directory verifier to change, then there
is still no need to use the old verifier, since our dentry has been
verified.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>