Commit Graph

213 Commits

Author SHA1 Message Date
Miklos Szeredi c3696046be fuse: separate pqueue for clones
Make each fuse device clone refer to a separate processing queue.  The only
constraint on userspace code is that the request answer must be written to
the same device clone as it was read off.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:09 +02:00
Miklos Szeredi cc080e9e9b fuse: introduce per-instance fuse_dev structure
Allow fuse device clones to refer to be distinguished.  This patch just
adds the infrastructure by associating a separate "struct fuse_dev" with
each clone.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:08 +02:00
Miklos Szeredi 77cd9d488b fuse: add req flag for private list
When an unlocked request is aborted, it is moved from fpq->io to a private
list.  Then, after unlocking fpq->lock, the private list is processed and
the requests are finished off.

To protect the private list, we need to mark the request with a flag, so if
in the meantime the request is unlocked the list is not corrupted.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:06 +02:00
Miklos Szeredi 45a91cb1a4 fuse: pqueue locking
Add a fpq->lock for protecting members of struct fuse_pqueue and FR_LOCKED
request flag.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:06 +02:00
Miklos Szeredi e96edd94d0 fuse: duplicate ->connected in pqueue
This will allow checking ->connected just with the processing queue lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:04 +02:00
Miklos Szeredi 3a2b5b9cd9 fuse: separate out processing queue
This is just two fields: fc->io and fc->processing.

This patch just rearranges the fields, no functional change.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:04 +02:00
Miklos Szeredi e16714d875 fuse: duplicate ->connected in iqueue
This will allow checking ->connected just with the input queue lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:01 +02:00
Miklos Szeredi f88996a933 fuse: separate out input queue
The input queue contains normal requests (fc->pending), forgets
(fc->forget_*) and interrupts (fc->interrupts).  There's also fc->waitq and
fc->fasync for waking up the readers of the fuse device when a request is
available.

The fc->reqctr is also moved to the input queue (assigned to the request
when the request is added to the input queue.

This patch just rearranges the fields, no functional change.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:01 +02:00
Miklos Szeredi 33e14b4dfd fuse: req state use flags
Use flags for representing the state in fuse_req.  This is needed since
req->list will be protected by different locks in different states, hence
we'll want the state itself to be split into distinct bits, each protected
with the relevant lock in that state.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:01 +02:00
Miklos Szeredi 7a3b2c7547 fuse: simplify req states
FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:00 +02:00
Miklos Szeredi dc00809a53 fuse: use per req lock for lock/unlock_request()
Reuse req->waitq.lock for protecting FR_ABORTED and FR_LOCKED flags.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi 825d6d3395 fuse: req use bitops
Finer grained locking will mean there's no single lock to protect
modification of bitfileds in fuse_req.

So move to using bitops.  Can use the non-atomic variants for those which
happen while the request definitely has only one reference.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Christoph Hellwig 9d5722b777 fuse: handle synchronous iocbs internally
Based on a patch from Maxim Patlasov <MPatlasov@parallels.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-13 12:10:15 -04:00
Miklos Szeredi 9759bd5189 fuse: add memory barrier to INIT
Theoretically we need to order setting of various fields in fc with
fc->initialized.

No known bug reports related to this yet.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-01-06 10:45:35 +01:00
Miklos Szeredi 7078187a79 fuse: introduce fuse_simple_request() helper
The following pattern is repeated many times:

	req = fuse_get_req_nopages(fc);
	/* Initialize req->(in|out).args */
	fuse_request_send(fc, req);
	err = req->out.h.error;
	fuse_put_request(req);

Create a new replacement helper:

	/* Initialize args */
	err = fuse_simple_request(fc, &args);

In addition to reducing the code size, this will ease moving from the
complex arg-based to a simpler page-based I/O on the fuse device.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12 09:49:05 +01:00
Miklos Szeredi f704dcb538 fuse: reduce max out args
The third out-arg is never actually used.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12 09:49:05 +01:00
Miklos Szeredi baebccbe99 fuse: hold inode instead of path after release
path_put() in release could trigger a DESTROY request in fuseblk.  The
possible deadlock was worked around by doing the path_put() with
schedule_work().

This complexity isn't needed if we just hold the inode instead of the path.
Since we now flush all requests before destroying the super block we can be
sure that all held inodes will be dropped.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12 09:49:04 +01:00
Miklos Szeredi 580640ba5d fuse: flush requests on umount
Use fuse_abort_conn() instead of fuse_conn_kill() in fuse_put_super().
This flushes and aborts requests still on any queues.  But since we've
already reset fc->connected, those requests would not be useful anyway and
would be flushed when the fuse device is closed.

Next patches will rely on requests being flushed before the superblock is
destroyed.

Use fuse_abort_conn() in cuse_process_init_reply() too, since it makes no
difference there, and we can get rid of fuse_conn_kill().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12 09:49:04 +01:00
Al Viro d22a943f44 fuse: pull iov_iter initializations up
... to fuse_direct_{read,write}().  ->direct_IO() path uses the
iov_iter passed by the caller instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:51 -04:00
Miklos Szeredi 1560c974dc fuse: add renameat2 support
Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 16:43:44 +02:00
Maxim Patlasov ab9e13f7c7 fuse: allow ctime flushing to userspace
The patch extends fuse_setattr_in, and extends the flush procedure
(fuse_flush_times()) called on ->write_inode() to send the ctime as well as
mtime.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 14:19:24 +02:00
Miklos Szeredi 1e18bda86e fuse: add .write_inode
...and flush mtime from this.  This allows us to use the kernel
infrastructure for writing out dirty metadata (mtime at this point, but
ctime in the next patches and also maybe atime).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 14:19:23 +02:00
Fabian Frederick 7736e8cc51 fuse: add __exit to fuse_ctl_cleanup
fuse_ctl_cleanup is only called by __exit fuse_exit

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 14:19:21 +02:00
Pavel Emelyanov ea8cd33390 fuse: Fix O_DIRECT operations vs cached writeback misorder
The problem is:

1. write cached data to a file
2. read directly from the same file (via another fd)

The 2nd operation may read stale data, i.e. the one that was in a file
before the 1st op. Problem is in how fuse manages writeback.

When direct op occurs the core kernel code calls filemap_write_and_wait
to flush all the cached ops in flight. But fuse acks the writeback right
after the ->writepages callback exits w/o waiting for the real write to
happen. Thus the subsequent direct op proceeds while the real writeback
is still in flight. This is a problem for backends that reorder operation.

Fix this by making the fuse direct IO callback explicitly wait on the
in-flight writeback to finish.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-02 15:38:50 +02:00
Maxim Patlasov b0aa760652 fuse: Trust kernel i_mtime only
Let the kernel maintain i_mtime locally:
 - clear S_NOCMTIME
 - implement i_op->update_time()
 - flush mtime on fsync and last close
 - update i_mtime explicitly on truncate and fallocate

Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime
should be flushed to the server eventually.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-02 15:38:48 +02:00
Pavel Emelyanov d5cd66c58e fuse: Connection bit for enabling writeback
Off (0) by default. Will be used in the next patches and will be turned
on at the very end.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-02 15:38:48 +02:00
Andrew Gallagher 7678ac5061 fuse: support clients that don't implement 'open'
open/release operations require userspace transitions to keep track
of the open count and to perform any FS-specific setup.  However,
for some purely read-only FSs which don't need to perform any setup
at open/release time, we can avoid the performance overhead of
calling into userspace for open/release calls.

This patch adds the necessary support to the fuse kernel modules to prevent
open/release operations from hitting in userspace. When the client returns
ENOSYS, we avoid sending the subsequent release to userspace, and also
remember this so that future opens also don't trigger a userspace
operation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-01-22 19:36:59 +01:00
Andrew Gallagher 451418fc92 fuse: don't invalidate attrs when not using atime
Various read operations (e.g. readlink, readdir) invalidate the cached
attrs for atime changes.  This patch adds a new function
'fuse_invalidate_atime', which checks for a read-only super block and
avoids the attr invalidation in that case.

Signed-off-by: Andrew Gallagher <andrewjcg@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-01-22 19:36:58 +01:00
Linus Torvalds 9bc9ccd7db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
2013-11-13 15:34:18 +09:00
Al Viro dd3e2c55a4 fuse: rcu-delay freeing fuse_conn
makes ->permission() and ->d_revalidate() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:45:13 -04:00
Miklos Szeredi b70a80e7a1 vfs: introduce d_instantiate_no_diralias()
...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty.  Previously fuse used a
private mutex for this purpose, which can now go away.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-10-24 23:41:37 -04:00
Miklos Szeredi 8b284dc472 fuse: writepages: handle same page rewrites
As Maxim Patlasov pointed out, it's possible to get a dirty page while it's
copy is still under writeback, despite fuse_page_mkwrite() doing its thing
(direct IO).

This could result in two concurrent write request for the same offset, with
data corruption if they get mixed up.

To prevent this, fuse needs to check and delay such writes.  This
implementation does this by:

 1. check if page is still under writeout, if so create a new, single page
    secondary request for it

 2. chain this secondary request onto the in-flight request

 2/a. if a seconday request for the same offset was already chained to the
    in-flight request, then just copy the contents of the page and discard
    the new secondary request.  This makes sure that for each page will
    have at most two requests associated with it

 3. when the in-flight request finished, send off all secondary requests
    chained onto it

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-10-01 16:44:53 +02:00
Miklos Szeredi 6314efee3c fuse: readdirplus: fix RCU walk
Doing dput(parent) is not valid in RCU walk mode.  In RCU mode it would
probably be okay to update the parent flags, but it's actually not
necessary most of the time...

So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was
recently initialized by READDIRPLUS.

This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by
READDIRPLUS and only dropping out of RCU mode if this flag is set.
FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in
the parent.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
2013-10-01 16:41:22 +02:00
Maxim Patlasov 06a7c3c278 fuse: hotfix truncate_pagecache() issue
The way how fuse calls truncate_pagecache() from fuse_change_attributes()
is completely wrong. Because, w/o i_mutex held, we never sure whether
'oldsize' and 'attr->size' are valid by the time of execution of
truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
released fc->lock in the middle of fuse_change_attributes(), we completely
loose control of actions which may happen with given inode until we reach
truncate_pagecache. The list of potentially dangerous actions includes
mmap-ed reads and writes, ftruncate(2) and write(2) extending file size.

The typical outcome of doing truncate_pagecache() with outdated arguments
is data corruption from user point of view. This is (in some sense)
acceptable in cases when the issue is triggered by a change of the file on
the server (i.e. externally wrt fuse operation), but it is absolutely
intolerable in scenarios when a single fuse client modifies a file without
any external intervention. A real life case I discovered by fsx-linux
looked like this:

1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
to the server synchronously, then calls fuse_change_attributes(). The
latter updates i_size, releases fc->lock, but before comparing oldsize vs
attr->size..
3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
updating attributes and i_size, but now oldsize is equal to
outarg.attr.size because i_size has just been updated (step 2). Hence,
fuse_do_setattr() returns w/o calling truncate_pagecache().
4. As soon as ftruncate(2) completes, the user extends file size by
write(2) making a hole in the middle of file, then reads data from the hole
either by read(2) or mmap-ed read. The user expects to get zero data from
the hole, but gets stale data because truncate_pagecache() is not executed
yet.

The scenario above illustrates one side of the problem: not truncating the
page cache even though we should. Another side corresponds to truncating
page cache too late, when the state of inode changed significantly.
Theoretically, the following is possible:

1. As in the previous scenario fuse_dentry_revalidate() discovered that
i_size changed (due to our own fuse_do_setattr()) and is going to call
truncate_pagecache() for some 'new_size' it believes valid right now. But
by the time that particular truncate_pagecache() is called ...
2. fuse_do_setattr() returns (either having called truncate_pagecache() or
not -- it doesn't matter).
3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
4. mmap-ed write makes a page in the extended region dirty.

The result will be the lost of data user wrote on the fourth step.

The patch is a hotfix resolving the issue in a simplistic way: let's skip
dangerous i_size update and truncate_pagecache if an operation changing
file size is in progress. This simplistic approach looks correct for the
cases w/o external changes. And to handle them properly, more sophisticated
and intrusive techniques (e.g. NFS-like one) would be required. I'd like to
postpone it until the issue is well discussed on the mailing list(s).

Changed in v2:
 - improved patch description to cover both sides of the issue.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
2013-09-03 13:41:58 +02:00
Miklos Szeredi 60b9df7a54 fuse: add flag to turn on async direct IO
Without async DIO write requests to a single file were always serialized.
With async DIO that's no longer the case.

So don't turn on async DIO by default for fear of breaking backward
compatibility.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-05-01 14:37:21 +02:00
Maxim Patlasov efb9fa9e91 fuse: truncate file if async dio failed
The patch improves error handling in fuse_direct_IO(): if we successfully
submitted several fuse requests on behalf of synchronous direct write
extending file and some of them failed, let's try to do our best to clean-up.

Changed in v2: reuse fuse_do_setattr(). Thanks to Brian for suggestion.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-18 10:55:24 +02:00
Maxim Patlasov 36cf66ed9f fuse: make fuse_direct_io() aware about AIO
The patch implements passing "struct fuse_io_priv *io" down the stack up to
fuse_send_read/write where it is used to submit request asynchronously.
io->async==0 designates synchronous processing.

Non-trivial part of the patch is changes in fuse_direct_io(): resources
like fuse requests and user pages cannot be released immediately in async
case.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 21:50:59 +02:00
Maxim Patlasov 01e9d11a3e fuse: add support of async IO
The patch implements a framework to process an IO request asynchronously. The
idea is to associate several fuse requests with a single kiocb by means of
fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
dio' for direct-io.c.

The framework is supposed to be used like this:
 - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
   and initializes it setting 'async' field to non-zero value.
 - as soon as fuse request is filled, it can be submitted (in non-blocking way)
   by fuse_async_req_send()
 - when all submitted requests are ACKed by userspace, io->reqs drops to zero
   triggering aio_complete()

In case of IO initiated by libaio, aio_complete() will finish processing the
same way as in case of dio_complete() calling aio_complete(). But the
framework may be also used for internal FUSE use when initial IO request
was synchronous (from user perspective), but it's beneficial to process it
asynchronously. Then the caller should wait on kiocb explicitly and
aio_complete() will wake the caller up.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 21:50:59 +02:00
Maxim Patlasov 796523fb24 fuse: add flag fc->initialized
Existing flag fc->blocked is used to suspend request allocation both in case
of many background request submitted and period of time before init_reply
arrives from userspace. Next patch will skip blocking allocations of
synchronous request (disregarding fc->blocked). This is mostly OK, but
we still need to suspend allocations if init_reply is not arrived yet. The
patch introduces flag fc->initialized which will serve this purpose.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:31:44 +02:00
Maxim Patlasov 8b41e6715e fuse: make request allocations for background processing explicit
There are two types of processing requests in FUSE: synchronous (via
fuse_request_send()) and asynchronous (via adding to fc->bg_queue).

Fortunately, the type of processing is always known in advance, at the time
of request allocation. This preparatory patch utilizes this fact making
fuse_get_req() aware about the type. Next patches will use it.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:31:44 +02:00
Eric Wong 634734b63a fuse: allow control of adaptive readdirplus use
For some filesystems (e.g. GlusterFS), the cost of performing a
normal readdir and readdirplus are identical.  Since adaptively
using readdirplus has no benefit for those systems, give
users/filesystems the option to control adaptive readdirplus use.

v2 of this patch incorporates Miklos's suggestion to simplify the code,
as well as improving consistency of macro names and documentation.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-07 14:25:44 +01:00
Feng Shuo 4582a4ab2a FUSE: Adapt readdirplus to application usage patterns
Use the same adaptive readdirplus mechanism as NFS:

http://permalink.gmane.org/gmane.linux.nfs/49299

If the user space implementation wants to disable readdirplus
temporarily, it could just return ENOTSUPP. Then kernel will
recall it with readdir.

Signed-off-by: Feng Shuo <steve.shuo.feng@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-31 17:08:11 +01:00
Anatol Pomozov c2132c1bc7 Do not use RCU for current process credentials
Commit c69e8d9c0 added rcu lock to fuse/dir.c It was assuming
that 'task' is some other process but in fact this parameter always
equals to 'current'. Inline this parameter to make it more readable
and remove RCU lock as it is not needed when access current process
credentials.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-31 17:08:10 +01:00
Miklos Szeredi fb05f41f5f fuse: cleanup fuse_direct_io()
Fix the following sparse warnings:

fs/fuse/file.c:1216:43: warning: cast removes address space of expression
fs/fuse/file.c:1216:43: warning: incorrect type in initializer (different address spaces)
fs/fuse/file.c:1216:43:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1216:43:    got void *<noident>
fs/fuse/file.c:1241:43: warning: cast removes address space of expression
fs/fuse/file.c:1241:43: warning: incorrect type in initializer (different address spaces)
fs/fuse/file.c:1241:43:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1241:43:    got void *<noident>
fs/fuse/file.c:1267:43: warning: cast removes address space of expression
fs/fuse/file.c:1267:43: warning: incorrect type in initializer (different address spaces)
fs/fuse/file.c:1267:43:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1267:43:    got void *<noident>

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:28 +01:00
Maxim Patlasov b2430d7567 fuse: add per-page descriptor <offset, length> to fuse_req
The ability to save page pointers along with lengths and offsets in fuse_req
will be useful to cover several iovec-s with a single fuse_req.

Per-request page_offset is removed because anybody who need it can use
req->page_descs[0].offset instead.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:27 +01:00
Maxim Patlasov b111c8c0e3 fuse: categorize fuse_get_req()
The patch categorizes all fuse_get_req() invocations into two categories:
 - fuse_get_req_nopages(fc) - when caller doesn't care about req->pages
 - fuse_get_req(fc, n) - when caller need n page pointers (n > 0)

Adding fuse_get_req_nopages() helps to avoid numerous fuse_get_req(fc, 0)
scattered over code. Now it's clear from the first glance when a caller need
fuse_req with page pointers.

The patch doesn't make any logic changes. In multi-page case, it silly
allocates array of FUSE_MAX_PAGES_PER_REQ page pointers. This will be amended
by future patches.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
Maxim Patlasov 4250c0668e fuse: general infrastructure for pages[] of variable size
The patch removes inline array of FUSE_MAX_PAGES_PER_REQ page pointers from
fuse_req. Instead of that, req->pages may now point either to small inline
array or to an array allocated dynamically.

This essentially means that all callers of fuse_request_alloc[_nofs] should
pass the number of pages needed explicitly.

The patch doesn't make any logic changes.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
Anand V. Avati 0b05b18381 fuse: implement NFS-like readdirplus support
This patch implements readdirplus support in FUSE, similar to NFS.
The payload returned in the readdirplus call contains
'fuse_entry_out' structure thereby providing all the necessary inputs
for 'faking' a lookup() operation on the spot.

If the dentry and inode already existed (for e.g. in a re-run of ls -l)
then just the inode attributes timeout and dentry timeout are refreshed.

With a simple client->network->server implementation of a FUSE based
filesystem, the following performance observations were made:

Test: Performing a filesystem crawl over 20,000 files with

sh# time ls -lR /mnt

Without readdirplus:
Run 1: 18.1s
Run 2: 16.0s
Run 3: 16.2s

With readdirplus:
Run 1: 4.1s
Run 2: 3.8s
Run 3: 3.8s

The performance improvement is significant as it avoided 20,000 upcalls
calls (lookup). Cache consistency is no worse than what already is.

Signed-off-by: Anand V. Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
Eric W. Biederman 499dcf2024 userns: Support fuse interacting with multiple user namespaces
Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data.

The connection between between a fuse filesystem and a fuse daemon is
established when a fuse filesystem is mounted and provided with a file
descriptor the fuse daemon created by opening /dev/fuse.

For now restrict the communication of uids and gids between the fuse
filesystem and the fuse daemon to the initial user namespace.  Enforce
this by verifying the file descriptor passed to the mount of fuse was
opened in the initial user namespace.  Ensuring the mount happens in
the initial user namespace is not necessary as mounts from non-initial
user namespaces are not yet allowed.

In fuse_req_init_context convert the currrent fsuid and fsgid into the
initial user namespace for the request that will be sent to the fuse
daemon.

In fuse_fill_attr convert the uid and gid passed from the fuse daemon
from the initial user namespace into kuids and kgids.

In iattr_to_fattr called from fuse_setattr convert kuids and kgids
into the uids and gids in the initial user namespace before passing
them to the fuse filesystem.

In fuse_change_attributes_common called from fuse_dentry_revalidate,
fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert
the uid and gid from the fuse daemon into a kuid and a kgid to store
on the fuse inode.

By default fuse mounts are restricted to task whose uid, suid, and
euid matches the fuse user_id and whose gid, sgid, and egid matches
the fuse group id.  Convert the user_id and group_id mount options
into kuids and kgids at mount time, and use uid_eq and gid_eq to
compare the in fuse_allow_task.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-14 22:05:33 -08:00
Brian Foster 72d0d248ca fuse: add FUSE_AUTO_INVAL_DATA init flag
FUSE_AUTO_INVAL_DATA is provided to enable updated/auto cache
invalidation logic.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-07-18 16:09:40 +02:00
Pavel Shilovsky 45c72cd73c fuse: fix stat call on 32 bit platforms
Now we store attr->ino at inode->i_ino, return attr->ino at the
first time and then return inode->i_ino if the attribute timeout
isn't expired. That's wrong on 32 bit platforms because attr->ino
is 64 bit and inode->i_ino is 32 bit in this case.

Fix this by saving 64 bit ino in fuse_inode structure and returning
it every time we call getattr. Also squash attr->ino into inode->i_ino
explicitly.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-05-14 17:06:42 +02:00
Miklos Szeredi 519c6040ce fuse: optimize fallocate on permanent failure
If userspace filesystem doesn't support fallocate, remember this and don't send
request next time.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-04-26 10:56:36 +02:00
Linus Torvalds 6733e54b66 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  FUSE: Notifying the kernel of deletion.
  fuse: support ioctl on directories
  fuse: Use kcalloc instead of kzalloc to allocate array
  fuse: llseek optimize SEEK_CUR and SEEK_SET
2012-01-12 12:39:21 -08:00
Al Viro 541af6a074 fuse: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:07 -05:00
John Muir 451d0f5999 FUSE: Notifying the kernel of deletion.
Allows a FUSE file-system to tell the kernel when a file or directory is
deleted. If the specified dentry has the specified inode number, the kernel will
unhash it.

The current 'fuse_notify_inval_entry' does not cause the kernel to clean up
directories that are in use properly, and as a result the users of those
directories see incorrect semantics from the file-system. The error condition
seen when 'fuse_notify_inval_entry' is used to notify of a deleted directory is
avoided when 'fuse_notify_delete' is used instead.

The following scenario demonstrates the difference:
1. User A chdirs into 'testdir' and starts reading 'testfile'.
2. User B rm -rf 'testdir'.
3. User B creates 'testdir'.
4. User C chdirs into 'testdir'.

If you run the above within the same machine on any file-system (including fuse
file-systems), there is no problem: user C is able to chdir into the new
testdir. The old testdir is removed from the dentry tree, but still open by user
A.

If operations 2 and 3 are performed via the network such that the fuse
file-system uses one of the notify functions to tell the kernel that the nodes
are gone, then the following error occurs for user C while user A holds the
original directory open:

muirj@empacher:~> ls /test/testdir
ls: cannot access /test/testdir: No such file or directory

The issue here is that the kernel still has a dentry for testdir, and so it is
requesting the attributes for the old directory, while the file-system is
responding that the directory no longer exists.

If on the other hand, if the file-system can notify the kernel that the
directory is deleted using the new 'fuse_notify_delete' function, then the above
ls will find the new directory as expected.

Signed-off-by: John Muir <john@jmuir.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:58:49 +01:00
Miklos Szeredi b18da0c56e fuse: support ioctl on directories
Multiplexing filesystems may want to support ioctls on the underlying
files and directores (e.g. FS_IOC_{GET,SET}FLAGS).

Ioctl support on directories was missing so add it now.

Reported-by: Antonio SJ Musumeci <bile@landofbile.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:58:49 +01:00
Linus Torvalds 051732bcbe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
  fuse: mark pages accessed when written to
  fuse: delete dead .write_begin and .write_end aops
  fuse: fix flock
  fuse: fix non-ANSI void function notation
2011-08-24 09:14:42 -07:00
Miklos Szeredi 37fb3a30b4 fuse: fix flock
Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a
number of issues with supporing flock locks over existing POSIX
locking infrastructure:

  - it's not backward compatible, passing flock(2) calls to userspace
    unconditionally (if userspace sets FUSE_POSIX_LOCKS)

  - it doesn't cater for the fact that flock locks are automatically
    unlocked on file release

  - it doesn't take into account the fact that flock exclusive locks
    (write locks) don't need an fd opened for write.

The last one invalidates the original premise of the patch that flock
locks can be emulated with POSIX locks.

This patch fixes the first two issues.  The last one needs to be fixed
in userspace if the filesystem assumed that a write lock will happen
only on a file operned for write (as in the case of the current fuse
library).

Reported-by: Sebastian Pipping <webmaster@hartwork.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08 16:08:08 +02:00
Josef Bacik 02c24a8218 fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2.  For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:59 -04:00
Miklos Szeredi 07d5f69b45 fuse: reduce size of struct fuse_request
Reduce the size of struct fuse_request by removing cuse_init_out from
the request structure and allocating it dinamically instead.

CC: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-03-21 13:58:05 +01:00
Miklos Szeredi 5a18ec176c fuse: fix hang of single threaded fuseblk filesystem
Single threaded NTFS-3G could get stuck if a delayed RELEASE reply
triggered a DESTROY request via path_put().

Fix this by

 a) making RELEASE requests synchronous, whenever possible, on fuseblk
 filesystems

 b) if not possible (triggered by an asynchronous read/write) then do
 the path_put() in a separate thread with schedule_work().

Reported-by: Oliver Neukum <oneukum@suse.de>
Cc: stable@kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-02-25 14:44:58 +01:00
Miklos Szeredi 02c048b919 fuse: allow batching of FORGET requests
Terje Malmedal reports that a fuse filesystem with 32 million inodes
on a machine with lots of memory can take up to 30 minutes to process
FORGET requests when all those inodes are evicted from the icache.

To solve this, create a BATCH_FORGET request that allows up to about
8000 FORGET requests to be sent in a single message.

This request is only sent if userspace supports interface version 7.16
or later, otherwise fall back to sending individual FORGET messages.

Reported-by: Terje Malmedal <terje.malmedal@usit.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-12-07 20:16:56 +01:00
Miklos Szeredi 07e77dca8a fuse: separate queue for FORGET requests
Terje Malmedal reports that a fuse filesystem with 32 million inodes
on a machine with lots of memory can go unresponsive for up to 30
minutes when all those inodes are evicted from the icache.

The reason is that FORGET messages, sent when the inode is evicted,
are queued up together with regular filesystem requests, and while the
huge queue of FORGET messages are processed no other filesystem
operation can proceed.

Since a full fuse request structure is allocated for each inode, these
take up quite a bit of memory as well.

To solve these issues, create a slim 'fuse_forget_link' structure
containing just the minimum of information required to send the FORGET
request and chain these on a separate queue.

When userspace is asking for a request make sure that FORGET and
non-FORGET requests are selected fairly: for each 8 non-FORGET allow
16 FORGET requests.  This will make sure FORGETs do not pile up, yet
other requests are also allowed to proceed while the queued FORGETs
are processed.

Reported-by: Terje Malmedal <terje.malmedal@usit.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-12-07 20:16:56 +01:00
Miklos Szeredi 2d45ba381a fuse: add retrieve request
Userspace filesystem can request data to be retrieved from the inode's
mapping.  This request is synchronous and the retrieved data is queued
as a new request.  If the write to the fuse device returns an error
then the retrieve request was not completed and a reply will not be
sent.

Only present pages are returned in the retrieve reply.  Retrieving
stops when it finds a non-present page and only data prior to that is
returned.

This request doesn't change the dirty state of pages.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-07-12 14:41:40 +02:00
Miklos Szeredi a1d75f2582 fuse: add store request
Userspace filesystem can request data to be stored in the inode's
mapping.  This request is synchronous and has no reply.  If the write
to the fuse device returns an error then the store request was not
fully completed (but may have updated some pages).

If the stored data overflows the current file size, then the size is
extended, similarly to a write(2) on the filesystem.

Pages which have been completely stored are marked uptodate.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-07-12 14:41:40 +02:00
Linus Torvalds 003386fff3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  mm: export generic_pipe_buf_*() to modules
  fuse: support splice() reading from fuse device
  fuse: allow splice to move pages
  mm: export remove_from_page_cache() to modules
  mm: export lru_cache_add_*() to modules
  fuse: support splice() writing to fuse device
  fuse: get page reference for readpages
  fuse: use get_user_pages_fast()
  fuse: remove unneeded variable
2010-05-30 09:16:14 -07:00
Christoph Hellwig 7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Miklos Szeredi ce534fb052 fuse: allow splice to move pages
When splicing buffers to the fuse device with SPLICE_F_MOVE, try to
move pages from the pipe buffer into the page cache.  This allows
populating the fuse filesystem's cache without ever touching the page
contents, i.e. zero copy read capability.

The following steps are performed when trying to move a page into the
page cache:

 - buf->ops->confirm() to make sure the new page is uptodate
 - buf->ops->steal() to try to remove the new page from it's previous place
 - remove_from_page_cache() on the old page
 - add_to_page_cache_locked() on the new page

If any of the above steps fail (non fatally) then the code falls back
to copying the page.  In particular ->steal() will fail if there are
external references (other than the page cache and the pipe buffer) to
the page.

Also since the remove_from_page_cache() + add_to_page_cache_locked()
are non-atomic it is possible that the page cache is repopulated in
between the two and add_to_page_cache_locked() will fail.  This could
be fixed by creating a new atomic replace_page_cache_page() function.

fuse_readpages_end() needed to be reworked so it works even if
page->mapping is NULL for some or all pages which can happen if the
add_to_page_cache_locked() failed.

A number of sanity checks were added to make sure the stolen pages
don't have weird flags set, etc...  These could be moved into generic
splice/steal code.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-05-25 15:06:07 +02:00
npiggin@suse.de c08d3b0e33 truncate: use new helpers
Update some fs code to make use of new helper functions introduced
in the previous patch. Should be no significant change in behaviour
(except CIFS now calls send_sig under i_lock, via inode_newsize_ok).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-nfs@vger.kernel.org
Cc: Trond.Myklebust@netapp.com
Cc: linux-cifs-client@lists.samba.org
Cc: sfrench@samba.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24 08:41:47 -04:00
Csaba Henk 79a9d99434 fuse: add fusectl interface to max_background
Make the max_background and congestion_threshold parameters of a FUSE
mount tunable at runtime by adding the respective knobs to its directory
within the fusectl filesystem.

Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-09-16 14:15:29 +02:00
Csaba Henk 7a6d3c8b30 fuse: make the number of max background requests and congestion threshold tunable
The practical values for these limits depend on the design of the
filesystem server so let userspace set them at initialization time.

Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-07-07 17:28:52 +02:00
John Muir 3b463ae0c6 fuse: invalidation reverse calls
Add notification messages that allow the filesystem to invalidate VFS
caches.

Two notifications are added:

 1) inode invalidation

   - invalidate cached attributes
   - invalidate a range of pages in the page cache (this is optional)

 2) dentry invalidation

   - try to invalidate a subtree in the dentry cache

Care must be taken while accessing the 'struct super_block' for the
mount, as it can go away while an invalidation is in progress.  To
prevent this, introduce a rw-semaphore, that is taken for read during
the invalidation and taken for write in the ->kill_sb callback.

Cc: Csaba Henk <csaba@gluster.com>
Cc: Anand Avati <avati@zresearch.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-06-30 20:12:24 +02:00
Miklos Szeredi e0a43ddcc0 fuse: allow umask processing in userspace
This patch lets filesystems handle masking the file mode on creation.
This is needed if filesystem is using ACLs.

 - The CREATE, MKDIR and MKNOD requests are extended with a "umask"
   parameter.

 - A new FUSE_DONT_MASK flag is added to the INIT request/reply.  With
   this the filesystem may request that the create mode is not masked.

CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-06-30 20:12:23 +02:00
Tejun Heo 08cbf542bf fuse: export symbols to be used by CUSE
Export the following symbols for CUSE.

fuse_conn_put()
fuse_conn_get()
fuse_conn_kill()
fuse_send_init()
fuse_do_open()
fuse_sync_release()
fuse_direct_io()
fuse_do_ioctl()
fuse_file_poll()
fuse_request_alloc()
fuse_get_req()
fuse_put_request()
fuse_request_send()
fuse_abort_conn()
fuse_dev_release()
fuse_dev_operations

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:42 +02:00
Tejun Heo a325f9b922 fuse: update fuse_conn_init() and separate out fuse_conn_kill()
Update fuse_conn_init() such that it doesn't take @sb and move bdi
registration into a separate function.  Also separate out
fuse_conn_kill() from fuse_put_super().

These will be used to implement cuse.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:41 +02:00
Miklos Szeredi 8b0797a498 fuse: don't use inode in fuse_sync_release()
Make fuse_sync_release() a generic helper function that doesn't need a
struct inode pointer.  This makes it suitable for use by CUSE.

Change return value of fuse_release_common() from int to void.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:39 +02:00
Miklos Szeredi 91fe96b403 fuse: create fuse_do_open() helper for CUSE
Create a helper for sending an OPEN request that doesn't need a struct
inode pointer.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:37 +02:00
Miklos Szeredi c7b7143c63 fuse: clean up args in fuse_finish_open() and fuse_release_fill()
Move setting ff->fh, ff->nodeid and file->private_data outside
fuse_finish_open().  Add ->open_flags member to struct fuse_file.

This simplifies the argument passing to fuse_finish_open() and
fuse_release_fill(), and paves the way for creating an open helper
that doesn't need an inode pointer.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:37 +02:00
Miklos Szeredi 2106cb1893 fuse: don't use inode in helpers called by fuse_direct_io()
Use ff->fc and ff->nodeid instead of passing down the inode.

This prepares this function for use by CUSE, where the inode is not
owned by a fuse filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:37 +02:00
Miklos Szeredi da5e471457 fuse: add members to struct fuse_file
Add new members ->fc and ->nodeid to struct fuse_file.  This will aid
in converting functions for use by CUSE, where the inode is not owned
by a fuse filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:36 +02:00
Miklos Szeredi b0be46ebf7 fuse: use struct path in release structure
Use struct path instead of separate dentry and vfsmount in
req->misc.release.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-04-28 16:56:36 +02:00
Al Viro 4269590a72 constify dentry_operations: FUSE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27 14:44:01 -04:00
Tejun Heo 43901aabd7 fuse: add fuse_conn->release()
Add fuse_conn->release() so that fuse_conn can be embedded in other
structures.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:56 +01:00
Tejun Heo 0d179aa592 fuse: separate out fuse_conn_init() from new_conn()
Separate out fuse_conn_init() from new_conn() and while at it
initialize fuse_conn->entry during conn initialization.

This will be used by CUSE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo b93f858ab2 fuse: add fuse_ prefix to several functions
Add fuse_ prefix to request_send*() and get_root_inode() as some of
those functions will be exported for CUSE.  With or without CUSE
export, having the function names scoped is a good idea for
debuggability.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo 95668a69a4 fuse: implement poll support
Implement poll support.  Polled files are indexed using kh in a RB
tree rooted at fuse_conn->polled_files.

Client should send FUSE_NOTIFY_POLL notification once after processing
FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set.  Sending
notification unconditionally after the latest poll or everytime file
content might have changed is inefficient but won't cause malfunction.

fuse_file_poll() can sleep and requires patches from the following
thread which allows f_op->poll() to sleep.

  http://thread.gmane.org/gmane.linux.kernel/726176

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo acf99433d9 fuse: add file kernel handle
The file handle, fuse_file->fh, is opaque value supplied by userland
FUSE server and uniqueness is not guaranteed.  Add file kernel handle,
fuse_file->kh, which is allocated by the kernel on file allocation and
guaranteed to be unique.

This will be used by poll to match notification to the respective file
but can be used for other purposes where unique file handle is
necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Miklos Szeredi 1729a16c2c fuse: style fixes
Fix coding style errors reported by checkpatch and others.  Uptdate
copyright date to 2008.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:54 +01:00
Tejun Heo 29d434b39c fuse: add include protectors
Add include protectors to include/linux/fuse.h and fs/fuse/fuse_i.h.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-10-16 16:08:57 +02:00
Miklos Szeredi 33670fa296 fuse: nfs export special lookups
Implement the get_parent export operation by sending a LOOKUP request with
".." as the name.

Implement looking up an inode by node ID after it has been evicted from
the cache.  This is done by seding a LOOKUP request with "." as the name
(for all file types, not just directories).

The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
indicate that it supports these special lookups.

Thanks to John Muir for the original implementation of this feature.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi dbd561d236 fuse: add export operations
Implement export_operations, to allow fuse filesystems to be exported to
NFS.  This feature has been in the out-of-tree fuse module, and is widely
used and tested.

It has not been originally merged into mainline, because doing the NFS
export in userspace was thought to be a cleaner and more efficient way of
doing it, than through the kernel.

While that is true, it would also have involved a lot of duplicated effort
at reimplementing NFS exporting (all the different versions of the
protocol).  This effort was unfortunately not undertaken by anyone, so we
are left with doing it the easy but less efficient way.

If this feature goes in, the out-of-tree fuse module can go away,
which would have several advantages:

  - not having to maintain two versions
  - less confusion for users
  - no bugs due to kernel API changes

Comment from hch:
 - Use the same fh_type values as XFS, since we use the same fh encoding.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi 78bb6cb9a8 fuse: add flag to turn on big writes
Prior to 2.6.26 fuse only supported single page write requests.  In theory all
fuse filesystem should be able support bigger than 4k writes, as there's
nothing in the API to prevent it.  Unfortunately there's a known case in
NTFS-3G where big writes cause filesystem corruption.  There could also be
other filesystems, where the lack of testing with big write requests would
result in bugs.

To prevent such problems on a kernel upgrade, disable big writes by default,
but let filesystems set a flag to turn it on.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Szabolcs Szakacsits <szaka@ntfs-3g.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-13 08:02:26 -07:00
Miklos Szeredi b48badf013 fuse: fix node ID type
Node ID is 64bit but it is passed as unsigned long to some functions.  This
breakage wasn't noticed, because libfuse uses unsigned long too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi 5c5c5e51b2 fuse: update file size on short read
If the READ request returned a short count, then either

  - cached size is incorrect
  - filesystem is buggy, as short reads are only allowed on EOF

So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi 3be5a52b30 fuse: support writable mmap
Quoting Linus (3 years ago, FUSE inclusion discussions):

  "User-space filesystems are hard to get right. I'd claim that they
   are almost impossible, unless you limit them somehow (shared
   writable mappings are the nastiest part - if you don't have those,
   you can reasonably limit your problems by limiting the number of
   dirty pages you accept through normal "write()" calls)."

Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others).  This nicely solved the biggest problem: limiting the number of pages
used for write caching.

Some small details remained, however, which this largish patch attempts to
address.  It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks.  Performance may not be very
good for certain usage patterns, but generally it should be acceptable.

It has been tested extensively with fsx-linux and bash-shared-mapping.

Fuse page writeback design
--------------------------

fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.

The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.

For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented.  The per-bdi writeback count is not decremented until the actual
write completes.

On dirtying the page, fuse waits for a previous write to finish before
proceeding.  This makes sure, there can only be one temporary page used at a
time for one cached page.

This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?

The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write.  It may be buggy or even
malicious, and fail to complete WRITE requests.  We don't want unrelated parts
of the system to grind to a halt in such cases.

Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request.  There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.

Currently there are several cases where the kernel can block on page
writeback:

  - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
  - page migration
  - throttle_vm_writeout (through NR_WRITEBACK)
  - sync(2)

Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.

As an extra safetly measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default.  This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.

With appropriate privileges, this limit can be raised through
'/sys/class/bdi/<bdi>/max_ratio'.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi b6f2fcbcfc mm: bdi: expose the BDI object in sysfs for FUSE
Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"

Make the fuse control filesystem use s_dev instead of a fuse specific ID.
This makes it easier to match directories under /sys/fs/fuse/connections/ with
directories under /sys/class/bdi, and with actual mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Miklos Szeredi d12def1bcb fuse: limit queued background requests
Libfuse basically creates a new thread for each new request.  This is fine for
synchronous requests, which are naturally limited.  However background
requests (especially writepage) can cause a thread creation storm.

To avoid this, limit the number of background requests available to userspace.

This is done by introducing another queue for background requests, and a
counter for the number of "active" requests, which are currently available for
userspace.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:13 -08:00
Miklos Szeredi b57d426445 fuse: save space in struct fuse_req
Move the fields 'dentry' and 'vfsmount' into the request specific union, since
these are only used for the RELEASE request.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:13 -08:00
Miklos Szeredi a6643094e7 fuse: pass open flags to read and write
Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...)
after open, but fuse currently only sends the flags to userspace in open.

To make it possible to correcly handle changing flags, send the
current value to userspace in each read and write.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi bcb4be809d fuse: fix reading past EOF
Currently reading a fuse file will stop at cached i_size and return
EOF, even though the file might have grown since the attributes were
last updated.

So detect if trying to read past EOF, and refresh the attributes
before continuing with the read.

Thanks to mpb for the report.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00