linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	a0e881b7c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is not dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...	2012-08-01 10:26:23 -07:00
Al Viro	e4fad8e5d2	consolidate pipe file creation Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:19 +04:00
Cong Wang	2164d33446	pipe: remove KM_USER0 from comments Signed-off-by: Cong Wang <amwang@redhat.com>	2012-07-24 15:27:34 +08:00
Josef Bacik	c3b2da3148	fs: introduce inode operation ->update_time Btrfs has to make sure we have space to allocate new blocks in order to modify the inode, so updating time can fail. We've gotten around this by having our own file_update_time but this is kind of a pain, and Christoph has indicated he would like to make xfs do something different with atime updates. So introduce ->update_time, where we will deal with i_version an a/m/c time updates and indicate which changes need to be made. The normal version just does what it has always done, updates the time and marks the inode dirty, and then filesystems can choose to do something different. I've gone through all of the users of file_update_time and made them check for errors with the exception of the fault code since it's complicated and I wasn't quite sure what to do there, also Jan is going to be pushing the file time updates into page_mkwrite for those who have it so that should satisfy btrfs and make it not a big deal to check the file_update_time() return code in the generic fault path. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-06-01 12:07:25 -04:00
Will Deacon	46ce341b2f	pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command As described in commit `07d106d0a` ("vfs: fix up ENOIOCTLCMD error handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl command which they don't understand. Doing so will result in -ENOTTY being returned to userspace, which matches the behaviour of the compat layer if it fails to translate an ioctl command. This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL when passed an unknown ioctl command. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:55 -04:00
Linus Torvalds	9883035ae7	pipes: add a "packetized pipe" mode for writing The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode. When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer). End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet. NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day. The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org # needed for systemd/autofs interaction fix Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-29 13:12:42 -07:00
Muthu Kumar	b502bd1152	magic.h: move some FS magic numbers into magic.h - Move open-coded filesystem magic numbers into magic.h - Rearrange magic.h so that the filesystem-related constants are grouped together. Signed-off-by: Muthukumar R <muthur@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-23 16:58:31 -07:00
Cong Wang	e8e3c3d66f	fs: remove the second argument of k[un]map_atomic() Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:21 +08:00
Sasha Levin	2ccd4f4d47	pipe: fail cleanly when root tries F_SETPIPE_SZ with big size When a user with the CAP_SYS_RESOURCE cap tries to F_SETPIPE_SZ a pipe with size bigger than kmalloc() can alloc it spits out an ugly warning: ------------[ cut here ]------------ WARNING: at mm/page_alloc.c:2095 __alloc_pages_nodemask+0x5d3/0x7a0() Pid: 733, comm: a.out Not tainted 3.2.0-rc1+ #4 Call Trace: warn_slowpath_common+0x75/0xb0 warn_slowpath_null+0x15/0x20 __alloc_pages_nodemask+0x5d3/0x7a0 __get_free_pages+0x12/0x50 __kmalloc+0x12b/0x150 pipe_set_size+0x75/0x120 pipe_fcntl+0xf8/0x140 do_fcntl+0x2d4/0x410 sys_fcntl+0x66/0xa0 system_call_fastpath+0x16/0x1b ---[ end trace 432f702e6db7b5ee ]--- Instead, make kcalloc() handle the overflow case and fail quietly. [akpm@linux-foundation.org: switch to sizeof(*bufs) for 80-column niceness] Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Al Viro	84b92d39f9	vfs: pipe.c is really non-modular ... so no exitcalls there. Not much would work if pipe(2) would stop working, after all... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:41 -05:00
Pavel Emelyanov	d70ef97baf	fs/pipe.c: add ->statfs callback for pipefs Currently a statfs on a pipe's /proc/<pid>/fd/ link returns -ENOSYS. Wire pipfs up so that the statfs succeeds. This is required by checkpoint-restart in the userspace to make it possible to distinguish pipes from fifos. When we dump information about task's open files we use the /proc/pid/fd directoy's symlinks and the fact that opening any of them gives us exactly the same dentry->inode pair as the original process has. Now if a task we're dumping has opened pipe and fifo we need to detect this and act accordingly. Knowing that an fd with type S_ISFIFO resides on a pipefs is the most precise way. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-31 17:30:51 -07:00
Eric Dumazet	a209dfc7b0	vfs: dont chain pipe/anon/socket on superblock s_inodes list Workloads using pipes and sockets hit inode_sb_list_lock contention. superblock s_inodes list is needed for quota, dirty, pagecache and fsnotify management. pipe/anon/socket fs are clearly not candidates for these. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:09 -04:00
Tim Chen	423e0ab086	VFS : mount lock scalability for internal mounts For a number of file systems that don't have a mount point (e.g. sockfs and pipefs), they are not marked as long term. Therefore in mntput_no_expire, all locks in vfs_mount lock are taken instead of just local cpu's lock to aggregate reference counts when we release reference to file objects. In fact, only local lock need to have been taken to update ref counts as these file systems are in no danger of going away until we are ready to unregister them. The attached patch marks file systems using kern_mount without mount point as long term. The contentions of vfs_mount lock is now eliminated. Before un-registering such file system, kern_unmount should be called to remove the long term flag and make the mount point ready to be freed. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:08:32 -04:00
Linus Torvalds	28e58ee8ce	Fix broken "pipe: use event aware wakeups" optimization Commit `e462c448fd` ("pipe: use event aware wakeups") optimized the pipe event wakeup calls to avoid wakeups if the events do not match the requested set. However, the optimization was buggy, in that it didn't actually use the correct sets for the events: when we make room for more data to be written, the pipe poll() routine will return both the POLLOUT _and_ POLLWRNORM bits. Similarly for read. And most critically, when a pipe is released, that will potentially result in POLLHUP\|POLLERR (depending on whether it was the last reader or writer), not just the regular POLLIN\|POLLOUT. This bug showed itself as a hung gnome-screensaver-dialog process, stuck forever (or at least until it was poked by a signal or by being traced) in a poll() system call. Cc: Davide Libenzi <davidel@xmailserver.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 16:21:59 -08:00
Al Viro	f03c65993b	sanitize vfsmount refcounting changes Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:07 -05:00
Linus Torvalds	b2034d474b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits) fs: add documentation on fallocate hole punching Gfs2: fail if we try to use hole punch Btrfs: fail if we try to use hole punch Ext4: fail if we try to use hole punch Ocfs2: handle hole punching via fallocate properly XFS: handle hole punching via fallocate properly fs: add hole punching to fallocate vfs: pass struct file to do_truncate on O_TRUNC opens (try #2) fix signedness mess in rw_verify_area() on 64bit architectures fs: fix kernel-doc for dcache::prepend_path fs: fix kernel-doc for dcache::d_validate sanitize ecryptfs ->mount() switch afs move internal-only parts of ncpfs headers to fs/ncpfs switch ncpfs switch 9p pass default dentry_operations to mount_pseudo() switch hostfs switch affs switch configfs ...	2011-01-13 10:27:28 -08:00
Davide Libenzi	e462c448fd	pipe: use event aware wakeups Send the events the wakeup refers to, so that epoll, and even the new poll code in fs/select.c can avoid wakeups if the events do not match the requested set. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:15 -08:00
Al Viro	c74a1cbb3c	pass default dentry_operations to mount_pseudo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Nick Piggin	b3e19d924b	fs: scale mntget/mntput The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:33 +11:00
Nick Piggin	4b936885ab	fs: improve scalability of pseudo filesystems Regardless of how much we possibly try to scale dcache, there is likely always going to be some fundamental contention when adding or removing children under the same parent. Pseudo filesystems do not seem need to have connected dentries because by definition they are disconnected. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:32 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/$[^\t ]$->d_op = $.$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]$\.d_op = $.$;/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	ff0c7d15f9	fs: avoid inode RCU freeing for pseudo fs Pseudo filesystems that don't put inode on RCU list or reachable by rcu-walk dentries do not need to RCU free their inodes. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Linus Torvalds	7208364652	Un-inline get_pipe_info() helper function This avoids some include-file hell, and the function isn't really important enough to be inlined anyway. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-28 16:27:19 -08:00
Linus Torvalds	c66fb34794	Export 'get_pipe_info()' to other users And in particular, use it in 'pipe_fcntl()'. The other pipe functions do not need to use the 'careful' version, since they are only ever called for things that are already known to be pipes. The normal read/write/ioctl functions are called through the file operations structures, so if a file isn't a pipe, they'd never get called. But pipe_fcntl() is special, and called directly from the generic fcntl code, and needs to use the same careful function that the splice code is using. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-28 14:09:57 -08:00
Al Viro	51139adac9	convert get_sb_pseudo() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:33 -04:00
Christoph Hellwig	85fe4025c6	fs: do not assign default i_ino in new_inode Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Nicolas Kaiser	e5953cbdff	pipe: fix failure to return error code on ->confirm() The arguments were transposed, we want to assign the error code to 'ret', which is being returned. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-21 14:56:33 +02:00
Miklos Szeredi	6db40cf047	pipe: fix check in "set size" fcntl As it stands this check compares the number of pages to the page size. This makes no sense and makes the fcntl fail in almost any sane case. Fix it by checking if nr_pages is not zero (it can become zero only if arg is too big and round_pipe_size() overflows). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Miklos Szeredi	1d862f4122	pipe: fix pipe buffer resizing pipe_set_size() needs to copy pipe bufs from the old circular buffer to the new. The current code gets this wrong in multiple ways, resulting in oops. Test program is available here: http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/ Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Jens Axboe	ff9da691c0	pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface This changes the interface to be based on bytes instead. The API matches that of F_SETPIPE_SZ in that it rounds up the passed in size so that the resulting page array is a power-of-2 in size. The proc file is renamed to /proc/sys/fs/pipe-max-size to reflect this change. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-03 14:54:39 +02:00
Jens Axboe	419f8367ea	pipe: change the privilege required for growing a pipe beyond system max Change it to CAP_SYS_RESOURCE, as that more accurately models what we want to control. Suggested-by: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-03 12:45:28 +02:00
Jens Axboe	6a6ca57de9	pipe: adjust minimum pipe size to 1 page We don't need to pages to guarantee the POSIX requirement that upto a page size write must be atomic to an empty pipe. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-03 12:44:30 +02:00
Jens Axboe	b4ca761577	Merge branch 'master' into for-linus Conflicts: fs/pipe.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-01 12:42:12 +02:00
Linus Torvalds	003386fff3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: mm: export generic_pipe_buf_() to modules fuse: support splice() reading from fuse device fuse: allow splice to move pages mm: export remove_from_page_cache() to modules mm: export lru_cache_add_() to modules fuse: support splice() writing to fuse device fuse: get page reference for readpages fuse: use get_user_pages_fast() fuse: remove unneeded variable	2010-05-30 09:16:14 -07:00
Julia Lawall	cc967be547	fs: Add missing mutex_unlock Add a mutex_unlock missing on the error path. At other exists from the function that return an error flag, the mutex is unlocked, so do the same here. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * mutex_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * mutex_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:03:09 -04:00
Miklos Szeredi	51921cb746	mm: export generic_pipe_buf_*() to modules This is needed by fuse device code which wants to create pipe buffers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-05-26 08:44:22 +02:00
Jens Axboe	b9598db340	pipe: make F_{GET,SET}PIPE_SZ deal with byte sizes Instead of requiring an exact number of pages as the argument and return value, change the API to deal with number of bytes instead. This also relaxes the requirement that the passed in size must result in a power-of-2 page array size. Round up to the nearest power-of-2 automatically and return the resulting size of the pipe on success. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-24 19:34:43 +02:00
Jens Axboe	0191f8697b	pipe: F_SETPIPE_SZ should return -EPERM for non-root If the passed in size is larger than what has been set as the system wide limit and the user is not root, we want to return permission denied (not invalid value). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-24 19:15:57 +02:00
Jens Axboe	b492e95be0	pipe: set lower and upper limit on max pages in the pipe page array We need at least two to guarantee proper POSIX behaviour, so never allow a smaller limit than that. Also expose a /proc/sys/fs/pipe-max-pages sysctl file that allows root to define a sane upper limit. Make it default to 16 times the default size, which is 16 pages. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-21 21:12:52 +02:00
Jens Axboe	35f3d14dbb	pipe: add support for shrinking and growing pipes This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for growing and shrinking the size of a pipe and adjusts pipe.c and splice.c (and relay and network splice) usage to work with these larger (or smaller) pipes. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-21 21:12:40 +02:00
Nick Piggin	a3a065e3f1	fs: no games with DCACHE_UNHASHED Filesystems outside the regular namespace do not have to clear DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX. Nothing in proc prevents the fd link from being used if its dentry is not in the hash. Also, it does not get put into the dcache hash if DCACHE_UNHASHED is clear; that depends on the filesystem calling d_add or d_rehash. So delete the misleading comments and needless code. Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-17 10:51:40 -05:00
Al Viro	d231412db6	switch create_read_pipe() to alloc_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:43 -05:00
Al Viro	2c48b9c455	switch alloc_file() to passing struct path ... and have the caller grab both mnt and dentry; kill leak in infiniband, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:42 -05:00
Earl Chew	ad3960243e	fs: pipe.c null pointer dereference This patch fixes a null pointer exception in pipe_rdwr_open() which generates the stack trace: > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: > [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70 > [<ffffffff8028125c>] __dentry_open+0x13c/0x230 > [<ffffffff8028143d>] do_filp_open+0x2d/0x40 > [<ffffffff802814aa>] do_sys_open+0x5a/0x100 > [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67 The failure mode is triggered by an attempt to open an anonymous pipe via /proc/pid/fd/* as exemplified by this script: ============================================================= while : ; do { echo y ; sleep 1 ; } \| { while read ; do echo z$REPLY; done ; } & PID=$! OUT=$(ps -efl \| grep 'sleep 1' \| grep -v grep \| { read PID REST ; echo $PID; } ) OUT="${OUT%% }" DELAY=$((RANDOM 1000 / 32768)) usleep $((DELAY * 1000 + RANDOM % 1000 )) echo n > /proc/$OUT/fd/1 # Trigger defect done ============================================================= Note that the failure window is quite small and I could only reliably reproduce the defect by inserting a small delay in pipe_rdwr_open(). For example: static int pipe_rdwr_open(struct inode inode, struct file filp) { msleep(100); mutex_lock(&inode->i_mutex); Although the defect was observed in pipe_rdwr_open(), I think it makes sense to replicate the change through all the pipe_*_open() functions. The core of the change is to verify that inode->i_pipe has not been released before attempting to manipulate it. If inode->i_pipe is no longer present, return ENOENT to indicate so. The comment about potentially using atomic_t for i_pipe->readers and i_pipe->writers has also been removed because it is no longer relevant in this context. The inode->i_mutex lock must be used so that inode->i_pipe can be dealt with correctly. Signed-off-by: Earl Chew <earl_chew@agilent.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-22 08:11:44 +09:00
Peter Zijlstra	023d43c7b5	lockdep: Fix lockdep annotation for pipe_double_lock() The presumed use of the pipe_double_lock() routine is to lock 2 locks in a deadlock free way by ordering the locks by their address. However it fails to keep the specified lock classes in order and explicitly annotates a deadlock. Rectify this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Miklos Szeredi <mszeredi@suse.cz> LKML-Reference: <1248163763.15751.11098.camel@twins>	2009-07-22 21:14:14 +02:00
Miklos Szeredi	6818173bd6	splice: implement default splice_read method If f_op->splice_read() is not implemented, fall back to a plain read. Use vfs_readv() to read into previously allocated pages. This will allow splice and functions using splice, such as the loop device, to work on all filesystems. This includes "direct_io" files in fuse which bypass the page cache. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 14:13:10 +02:00
Miklos Szeredi	61e0d47c33	splice: add helpers for locking pipe inode There are lots of sequences like this, especially in splice code: if (pipe->inode) mutex_lock(&pipe->inode->i_mutex); /* do something */ if (pipe->inode) mutex_unlock(&pipe->inode->i_mutex); so introduce helpers which do the conditional locking and unlocking. Also replace the inode_double_lock() call with a pipe_double_lock() helper to avoid spreading the use of this functionality beyond the pipe code. This patch is just a cleanup, and should cause no behavioral changes. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:10:12 +02:00
Linus Torvalds	3ae5080f4c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...	2009-03-27 16:23:12 -07:00
Al Viro	3ba13d179e	constify dentry_operations: rest Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Cheng Renquan	10f303ae1e	do_pipe cleanup: drop its last user in arch/alpha/ The last user of do_pipe is in arch/alpha/, after replacing it with do_pipe_flags, the do_pipe can be totally dropped. Signed-off-by: Cheng Renquan <crquan@gmail.com> Acked-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:58 -04:00

1 2 3

108 Commits