OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jianpeng Ma	5fdb1389e1	ceph: cleanup use of ceph_msg_get Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:29 +03:00
Jianpeng Ma	e36d571d70	ceph: no need to get parent inode in ceph_open parent inode is needed in creating new inode case. For ceph_open, the target inode already exists. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:29 +03:00
Jianpeng Ma	a43137f7b0	ceph: remove the useless judgement err != 0 is already handled. So skip this. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:29 +03:00
Brad Hubbard	1550d34e56	ceph: remove redundant test of head->safe and silence static analysis warnings Signed-off-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:29 +03:00
Yan, Zheng	23078637e0	ceph: fix queuing inode to mdsdir's snaprealm During MDS failovers, MClientSnap message may cause kclient to move some inodes from root directory's snaprealm to mdsdir's snaprealm and queue snapshots for these inodes. For a FS has never created any snapshot, both root directory's snaprealm and mdsdir's snaprealm share the same snapshot contexts (both are ceph_empty_snapc). This confuses ceph_put_wrbuffer_cap_refs(), make it unable to distinguish snapshot buffers from head buffers. The fix is do not use ceph_empty_snapc as snaprealm's cached context. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:29 +03:00
Yan, Zheng	a341d4df87	ceph: invalidate dirty pages after forced umount After forced umount, ceph_writepages_start() skips flushing dirty pages. To make sure inode's reference count get dropped to zero, we need to invalidate dirty pages. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:28 +03:00
Yan, Zheng	48fec5d0a5	ceph: EIO all operations after forced umount This patch makes try_get_cap_refs() and __do_request() check if the file system was forced umount, and return -EIO if it was. This patch also adds a helper function to drops dirty caps and wakes up blocking operation. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-09-08 23:14:28 +03:00
Kees Cook	a068acf2ee	fs: create and use seq_show_option for escaping Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Yan, Zheng	fc927cd32f	ceph: always re-send cap flushes when MDS recovers commit `e548e9b93d` makes the kclient only re-send cap flush once during MDS failover. If the kclient sends a cap flush after MDS enters reconnect stage but before MDS recovers. The kclient will skip re-sending the same cap flush when MDS recovers. This causes problem for newly created inode. The MDS handles cap flushes before replaying unsafe requests, so it's possible that MDS find corresponding inode is missing when handling cap flush. The fix is reverting to old behaviour: always re-send when MDS recovers Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-07-31 11:38:53 +03:00
Yan, Zheng	f6762cb2ca	ceph: fix ceph_encode_locks_to_buffer() posix locks should be in ctx->flc_posix list Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-07-31 11:38:47 +03:00
Linus Torvalds	1dc51b8288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...	2015-07-04 19:36:06 -07:00
Linus Torvalds	0c76c6ba24	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "We have a pile of bug fixes from Ilya, including a few patches that sync up the CRUSH code with the latest from userspace. There is also a long series from Zheng that fixes various issues with snapshots, inline data, and directory fsync, some simplification and improvement in the cap release code, and a rework of the caching of directory contents. To top it off there are a few small fixes and cleanups from Benoit and Hong" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) rbd: use GFP_NOIO in rbd_obj_request_create() crush: fix a bug in tree bucket decode libceph: Fix ceph_tcp_sendpage()'s more boolean usage libceph: Remove spurious kunmap() of the zero page rbd: queue_depth map option rbd: store rbd_options in rbd_device rbd: terminate rbd_opts_tokens with Opt_err ceph: fix ceph_writepages_start() rbd: bump queue_max_segments ceph: rework dcache readdir crush: sync up with userspace crush: fix crash from invalid 'take' argument ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL ceph: pre-allocate data structure that tracks caps flushing ceph: re-send flushing caps (which are revoked) in reconnect stage ceph: send TID of the oldest pending caps flush to MDS ceph: track pending caps flushing globally ceph: track pending caps flushing accurately libceph: fix wrong name "Ceph filesystem for Linux" ceph: fix directory fsync ...	2015-07-02 11:35:00 -07:00
Yan, Zheng	e1966b4944	ceph: fix ceph_writepages_start() Before a page get locked, someone else can write data to the page and increase the i_size. So we should re-check the i_size after pages are locked. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 18:30:53 +03:00
Yan, Zheng	fdd4e15838	ceph: rework dcache readdir Previously our dcache readdir code relies on that child dentries in directory dentry's d_subdir list are sorted by dentry's offset in descending order. When adding dentries to the dcache, if a dentry already exists, our readdir code moves it to head of directory dentry's d_subdir list. This design relies on dcache internals. Al Viro suggests using ncpfs's approach: keeping array of pointers to dentries in page cache of directory inode. the validity of those pointers are presented by directory inode's complete and ordered flags. When a dentry gets pruned, we clear directory inode's complete flag in the d_prune() callback. Before moving a dentry to other directory, we clear the ordered flag for both old and new directory. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:32 +03:00
Yan, Zheng	687265e5a8	ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL GFP_NOFS memory allocation is required for page writeback path. But there is no need to use GFP_NOFS in syscall path and readpage path Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	f66fd9f095	ceph: pre-allocate data structure that tracks caps flushing Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	e548e9b93d	ceph: re-send flushing caps (which are revoked) in reconnect stage if flushing caps were revoked, we should re-send the cap flush in client reconnect stage. This guarantees that MDS processes the cap flush message before issuing the flushing caps to other client. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	a2971c8ccb	ceph: send TID of the oldest pending caps flush to MDS According to this information, MDS can trim its completed caps flush list (which is used to detect duplicated cap flush). Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	8310b08913	ceph: track pending caps flushing globally So we know TID of the oldest pending caps flushing. Later patch will send this information to MDS, so that MDS can trim its completed caps flush list. Tracking pending caps flushing globally also simplifies syncfs code. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	553adfd941	ceph: track pending caps flushing accurately Previously we do not trace accurate TID for flushing caps. when MDS failovers, we have no choice but to re-send all flushing caps with a new TID. This can cause problem because MDS can has already flushed some caps and has issued the same caps to other client. The re-sent cap flush has a new TID, which makes MDS unable to detect if it has already processed the cap flush. This patch adds code to track pending caps flushing accurately. When re-sending cap flush is needed, we use its original flush TID. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Yan, Zheng	da819c8150	ceph: fix directory fsync fsync() on directory should flush dirty caps and wait for any uncommitted directory opertions to commit. But ceph_dir_fsync() only waits for uncommitted directory opertions. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Yan, Zheng	89b52fe14d	ceph: fix flushing caps Current ceph_fsync() only flushes dirty caps and wait for them to be flushed. It doesn't wait for caps that has already been flushing. This patch makes ceph_fsync() wait for pending flushing caps too. Besides, this patch also makes caps_are_flushed() peroperly handle tid wrapping. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Yan, Zheng	41445999ae	ceph: don't include used caps in cap_wanted when copying files to cephfs, file data may stay in page cache after corresponding file is closed. Cached data use Fc capability. If we include Fc capability in cap_wanted, MDS will treat files with cached data as open files, and journal them in an EOpen event when trimming log segment. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Yan, Zheng	3e0708b990	ceph: ratelimit warn messages for MDS closes session Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Ilya Dryomov	5be7303477	ceph: simplify two mount_timeout sites No need to bifurcate wait now that we've got ceph_timeout_jiffies(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Ilya Dryomov	a319bf56a6	libceph: store timeouts in jiffies, verify user input There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:29 +03:00
Yan, Zheng	e8a7b8b12b	ceph: exclude setfilelock requests when calculating oldest tid setfilelock requests can block for a long time, which can prevent client from advancing its oldest tid. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Yan, Zheng	745a8e3bcc	ceph: don't pre-allocate space for cap release messages Previously we pre-allocate cap release messages for each caps. This wastes lots of memory when there are large amount of caps. This patch make the code not pre-allocate the cap release messages. Instead, we add the corresponding ceph_cap struct to a list when releasing a cap. Later when flush cap releases is needed, we allocate the cap release messages dynamically. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Yan, Zheng	affbc19a68	ceph: make sure syncfs flushes all cap snaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Yan, Zheng	622f3e250f	ceph: don't trim auth cap when there are cap snaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	604d1b0245	ceph: take snap_rwsem when accessing snap realm's cached_context When ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps() accesses snap realm's cached_context. So we need take read lock of snap_rwsem. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	8605609049	ceph: avoid sending unnessesary FLUSHSNAP message when a snap notification contains no new snapshot, we can avoid sending FLUSHSNAP message to MDS. But we still need to create cap_snap in some case because it's required by write path and page writeback path Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	5dda377cf0	ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference In most cases that snap context is needed, we are holding reference of CEPH_CAP_FILE_WR. So we can set ceph inode's i_head_snapc when getting the CEPH_CAP_FILE_WR reference, and make codes get snap context from i_head_snapc. This makes the code simpler. Another benefit of this change is that we can handle snap notification more elegantly. Especially when snap context is updated while someone else is doing write. The old queue cap_snap code may set cap_snap's context to ether the old context or the new snap context, depending on if i_head_snapc is set. The new queue capp_snap code always set cap_snap's context to the old snap context. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	7b06a826e7	ceph: use empty snap context for uninline_data and get_pool_perm Cached_context in ceph_snap_realm is directly accessed by uninline_data() and get_pool_perm(). This is racy in theory. both uninline_data() and get_pool_perm() do not modify existing object, they only create new object. So we can pass the empty snap context to them. Unlike cached_context in ceph_snap_realm, we do not need to protect the empty snap context. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	10183a6955	ceph: check OSD caps before read/write Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	144cba1493	libceph: allow setting osd_req_op's flags Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:27 +03:00
Al Viro	dc3f4198ea	make simple_positive() public Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-23 18:02:01 -04:00
Jan Kara	5fa8e0a1c6	fs: Rename file_remove_suid() to file_remove_privs() file_remove_suid() is a misnomer since it removes also file capabilities stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells something else than whether file_remove_suid() call is necessary which leads to bugs. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-23 18:01:08 -04:00
Al Viro	ac194dccd2	ceph: switch to simple_follow_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:18:28 -04:00
Linus Torvalds	9ec3a646fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only	2015-04-26 17:22:07 -07:00
Linus Torvalds	1204c46445	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This time around we have a collection of CephFS fixes from Zheng around MDS failure handling and snapshots, support for a new CRUSH straw2 algorithm (to sync up with userspace) and several RBD cleanups and fixes from Ilya, an error path leak fix from Taesoo, and then an assorted collection of cleanups from others" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits) rbd: rbd_wq comment is obsolete libceph: announce support for straw2 buckets crush: straw2 bucket type with an efficient 64-bit crush_ln() crush: ensuring at most num-rep osds are selected crush: drop unnecessary include from mapper.c ceph: fix uninline data function ceph: rename snapshot support ceph: fix null pointer dereference in send_mds_reconnect() ceph: hold on to exclusive caps on complete directories libceph: simplify our debugfs attr macro ceph: show non-default options only libceph: expose client options through debugfs libceph, ceph: split ceph_show_options() rbd: mark block queue as non-rotational libceph: don't overwrite specific con error msgs ceph: cleanup unsafe requests when reconnecting is denied ceph: don't zero i_wrbuffer_ref when reconnecting is denied ceph: don't mark dirty caps when there is no auth cap ceph: keep i_snap_realm while there are writers libceph: osdmap.h: Add missing format newlines ...	2015-04-22 11:30:10 -07:00
Yan, Zheng	ec137c10e7	ceph: fix uninline data function For CEPH_OSD_CMPXATTR_MODE_U64, OSD expects the u64 to be encoded as string in object's xattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-22 18:33:41 +03:00
Yan, Zheng	0ea611a3bc	ceph: rename snapshot support Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-22 18:33:41 +03:00
Yan, Zheng	c0bd50e2ee	ceph: fix null pointer dereference in send_mds_reconnect() sb->s_root can be null when umounting Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-22 18:33:31 +03:00
Yan, Zheng	32ec439775	ceph: hold on to exclusive caps on complete directories If a directory is complete, we want to keep the exclusive cap. So that MDS does not end up revoking the shared cap on every create/unlink operation. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:40 +03:00
Ilya Dryomov	ff7eeb82cc	ceph: show non-default options only Don't pollute /proc/mounts with default options (presently these are dcache, nofsc and acl). Leave the acl/noacl however - it's a bit of a special case due to CONFIG_CEPH_FS_POSIX_ACL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-04-20 18:55:39 +03:00
Ilya Dryomov	ff40f9ae95	libceph, ceph: split ceph_show_options() Split ceph_show_options() into two pieces and move the piece responsible for printing client (libceph) options into net/ceph. This way people adding a libceph option wouldn't have to remember to update code in fs/ceph. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-04-20 18:55:38 +03:00
Yan, Zheng	1c841a96b5	ceph: cleanup unsafe requests when reconnecting is denied Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:37 +03:00
Yan, Zheng	a9f6eb6185	ceph: don't zero i_wrbuffer_ref when reconnecting is denied remove_session_caps_cb() does not truncate dirty data in page cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will result negtive i_wrbuffer_ref/i_wrbuffer_ref_head Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:36 +03:00
Yan, Zheng	571ade336a	ceph: don't mark dirty caps when there is no auth cap No i_auth_cap means reconnecting to MDS was denied. So don't add new dirty caps. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:36 +03:00
Yan, Zheng	db40cc1702	ceph: keep i_snap_realm while there are writers when reconnecting to MDS is denied, we remove session caps forcibly. But it's possible there are ongoing write, the write code needs to reference i_snap_realm. So if there are ongoing write, we keep i_snap_realm. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:35 +03:00
Sanidhya Kashyap	a149bb9a28	ceph: kstrdup() memory handling Currently, there is no check for the kstrdup() for r_path2, r_path1 and snapdir_name as various locations as there is a possibility of failure during memory pressure. Therefore, returning ENOMEM where the checks have been missed. Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:34 +03:00
Taesoo Kim	c1d00b2d9c	ceph: properly release page upon error When ceph_update_writeable_page fails (including -EAGAIN), it unlocks (w/ unlock_page) the page but does not 'release' (w/ page_cache_release) properly. Upon error, properly set *pagep to NULL, indicating an error. Signed-off-by: Taesoo Kim <tsgatesv@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:34 +03:00
Nicholas Mc Guire	57e95460f0	ceph: match wait_for_completion_timeout return type return type of wait_for_completion_timeout is unsigned long not int. An appropriately named unsigned long is added and the assignment fixed up. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:23 +03:00
Nicholas Mc Guire	3563dbdd99	ceph: use msecs_to_jiffies for time conversion This is only an API consolidation and should make things more readable it replaces var * HZ / 1000 by msecs_to_jiffies(var). Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:22 +03:00
Fabian Frederick	e1eba3ea02	ceph: remove redundant declaration ceph_aops was already defined extern in addr.c section Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:22 +03:00
Yan, Zheng	e2c3de046c	ceph: fix dcache/nocache mount option Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:22 +03:00
Yan, Zheng	6e6f09231a	ceph: drop cap releases in requests composed before cap reconnect These cap releases are stale because MDS will re-establish client caps according to the cap reconnect messages. Note: MDS can detect stale cap messages, so these stale cap releases are harmless even we don't drop them. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:22 +03:00
David Howells	2b0143b5c9	VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-15 15:06:57 -04:00
Al Viro	2ba48ce513	mirror O_APPEND and O_DIRECT into iocb->ki_flags ... avoiding write_iter/fcntl races. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:30:22 -04:00
Al Viro	3309dd04cb	switch generic_write_checks() to iocb and iter ... returning -E... upon error and amount of data left in iter after (possible) truncation upon success. Note, that normal case gives a non-zero (positive) return value, so any tests for != 0 _must_ be updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/ext4/file.c	2015-04-11 22:30:21 -04:00
Al Viro	0fa6b005af	generic_write_checks(): drop isblk argument all remaining callers are passing 0; some just obscure that fact. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:29:48 -04:00
Omar Sandoval	22c6186ece	direct_IO: remove rw from a_ops->direct_IO() Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:29:45 -04:00
Al Viro	5d5d568975	make new_sync_{read,write}() static All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:29:40 -04:00
Al Viro	c0fec3a98b	Merge branch 'iocb' into for-next	2015-04-11 22:24:41 -04:00
Christoph Hellwig	e2e40f2c1e	fs: move struct kiocb to fs.h struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-25 20:28:11 -04:00
Christoph Hellwig	66ee59af63	fs: remove ki_nbytes There is no need to pass the total request length in the kiocb, as we already get passed in through the iov_iter argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-12 23:50:23 -04:00
Linus Torvalds	be5e6616dd	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...	2015-02-22 17:42:14 -08:00
David Howells	e36cb0b89c	VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_(dentry) Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' \|') \|\| die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") \|\| die $coccifile; print($fd "$_\n") \|\| die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 \|\| die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-02-22 11:38:41 -05:00
Linus Torvalds	4533f6e27a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "On the RBD side, there is a conversion to blk-mq from Christoph, several long-standing bug fixes from Ilya, and some cleanup from Rickard Strandqvist. On the CephFS side there is a long list of fixes from Zheng, including improved session handling, a few IO path fixes, some dcache management correctness fixes, and several blocking while !TASK_RUNNING fixes. The core code gets a few cleanups and Chaitanya has added support for TCP_NODELAY (which has been used on the server side for ages but we somehow missed on the kernel client). There is also an update to MAINTAINERS to fix up some email addresses and reflect that Ilya and Zheng are doing most of the maintenance for RBD and CephFS these days. Do not be surprised to see a pull request come from one of them in the future if I am unavailable for some reason" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) MAINTAINERS: update Ceph and RBD maintainers libceph: kfree() in put_osd() shouldn't depend on authorizer libceph: fix double __remove_osd() problem rbd: convert to blk-mq ceph: return error for traceless reply race ceph: fix dentry leaks ceph: re-send requests when MDS enters reconnecting stage ceph: show nocephx_require_signatures and notcp_nodelay options libceph: tcp_nodelay support rbd: do not treat standalone as flatten ceph: fix atomic_open snapdir ceph: properly mark empty directory as complete client: include kernel version in client metadata ceph: provide seperate {inode,file}_operations for snapdir ceph: fix request time stamp encoding ceph: fix reading inline data when i_size > PAGE_SIZE ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) rbd: fix error paths in rbd_dev_refresh() ...	2015-02-19 14:14:42 -08:00
Yan, Zheng	4d41cef279	ceph: return error for traceless reply race When we receives traceless reply for request that created new inode, we re-send a lookup request to MDS get information of the newly created inode. (VFS expects FS' callback return an inode in create case) This breaks one request into two requests. Other client may modify or move to the new inode in the middle. When the race happens, ceph_handle_notrace_create() unconditionally links the dentry for 'create' operation to the inode returned by lookup. This may confuse VFS when the inode is a directory (VFS does not allow multiple linkages for directory inode). This patch makes ceph_handle_notrace_create() when it detect a race. This event should be rare and it happens only when we talk to old MDS. Recent MDS does not send traceless reply for request that creates new inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:40 +03:00
Yan, Zheng	5cba372c0f	ceph: fix dentry leaks Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:40 +03:00
Yan, Zheng	3de22be677	ceph: re-send requests when MDS enters reconnecting stage So that MDS can check if any request is already completed and process completed requests in clientreplay stage. When completed requests are processed in clientreplay stage, MDS can avoid sending traceless replies. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:40 +03:00
Ilya Dryomov	2a0b61cefc	ceph: show nocephx_require_signatures and notcp_nodelay options Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2015-02-19 13:31:40 +03:00
Yan, Zheng	bf91c31508	ceph: fix atomic_open snapdir ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value and creates snapdir inode if it's -ENOENT Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	2f92b3d0a9	ceph: properly mark empty directory as complete ceph_add_cap() calls __check_cap_issue(), which clears directory inode' complete flag. so we should set the complete flag for empty directory should be set after calling ceph_add_cap(). Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	a6a5ce4f0d	client: include kernel version in client metadata Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	38c48b5f0a	ceph: provide seperate {inode,file}_operations for snapdir remove all unsupported operations from {inode,file}_operations. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	1f041a89b4	ceph: fix request time stamp encoding struct timespec uses 'long' to present second and nanosecond. 'long' is 64 bits on 64bits machine. ceph MDS expects time stamp to be encoded as struct ceph_timespec, which uses 'u32' to present second and nanosecond. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	fcc02d2a03	ceph: fix reading inline data when i_size > PAGE_SIZE when inode has inline data but its size > PAGE_SIZE (it was truncated to larger size), previous direct read code return -EIO. This patch adds code to return zeros for data whose offset > PAGE_SIZE. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	86d8f67b26	ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) use an atomic variable to track number of sessions, this can avoid block operation inside wait loops. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	c4d4a582c5	ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) we should not do block operation in wait_event_interruptible()'s condition check function, but reading inline data can block. so move the read inline data code to ceph_get_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	d3383a8e37	ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) check_cap_flush() calls mutex_lock(), which may block. So we can't use it as condition check function for wait_event(); Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	982d6011bc	ceph: improve reference tracking for snaprealm When snaprealm is created, its initial reference count is zero. But in some rare cases, the newly created snaprealm is not referenced by anyone. This causes snaprealm with zero reference count not freed. The fix is set reference count of newly snaprealm to 1. The reference is return the function who requests to create the snaprealm. When the function finishes its job, it releases the reference. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	1487a688d8	ceph: properly zero data pages for file holes. A bug is found in striped_read() of fs/ceph/file.c. striped_read() calls ceph_zero_pape_vector_range(). The first argument, page_align + read + ret, passed to ceph_zero_pape_vector_range() is wrong. When a file has holes, this wrong parameter may cause memory corruption either in kernal space or user space. Kernel space memory may be corrupted in the case of non direct IO; user space memory may be corrupted in the case of direct IO. In the latter case, the application doing direct IO may crash due to memory corruption, as we have experienced. The correct value should be initial_align + read + ret, where intial_align = o_direct ? buf_align : io_align. Compared with page_align, the current page offest, initial_align is the initial page offest, which should be used to calculate the page and offset in ceph_zero_pape_vector_range(). Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Rickard Strandqvist	671762f807	ceph: acl: Remove unused function Remove the function ceph_get_cached_acl() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	03f4fcb028	ceph: handle SESSION_FORCE_RO message mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:37 +03:00
Jeff Layton	e084c1bd40	Revert "locks: keep a count of locks on the flctx lists" This reverts commit `9bd0f45b70`. Linus rightly pointed out that I failed to initialize the counters when adding them, so they don't work as expected. Just revert this patch for now. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>	2015-02-16 14:32:03 -05:00
Linus Torvalds	6bec003528	Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block Pull backing device changes from Jens Axboe: "This contains a cleanup of how the backing device is handled, in preparation for a rework of the life time rules. In this part, the most important change is to split the unrelated nommu mmap flags from it, but also removing a backing_dev_info pointer from the address_space (and inode), and a cleanup of other various minor bits. Christoph did all the work here, I just fixed an oops with pages that have a swap backing. Arnd fixed a missing export, and Oleg killed the lustre backing_dev_info from staging. Last patch was from Al, unexporting parts that are now no longer needed outside" * 'for-3.20/bdi' of git://git.kernel.dk/linux-block: Make super_blocks and sb_lock static mtd: export new mtd_mmap_capabilities fs: make inode_to_bdi() handle NULL inode staging/lustre/llite: get rid of backing_dev_info fs: remove default_backing_dev_info fs: don't reassign dirty inodes to default_backing_dev_info nfs: don't call bdi_unregister ceph: remove call to bdi_unregister fs: remove mapping->backing_dev_info fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info nilfs2: set up s_bdi like the generic mount_bdev code block_dev: get bdev inode bdi directly from the block device block_dev: only write bdev inode on close fs: introduce f_op->mmap_capabilities for nommu mmap support fs: kill BDI_CAP_SWAP_BACKED fs: deduplicate noop_backing_dev_info	2015-02-12 13:50:21 -08:00
Linus Torvalds	992de5a8ec	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a lot of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...	2015-02-10 16:45:56 -08:00
Kirill A. Shutemov	d83a08db5b	mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-10 14:30:30 -08:00
Christoph Hellwig	df0ce26cb4	fs: remove default_backing_dev_info Now that default_backing_dev_info is not used for writeback purposes we can git rid of it easily: - instead of using it's name for tracing unregistered bdi we just use "unknown" - btrfs and ceph can just assign the default read ahead window themselves like several other filesystems already do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:05:38 -07:00
Christoph Hellwig	e4d2750909	ceph: remove call to bdi_unregister bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:03:07 -07:00
Christoph Hellwig	b83ae6d421	fs: remove mapping->backing_dev_info Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:03:05 -07:00
Christoph Hellwig	de1414a654	fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info Now that we got rid of the bdi abuse on character devices we can always use sb->s_bdi to get at the backing_dev_info for a file, except for the block device special case. Export inode_to_bdi and replace uses of mapping->backing_dev_info with it to prepare for the removal of mapping->backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:03:04 -07:00
Jeff Layton	9bd0f45b70	locks: keep a count of locks on the flctx lists This makes things a bit more efficient in the cifs and ceph lock pushing code. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 16:08:50 -05:00
Jeff Layton	6109c85037	locks: add a dedicated spinlock to protect i_flctx lists We can now add a dedicated spinlock without expanding struct inode. Change to using that to protect the various i_flctx lists. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 16:08:49 -05:00
Jeff Layton	bd61e0a9c8	locks: convert posix locks to file_lock_context Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 16:08:16 -05:00
Jeff Layton	5263e31e45	locks: move flock locks to file_lock_context Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 15:09:25 -05:00
Jeff Layton	c362781cad	ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks There is only a single call site for each of these functions, and the caller takes the i_lock prior to calling them and drops it just afterward. Move the spinlocking into the functions instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 15:09:25 -05:00

1 2 3 4 5 ...

1196 Commits