OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	992de5a8ec	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a lot of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...	2015-02-10 16:45:56 -08:00
Kirill A. Shutemov	d83a08db5b	mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-10 14:30:30 -08:00
Christoph Hellwig	df0ce26cb4	fs: remove default_backing_dev_info Now that default_backing_dev_info is not used for writeback purposes we can get rid of it easily: - instead of using its name for tracing unregistered bdi we just use "unknown" - btrfs and ceph can just assign the default read ahead window themselves like several other filesystems already do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:05:38 -07:00
Christoph Hellwig	e4d2750909	ceph: remove call to bdi_unregister bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:03:07 -07:00
Christoph Hellwig	b83ae6d421	fs: remove mapping->backing_dev_info Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:03:05 -07:00
Christoph Hellwig	de1414a654	fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info Now that we got rid of the bdi abuse on character devices we can always use sb->s_bdi to get at the backing_dev_info for a file, except for the block device special case. Export inode_to_bdi and replace uses of mapping->backing_dev_info with it to prepare for the removal of mapping->backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-01-20 14:03:04 -07:00
Jeff Layton	9bd0f45b70	locks: keep a count of locks on the flctx lists This makes things a bit more efficient in the cifs and ceph lock pushing code. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 16:08:50 -05:00
Jeff Layton	6109c85037	locks: add a dedicated spinlock to protect i_flctx lists We can now add a dedicated spinlock without expanding struct inode. Change to using that to protect the various i_flctx lists. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 16:08:49 -05:00
Jeff Layton	bd61e0a9c8	locks: convert posix locks to file_lock_context Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 16:08:16 -05:00
Jeff Layton	5263e31e45	locks: move flock locks to file_lock_context Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 15:09:25 -05:00
Jeff Layton	c362781cad	ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks There is only a single call site for each of these functions, and the caller takes the i_lock prior to calling them and drops it just afterward. Move the spinlocking into the functions instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 15:09:25 -05:00
Ilya Dryomov	0668ff52e2	ceph: use %zu for len in ceph_fill_inline_data() len is size_t, should be printed with %zu. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2015-01-08 20:36:56 +03:00
Linus Torvalds	44e8967d59	Ceph: remove left-over reject file Neither Sage nor I noticed that Zheng Yan had mistakenly committed fs/ceph/super.h.rej as part of commit `31c542a199` ("ceph: add inline data to pagecache"). Remove it. Requested-by: Yan, Zheng <ukernel@gmail.com> Cc: Sage Weil <sweil@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-17 18:47:01 -08:00
Linus Torvalds	57666509b7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The big item here is support for inline data for CephFS and for message signatures from Zheng. There are also several bug fixes, including interrupted flock request handling, 0-length xattrs, mksnap, cached readdir results, and a message version compat field. Finally there are several cleanups from Ilya, Dan, and Markus. Note that there is another series coming soon that fixes some bugs in the RBD 'lingering' requests, but it isn't quite ready yet" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) ceph: fix setting empty extended attribute ceph: fix mksnap crash ceph: do_sync is never initialized libceph: fixup includes in pagelist.h ceph: support inline data feature ceph: flush inline version ceph: convert inline data to normal data before data write ceph: sync read inline data ceph: fetch inline data when getting Fcr cap refs ceph: use getattr request to fetch inline data ceph: add inline data to pagecache ceph: parse inline data in MClientReply and MClientCaps libceph: specify position of extent operation libceph: add CREATE osd operation support libceph: add SETXATTR/CMPXATTR osd operations support rbd: don't treat CEPH_OSD_OP_DELETE as extent op ceph: remove unused stringification macros libceph: require cephx message signature by default ceph: introduce global empty snap context ceph: message versioning fixes ...	2014-12-17 16:03:12 -08:00
Yan, Zheng	0aeff37aba	ceph: fix setting empty extended attribute make sure 'value' is not null. otherwise __ceph_setxattr will remove the extended attribute. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:18:49 +03:00
Yan, Zheng	275dd19ea4	ceph: fix mksnap crash mksnap reply only contains 'target', does not contain 'dentry'. So it's wrong to use req->r_reply_info.head->is_dentry to detect traceless reply. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:09:53 +03:00
Dan Carpenter	021b77bee2	ceph: do_sync is never initialized Probably this code was syncing a lot more often than intended because the do_sync variable wasn't set to zero. Cc: stable@vger.kernel.org # v3.11+ Fixes: `c62988ec09` ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:53 +03:00
Yan, Zheng	65a22662bf	ceph: support inline data feature Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:53 +03:00
Yan, Zheng	e20d258d73	ceph: flush inline version After converting inline data to normal data, the client needs to flush the new i_inline_version (CEPH_INLINE_NONE) to MDS. This commit makes cap messages (sent to MDS) contain inline_version and inline_data. Client always converts inline data to normal data before data write, so the inline data length part is always zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:53 +03:00
Yan, Zheng	28127bdd2f	ceph: convert inline data to normal data before data write Before any data write, convert inline data to normal data and set i_inline_version to CEPH_INLINE_NONE. The OSD request that saves inline data to object contains 3 operations (CMPXATTR, WRITE and SETXATTR). It compares a xattr named 'inline_version' to prevent old data overwrites newer data. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	83701246ae	ceph: sync read inline data we can't use getattr to fetch inline data while holding Fr cap, because it can cause deadlock. If we need to sync read inline data, drop cap refs first, then use getattr to fetch inline data. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	3738daa68a	ceph: fetch inline data when getting Fcr cap refs we can't use getattr to fetch inline data after getting Fcr caps, because it can cause deadlock. The solution is try bringing inline data to page cache when not holding any cap, and hope the inline data page is still there after getting the Fcr caps. If the page is still there, pin it in page cache for later IO. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	01deead041	ceph: use getattr request to fetch inline data Add a new parameter 'locked_page' to ceph_do_getattr(). If the getattr reply contains inline data, it will be copied to the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	31c542a199	ceph: add inline data to pagecache Request reply and cap message can contain inline data. add inline data to the page cache if there is Fc cap. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	fb01d1f8b0	ceph: parse inline data in MClientReply and MClientCaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	715e4cd405	libceph: specify position of extent operation allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:52 +03:00
Ilya Dryomov	ca3995ad13	ceph: remove unused stringification macros These were used to report git versions a long time ago. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:51 +03:00
Yan, Zheng	97c85a828f	ceph: introduce global empty snap context Current snapshot code does not properly handle moving an inode from one empty snap realm to another empty snap realm. After changing inode's snap realm, some dirty pages' snap context can be not equal to inode's i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs(). The fix is to introduce a global empty snap context for all empty snap realms. This avoids triggering the BUG() for filesystems with no snapshot. Fixes: http://tracker.ceph.com/issues/9928 Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:51 +03:00
John Spray	7cfa0313d0	ceph: message versioning fixes There were two places we were assigning version in host byte order instead of network byte order. Also in MSG_CLIENT_SESSION we weren't setting compat_version in the header to reflect continued compatibility with older MDSs. Fixes: http://tracker.ceph.com/issues/9945 Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:09:51 +03:00
Yan, Zheng	33d0733796	libceph: message signature support Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:50 +03:00
SF Markus Elfring	e96a650a81	ceph, rbd: delete unnecessary checks before two function calls The functions ceph_put_snap_context() and iput() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [idryomov@redhat.com: squashed rbd.c hunk, changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:50 +03:00
Yan, Zheng	70db4f3629	ceph: introduce a new inode flag indicating if cached dentries are ordered After creating/deleting/renaming file, offsets of sibling dentries may change. So we can not use cached dentries to satisfy readdir. But we can still use the cached dentries to conclude -ENOENT for lookup. This patch introduces a new inode flag indicating if child dentries are ordered. The flag is set at the same time marking a directory complete. After creating/deleting/renaming file, we clear the flag on directory inode. This prevents ceph_readdir() from using cached dentries to satisfy readdir syscall. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:50 +03:00
Yan, Zheng	9280be24dc	ceph: fix file lock interruption When a lock operation is interrupted, current code sends an unlock request to MDS to undo the lock operation. This method does not work as expected because the unlock request can drop locks that have already been acquired. The fix is to use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests to interrupt blocked file lock requests. These requests do not drop locks that have already been acquired, they only interrupt blocked file lock requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:49 +03:00
Al Viro	ba00410b81	Merge branch 'iov_iter' into for-next	2014-12-08 20:39:29 -05:00
Al Viro	b583043e99	kill f_dentry uses Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:25 -05:00
Al Viro	a455589f18	assorted conversions to %p[dD] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:20 -05:00
Al Viro	41d28bca2d	switch d_materialise_unique() users to d_splice_alias() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:20 -05:00
Yan, Zheng	3231300bb9	ceph: fix flush tid comparision TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid is only 16 bits. 16 bits should be plenty to let the cap flush updates pipeline appropriately, but we need to cast in the proper direction when comparing these differently-sized versions. So downcast the 64-bits one to 16 bits. Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-11-13 22:19:05 +03:00
Al Viro	946e51f2bf	move d_rcu from overlapping d_child to overlapping d_alias Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-03 15:20:29 -05:00
Linus Torvalds	6b04908166	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is the long-awaited discard support for RBD (Guangliang Zhao, Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and performance and debugging improvements (Yan, Zheng, John Spray), and a smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) ceph: fix divide-by-zero in __validate_layout() rbd: rbd workqueues need a resque worker libceph: ceph-msgr workqueue needs a resque worker ceph: fix bool assignments libceph: separate multiple ops with commas in debugfs output libceph: sync osd op definitions in rados.h libceph: remove redundant declaration ceph: additional debugfs output ceph: export ceph_session_state_name function ceph: include the initial ACL in create/mkdir/mknod MDS requests ceph: use pagelist to present MDS request data libceph: reference counting pagelist ceph: fix llistxattr on symlink ceph: send client metadata to MDS ceph: remove redundant code for max file size verification ceph: remove redundant io_iter_advance() ceph: move ceph_find_inode() outside the s_mutex ceph: request xattrs if xattr_version is zero rbd: set the remaining discard properties to enable support rbd: use helpers to handle discard for layered images correctly ...	2014-10-15 06:46:01 +02:00
Yan, Zheng	0bc62284ee	ceph: fix divide-by-zero in __validate_layout() The 'stripe_unit' field is 64 bits, casting it to 32 bits can result in zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:57:05 -07:00
Fabian Frederick	ab6c2c3ebe	ceph: fix bool assignments Fix some coccinelle warnings: fs/ceph/caps.c:2400:6-10: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2401:6-15: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2402:6-17: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2403:6-22: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2404:6-22: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2405:6-19: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2440:4-20: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2469:3-16: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2490:2-18: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2519:3-7: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2549:3-12: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2575:2-6: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2589:3-7: WARNING: Assignment of bool to 0/1 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-10-14 12:57:04 -07:00
John Spray	14ed97033d	ceph: additional debugfs output MDS session state and client global ID is useful instrumentation when testing. Signed-off-by: John Spray <john.spray@redhat.com>	2014-10-14 12:57:01 -07:00
John Spray	a687ecaf50	ceph: export ceph_session_state_name function ...so that it can be used from the ceph debugfs code when dumping session info. Signed-off-by: John Spray <john.spray@redhat.com>	2014-10-14 12:56:50 -07:00
Yan, Zheng	b1ee94aa59	ceph: include the initial ACL in create/mkdir/mknod MDS requests Current code sets a new file/directory's initial ACL in a non-atomic manner. Client first sends a request to MDS to create the new file/directory, then sets the initial ACL after the new file/directory is successfully created. The fix is to include the initial ACL in create/mkdir/mknod MDS requests. So MDS can handle creating file/directory and setting the initial ACL in one request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:49 -07:00
Yan, Zheng	25e6bae356	ceph: use pagelist to present MDS request data Current code uses page array to present MDS request data. Pages in the array are allocated/freed by caller of ceph_mdsc_do_request(). If request is interrupted, the pages can be freed while they are still being used by the request message. The fix is to use pagelist to present MDS request data. Pagelist is reference counted. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:49 -07:00
Yan, Zheng	e4339d28f6	libceph: reference counting pagelist This allows pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:48 -07:00
Yan, Zheng	0abb43dcac	ceph: fix llistxattr on symlink only regular file and directory have vxattrs. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:56:48 -07:00
John Spray	dbd0c8bf79	ceph: send client metadata to MDS Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax, which includes additional client metadata to allow the MDS to report on clients by user-sensible names like hostname. Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:56:47 -07:00
Chao Yu	a4483e8a42	ceph: remove redundant code for max file size verification Both ceph_update_writeable_page and ceph_setattr will verify file size against the max size ceph supports. There are two callers of ceph_update_writeable_page, ceph_write_begin and ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, we have no chance to change file size when mmap. Likewise we have already verified the size in inode_change_ok when we call ceph_setattr. So let's remove the redundant code for max file size verification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:40 +04:00
Yan, Zheng	3b70b388e3	ceph: remove redundant io_iter_advance() ceph_sync_read and generic_file_read_iter() have already advanced the IO iterator. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:39 +04:00
Yan, Zheng	6cd3bcad0d	ceph: move ceph_find_inode() outside the s_mutex ceph_find_inode() may wait on freeing inode, using it inside the s_mutex may cause deadlock. (the freeing inode is waiting for OSD read reply, but dispatch thread is blocked by the s_mutex) Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:39 +04:00
Yan, Zheng	508b32d866	ceph: request xattrs if xattr_version is zero Following sequence of events can happen. - Client releases an inode, queues cap release message. - A 'lookup' reply brings the same inode back, but the reply doesn't contain xattrs because MDS didn't receive the cap release message and thought client already has up-to-date xattrs. The fix is to force sending a getattr request to MDS if xattrs_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client does not have xattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:38 +04:00
Yan, Zheng	03974e8177	ceph: make sure request isn't in any waiting list when kicking request. we may corrupt waiting list if a request in the waiting list is kicked. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:24 +04:00
Yan, Zheng	656e438294	ceph: protect kick_requests() with mdsc->mutex Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:24 +04:00
Yan, Zheng	5d23371fdb	ceph: trim unused inodes before reconnecting to recovering MDS So the recovering MDS does not need to fetch these unused inodes during cache rejoin. This may reduce MDS recovery time. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:22 +04:00
Eric W. Biederman	c143c2333c	vfs: Remove d_drop calls from d_revalidate implementations Now that d_invalidate always succeeds it is no longer necessary or desirable to hard code d_drop calls into filesystem specific d_revalidate implementations. Remove the unnecessary d_drop calls and rely on d_invalidate to drop the dentries. Using d_invalidate ensures that paths to mount points will not be dropped. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-09 02:38:58 -04:00
Linus Torvalds	8d2d441ac4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is a lot of refactoring and hardening of the libceph and rbd code here from Ilya that fix various smaller bugs, and a few more important fixes with clone overlap. The main fix is a critical change to the request_fn handling to not sleep that was exposed by the recent mutex changes (which will also go to the 3.16 stable series). Yan Zheng has several fixes in here for CephFS fixing ACL handling, time stamps, and request resends when the MDS restarts. Finally, there are a few cleanups from Himangi Saraogi based on Coccinelle" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly rbd: remove extra newlines from rbd_warn() messages rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC rbd: rework rbd_request_fn() ceph: fix kick_requests() ceph: fix append mode write ceph: fix sizeof(struct tYpO *) typo ceph: remove redundant memset(0) rbd: take snap_id into account when reading in parent info rbd: do not read in parent info before snap context rbd: update mapping size only on refresh rbd: harden rbd_dev_refresh() and callers a bit rbd: split rbd_dev_spec_update() into two functions rbd: remove unnecessary asserts in rbd_dev_image_probe() rbd: introduce rbd_dev_header_info() rbd: show the entire chain of parent images ceph: replace comma with a semicolon rbd: use rbd_segment_name_free() instead of kfree() ceph: check zero length in ceph_sync_read() ceph: reset r_resend_mds after receiving -ESTALE ...	2014-08-13 17:43:29 -06:00
J. Bruce Fields	1a0a397e41	dcache: d_obtain_alias callers don't all want DISCONNECTED There are a few d_obtain_alias callers that are using it to get the root of a filesystem which may already have an alias somewhere else. This is not the same as the filehandle-lookup case, and none of them actually need DCACHE_DISCONNECTED set. It isn't really a serious problem, but it would really be clearer if we reserved DCACHE_DISCONNECTED for those cases where it's actually needed. In the btrfs case this was causing a spurious printk from nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED manually in `3a0dfa6a12` "Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol", and this replaces that workaround. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
Yan, Zheng	282c105225	ceph: fix kick_requests() __do_request() may unregister the request. So we should update iterator 'p' before calling __do_request() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-08-07 14:30:00 +04:00
Yan, Zheng	06fee30f6a	ceph: fix append mode write generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-28 13:29:33 +04:00
Ilya Dryomov	7e8a295295	ceph: fix sizeof(struct tYpO *) typo struct ceph_xattr -> struct ceph_inode_xattr Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:29:27 +04:00
Ilya Dryomov	1a295bd8c8	ceph: remove redundant memset(0) xattrs array of pointers is allocated with kcalloc() - no need to memset() it to 0 right after that. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:28:33 +04:00
Himangi Saraogi	08a0f24e4c	ceph: replace comma with a semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-07-24 12:04:46 +04:00
Yan, Zheng	d0d0db2268	ceph: check zero length in ceph_sync_read() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-21 10:17:05 +08:00
Yan, Zheng	51da8e8c6f	ceph: reset r_resend_mds after receiving -ESTALE this makes __choose_mds() choose mds according to caps Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-14 10:49:15 +08:00
Yan, Zheng	f5f1864743	ceph: properly apply umask when ACL is enabled when ACL is enabled, posix_acl_create() may change inode's mode Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-08 15:08:47 +04:00
Yan, Zheng	5aaa432ad9	ceph: pass proper page offset to copy_page_to_iter() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-08 15:08:47 +04:00
Yan, Zheng	c5c9a0bf1b	ceph: include time stamp in replayed MDS requests Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-08 15:08:46 +04:00
Yan, Zheng	494d77bf8f	ceph: check unsupported fallocate mode Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-08 15:08:46 +04:00
Linus Torvalds	6d87c225f5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This has a mix of bug fixes and cleanups. Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT check when a second rbd image is mapped and a couple memory leaks. Zheng fixes several issues with fragmented directories and multiple MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's patches fix setting and unsetting RBD images read-only. Naturally there are several other cleanups mixed in for good measure" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) rbd: only set disk to read-only once rbd: move calls that may sleep out of spin lock range rbd: add ioctl for rbd ceph: use truncate_pagecache() instead of truncate_inode_pages() ceph: include time stamp in every MDS request rbd: fix ida/idr memory leak rbd: use reference counts for image requests rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() rbd: make sure we have latest osdmap on 'rbd map' libceph: add ceph_monc_wait_osdmap() libceph: mon_get_version request infrastructure libceph: recognize poolop requests in debugfs ceph: refactor readpage_nounlock() to make the logic clearer mds: check cap ID when handling cap export message ceph: remember subtree root dirfrag's auth MDS ceph: introduce ceph_fill_fragtree() ceph: handle cap import atomically ceph: pre-allocate ceph_cap struct for ceph_add_cap() ceph: update inode fields according to issued caps rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ...	2014-06-12 23:06:23 -07:00
Linus Torvalds	16b9057804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This is the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...	2014-06-12 10:30:18 -07:00
Al Viro	3551dd79ac	ceph: switch to iter_file_splice_write() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:21:12 -04:00
Yan, Zheng	4e217b5dc8	ceph: use truncate_pagecache() instead of truncate_inode_pages() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-08 05:09:28 +08:00
Fabian Frederick	3364d113c8	fs/ceph/debugfs.c: replace seq_printf by seq_puts Replace seq_printf where possible. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-06 16:08:06 -07:00
Fabian Frederick	f3ae1b97be	fs/ceph: replace pr_warning by pr_warn Update the last pr_warning callsites in fs branch Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-06 16:08:06 -07:00
Sage Weil	b8e69066d8	ceph: include time stamp in every MDS request We recently modified the client/MDS protocol to include a timestamp in the client request. This allows ctime updates to follow the client's clock in most cases, which avoids subtle problems when clocks are out of sync and timestamps are updated sometimes by the MDS clock (for most requests) and sometimes by the client clock (for cap writeback). Signed-off-by: Sage Weil <sage@inktank.com>	2014-06-06 09:30:00 +08:00
Zhang Zhen	23cd573b46	ceph: refactor readpage_nounlock() to make the logic clearer If the return value of ceph_osdc_readpages() is not negative, it is certainly greater than or equal to zero. Remove the useless condition judgment and redundant braces. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:56 +08:00
Yan, Zheng	ca665e0282	mds: check cap ID when handling cap export message handle following sequence of events: - mds0 exports an inode to mds1. client receives the cap import message from mds1. caps from mds0 are removed while handling the cap import message. - mds1 exports an inode to mds0. client receives the cap export message from mds1. handle_cap_export() adds placeholder caps for mds0 - client receives the first cap export message (for exporting inode from mds0 to mds1) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:55 +08:00
Yan, Zheng	8d08503c13	ceph: remember subtree root dirfrag's auth MDS remember dirfrag's auth MDS when it's different from its parent inode's auth MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:55 +08:00
Yan, Zheng	3e7fbe9ceb	ceph: introduce ceph_fill_fragtree() Move the code that update the i_fragtree into a separate function. Also add simple probabilistic test to decide whether the i_fragtree should be updated Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:54 +08:00
Yan, Zheng	2cd698be9a	ceph: handle cap import atomically cap import messages are processed by both handle_cap_import() and handle_cap_grant(). These two functions are not executed in the same atomic context, so they can races with cap release. The fix is make handle_cap_import() not release the i_ceph_lock when it returns. Let handle_cap_grant() release the lock after it finishes its job. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:53 +08:00
Yan, Zheng	d9df278350	ceph: pre-allocate ceph_cap struct for ceph_add_cap() So that ceph_add_cap() can be used while i_ceph_lock is locked. This simplifies the code that handle cap import/export. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:53 +08:00
Yan, Zheng	f98a128a55	ceph: update inode fields according to issued caps Cap message and request reply from non-auth MDS may carry stale information (corresponding locks are in LOCK states) even they have the newest inode version. So client should update inode fields according to issued caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:52 +08:00
Yan, Zheng	c6bcda6f52	ceph: queue vmtruncate if necessary when handing cap grant/revoke cap grant/revoke message from non-auth MDS can update inode's size and truncate_seq/truncate_size. (the message arrives before auth MDS's cap trunc message) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:51 +08:00
Zhang Zhen	979d4c1895	ceph: remove useless ACL check posix_acl_xattr_set() already does the check, and it's the only way to feed in an ACL from userspace. So the check here is useless, remove it. Signed-off-by: zhang zhen <zhenzhang.zhang@huawei.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:50 +08:00
Fengguang Wu	e84be11c53	ceph: ceph_get_parent() can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-06-06 09:29:50 +08:00
Jeff Layton	130d1f956a	locks: ensure that fl_owner is always initialized properly in flock and lease codepaths Currently, the fl_owner isn't set for flock locks. Some filesystems use byte-range locks to simulate flock locks and there is a common idiom in those that does: fl->fl_owner = (fl_owner_t)filp; fl->fl_start = 0; fl->fl_end = OFFSET_MAX; Since flock locks are generally "owned" by the open file description, move this into the common flock lock setup code. The fl_start and fl_end fields are already set appropriately, so remove the unneeded setting of that in flock ops in those filesystems as well. Finally, the lease code also sets the fl_owner as if they were owned by the process and not the open file description. This is incorrect as leases have the same ownership semantics as flock locks. Set them the same way. The lease code doesn't actually use the fl_owner value for anything, so this is more for consistency's sake than a bugfix. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Staging portion) Acked-by: J. Bruce Fields <bfields@fieldses.org>	2014-06-02 08:09:29 -04:00
Al Viro	4908b822b3	ceph: switch to ->write_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:39:43 -04:00
Al Viro	64c3131161	ceph_sync_direct_write: stop poking into iov_iter guts all needed primitives are there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:39:43 -04:00
Al Viro	2b777c9dd9	ceph_sync_read: stop poking into iov_iter guts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:39:42 -04:00
Al Viro	3644424dc6	ceph: switch to ->read_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:38:00 -04:00
Al Viro	71d8e532b1	start adding the tag to iov_iter For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:49 -04:00
Al Viro	ed978a811e	new helper: generic_file_read_iter() iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:49 -04:00
Al Viro	05bb2e0bc7	ceph_aio_read(): keep iov_iter across retries Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:48 -04:00
Al Viro	d8d3d94b80	pass iov_iter to ->direct_IO() unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:44 -04:00
Al Viro	cb66a7a1f1	kill generic_segment_checks() all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:43 -04:00
Al Viro	e7c24607b5	kill iov_iter_copy_from_user() all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:42 -04:00
Linus Torvalds	5575eeb7b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry	2014-05-05 15:17:02 -07:00
Yan, Zheng	3bd58143ba	ceph: reserve caps for file layout/lock MDS requests Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-04-28 12:55:41 -07:00

1107 Commits