OpenCloudOS-Kernel/fs/ceph
Alex Elder 26be88087a libceph: change how "safe" callback is used
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation from the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request to the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request is added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.
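The race can be modeled deterministically in a few lines of user-space
C.  This is a hypothetical, single-threaded sketch, not kernel code:
struct req, start_request(), old_safe_callback() and old_sync_write()
are illustrative names, and the complete_immediately flag stands in
for the reply arriving on another CPU before ceph_sync_write() reaches
its next statement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the OLD ordering; all names are illustrative. */
struct req {
    unsigned long tid;     /* set by start_request() */
    bool on_unsafe_list;   /* models membership in the inode's unsafe list */
    bool safe;             /* the osd has reported the write durable */
};

static unsigned long next_tid = 1;

/* Old-style "safe" callback: take the request off the unsafe list. */
static void old_safe_callback(struct req *r)
{
    assert(r->tid != 0);          /* only started requests can complete */
    r->safe = true;
    r->on_unsafe_list = false;    /* no-op if it was never added */
}

/*
 * Models ceph_osd_start_request(): assigns a tid and sends the request.
 * complete_immediately == true simulates the durable reply being handled
 * before the caller runs its next statement.
 */
static void start_request(struct req *r, bool complete_immediately)
{
    r->tid = next_tid++;
    if (complete_immediately)
        old_safe_callback(r);
}

/* Old initiator: start the request first, then add it to the unsafe list. */
static void old_sync_write(struct req *r, bool complete_immediately)
{
    start_request(r, complete_immediately);
    r->on_unsafe_list = true;     /* too late if the request already completed */
}
```

When complete_immediately is true, the request ends up marked safe yet
still sitting on the unsafe list: a stale entry that nothing will ever
remove.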

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator of the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.
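A minimal user-space sketch of the osd client side of this scheme,
assuming simplified stand-in structures: osd_request, osd_client,
send_request(), send_queued(), start_request() and handle_safe_reply()
are hypothetical names loosely modeled on the functions named above,
and a pthread mutex stands in for the request mutex.  The "unsafe"
notification is made only on the first send and while the mutex is
held; recording_callback() is an illustrative initiator that checks
this with pthread_mutex_trylock().

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libceph's structures. */
struct osd_request;
typedef void (*unsafe_cb_t)(struct osd_request *req, bool unsafe);

struct osd_request {
    unsigned long tid;
    bool sent;                    /* has the message gone out at least once? */
    unsafe_cb_t unsafe_callback;
    struct osd_request *next;     /* link in the client's queued list */
};

struct osd_client {
    pthread_mutex_t request_mutex;
    struct osd_request *queued;   /* requests waiting to be sent */
    unsigned long next_tid;
};

/* Models __send_request(): the caller must hold request_mutex. */
static void send_request(struct osd_request *req)
{
    if (!req->sent && req->unsafe_callback)
        req->unsafe_callback(req, true);  /* first send: request becomes unsafe */
    req->sent = true;
    /* ... the message would be handed to the messenger here ... */
}

/* Models __send_queued(): always runs with request_mutex held. */
static void send_queued(struct osd_client *osdc)
{
    while (osdc->queued) {
        struct osd_request *req = osdc->queued;
        osdc->queued = req->next;
        req->next = NULL;
        send_request(req);
    }
}

/* Assigns a tid, queues and sends the request, all under the mutex. */
static void start_request(struct osd_client *osdc, struct osd_request *req)
{
    pthread_mutex_lock(&osdc->request_mutex);
    req->tid = ++osdc->next_tid;
    req->next = osdc->queued;
    osdc->queued = req;
    send_queued(osdc);
    pthread_mutex_unlock(&osdc->request_mutex);
}

/* Called when the osd reports the write durable: request is safe again. */
static void handle_safe_reply(struct osd_request *req)
{
    if (req->unsafe_callback)
        req->unsafe_callback(req, false);
}

/* Illustrative initiator callback that records what it is told. */
static struct osd_client example_osdc = { .request_mutex = PTHREAD_MUTEX_INITIALIZER };
static int unsafe_notifications, safe_notifications;
static bool mutex_was_held;

static void recording_callback(struct osd_request *req, bool unsafe)
{
    (void)req;
    if (unsafe) {
        int rc = pthread_mutex_trylock(&example_osdc.request_mutex);
        if (rc == 0)              /* not expected; release to avoid a deadlock */
            pthread_mutex_unlock(&example_osdc.request_mutex);
        mutex_was_held = (rc == EBUSY);
        unsafe_notifications++;
    } else {
        safe_notifications++;
    }
}
```

Because the tid is assigned and the "unsafe" notification delivered
before the message is handed to the messenger, all within one critical
section, the initiator sees the request before any reply to it can
exist.  The commit message does not say which locks are held for the
second ("safe") notification, so the sketch does not take the mutex
there.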

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.
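On the file system side, the callback brackets the request's time on
the inode's unsafe list: it adds the request when told "unsafe" and
removes it when told "safe".  A minimal single-threaded sketch,
assuming hypothetical inode_info/osd_request structures, a
sync_write_unsafe() callback name chosen for illustration, and a tiny
intrusive list modeled on the kernel's list_head.  The real code must
also protect the list with its own lock, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, in the spirit of list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_empty(const struct list_node *head) { return head->next == head; }

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-ins for the ceph inode and the osd request. */
struct inode_info {
    struct list_node unsafe_writes;  /* requests not yet durable */
    int unsafe_count;                /* illustrative per-inode state */
};

struct osd_request {
    unsigned long tid;
    struct inode_info *inode;
    struct list_node unsafe_item;    /* link in inode->unsafe_writes */
};

/*
 * Called with unsafe == true just before the first send (tid already
 * assigned), and with unsafe == false once the write is durable.
 */
static void sync_write_unsafe(struct osd_request *req, bool unsafe)
{
    struct inode_info *ci = req->inode;

    if (unsafe) {
        assert(req->tid != 0);       /* tid is set before the callback runs */
        list_add_tail(&req->unsafe_item, &ci->unsafe_writes);
        ci->unsafe_count++;          /* setup of unsafe-request state */
    } else {
        list_del_init(&req->unsafe_item);
        ci->unsafe_count--;          /* matching cleanup */
    }
}
```

Keeping both the insertion and the removal in one function is what
grouping the setup and cleanup of unsafe-request state amounts to here.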

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reported in:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:18:52 -07:00
Kconfig fs/ceph: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:04 -08:00
Makefile ceph: Makefile: Remove unnessary code 2011-01-12 15:15:13 -08:00
addr.c libceph: combine initializing and setting osd data 2013-05-01 21:18:23 -07:00
caps.c ceph: use i_release_count to indicate dir's completeness 2013-05-01 21:17:07 -07:00
ceph_frag.c ceph: factor out libceph from Ceph file system 2010-10-20 15:37:28 -07:00
debugfs.c libceph: delay debugfs initialization until we learn global_id 2012-08-20 10:03:15 -07:00
dir.c ceph: use i_release_count to indicate dir's completeness 2013-05-01 21:17:07 -07:00
export.c fs: encode_fh: return FILEID_INVALID if invalid fid_type 2013-02-26 02:46:10 -05:00
file.c libceph: change how "safe" callback is used 2013-05-01 21:18:52 -07:00
inode.c ceph: fix symlink inode operations 2013-05-01 21:18:50 -07:00
ioctl.c libceph: rename ceph_calc_object_layout() 2013-05-01 21:16:17 -07:00
ioctl.h ceph: fully initialize new layout 2012-05-16 14:28:27 -05:00
locks.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
mds_client.c libceph: add, don't set data for a message 2013-05-01 21:18:34 -07:00
mds_client.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-02-28 17:43:09 -08:00
mdsmap.c ceph: Use pseudo-random numbers to choose mds 2013-05-01 21:18:49 -07:00
snap.c ceph: define snap counts as u32 everywhere 2012-07-30 18:15:47 -07:00
strings.c libceph: update ceph_mds_state_name() and ceph_mds_op_name() 2013-02-18 12:20:34 -06:00
super.c ceph: set up page array mempool with correct size 2013-05-01 21:17:50 -07:00
super.h ceph: use i_release_count to indicate dir's completeness 2013-05-01 21:17:07 -07:00
xattr.c ceph: eliminate sparse warnings in fs code 2013-02-25 15:37:14 -06:00