linux-sg2042/include/linux/ceph
Ilya Dryomov 67645d7619 libceph: fix ceph_msg_revoke()
There are a number of problems with revoking a "was sending" message:

(1) We never make any attempt to revoke data - only kvecs contribute to
con->out_skip.  However, once the header (envelope) is written to the
socket, our peer learns data_len and sets itself to expect at least
data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
is called while the messenger is sending the message's data portion,
anything we send after that call is counted by the OSD towards the now
revoked message's data portion.  The effects vary; the most common one
is an eventual hang - higher layers get stuck waiting for the reply to
the message that was sent out after ceph_msg_revoke() returned and
treated by the OSD as a bunch of data bytes.  This is what Matt ran
into.

(2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
while the messenger is sending the header, we will get a connection
reset, either due to a bad tag (0 is not a valid tag) or a bad header
CRC, which kind of defeats the purpose of revoke.  Currently the kernel
client refuses to work with header CRCs disabled, but that will likely
change in the future, making this even worse.

(3) con->out_skip is not reset on connection reset, leading to one or
more spurious connection resets if we happen to get a real one between
the time con->out_skip is set in ceph_msg_revoke() and the time it is
cleared in write_partial_skip().

Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
zero the tag or the header, i.e. send out tag+header regardless of when
ceph_msg_revoke() is called.  That way the header is always correct, no
unnecessary resets are induced and revoke stands ready for disabled
CRCs.  Since ceph_msg_revoke() rips out con->out_msg, introduce a new
"message out temp" and copy the header into it before sending.
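
The byte accounting behind the fix for (2) can be sketched roughly as
follows.  This is a simplified userspace model, not the kernel code:
struct out_msg_model, struct revoke_plan and plan_revoke() are
hypothetical names, and the model assumes the wire layout is tag,
header, front, middle, data, footer, with a single running count of
bytes already written.  It only shows the idea that tag+header always
go out unmodified while everything after them is replaced by zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one outgoing message; all sizes are in bytes. */
struct out_msg_model {
	size_t tag_len;     /* message tag */
	size_t hdr_len;     /* header (envelope), carries data_len */
	size_t front_len;
	size_t middle_len;
	size_t data_len;
	size_t footer_len;
};

/* What still has to be written after a revoke (illustrative only). */
struct revoke_plan {
	size_t send_real;   /* remaining tag+header bytes, sent unmodified */
	size_t skip_zero;   /* bytes to send as zeros (the out_skip idea) */
};

static size_t msg_total(const struct out_msg_model *m)
{
	return m->tag_len + m->hdr_len + m->front_len +
	       m->middle_len + m->data_len + m->footer_len;
}

/*
 * 'sent' is the number of bytes of this message already written to
 * the socket when the revoke happens.
 */
static struct revoke_plan plan_revoke(const struct out_msg_model *m,
				      size_t sent)
{
	size_t envelope = m->tag_len + m->hdr_len;
	size_t total = msg_total(m);
	struct revoke_plan p = { 0, 0 };

	assert(sent <= total);

	if (sent < envelope) {
		/* Never zero the tag or the header: finish them as-is. */
		p.send_real = envelope - sent;
		p.skip_zero = total - envelope;
	} else {
		/* Envelope is out; pad front/middle/data/footer remainder. */
		p.skip_zero = total - sent;
	}
	return p;
}
```

With arbitrary example sizes, a revoke that lands in the middle of the
header yields a plan that finishes the header first and then pads the
rest, so the peer sees a valid tag and header followed by exactly the
number of bytes the header announced.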

Cc: stable@vger.kernel.org # 4.0+
Reported-by: Matt Conner <matt.conner@keepertech.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Matt Conner <matt.conner@keepertech.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-01-21 19:36:08 +01:00
auth.h libceph: message signature support 2014-12-17 20:09:50 +03:00
buffer.h libceph: nuke ceph_kvfree() 2014-12-17 20:09:50 +03:00
ceph_debug.h
ceph_features.h libceph: advertise support for keepalive2 2015-09-17 20:14:27 +03:00
ceph_frag.h ceph: ceph_frag_contains_value can be boolean 2016-01-21 19:36:07 +01:00
ceph_fs.h ceph: rename snapshot support 2015-04-22 18:33:41 +03:00
ceph_hash.h
debugfs.h libceph: simplify our debugfs attr macro 2015-04-20 18:55:39 +03:00
decode.h remove extra definitions of U32_MAX 2014-01-23 16:36:55 -08:00
libceph.h libceph: add nocephx_sign_messages option 2015-11-02 23:37:46 +01:00
mdsmap.h ceph: update support for PGID64, PGPOOL3, OSDENC protocol features 2013-02-26 15:02:25 -08:00
messenger.h libceph: fix ceph_msg_revoke() 2016-01-21 19:36:08 +01:00
mon_client.h libceph: nuke pool op infrastructure 2015-02-19 13:31:37 +03:00
msgpool.h UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers 2012-10-02 18:01:25 +01:00
msgr.h libceph: use keepalive2 to verify the mon session is alive 2015-09-08 23:14:30 +03:00
osd_client.h libceph: allow setting osd_req_op's flags 2015-06-25 11:49:27 +03:00
osdmap.h libceph: osdmap.h: Add missing format newlines 2015-04-20 18:55:35 +03:00
pagelist.h libceph: fixup includes in pagelist.h 2014-12-17 20:09:53 +03:00
rados.h libceph: sync osd op definitions in rados.h 2014-10-14 12:57:02 -07:00
types.h UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers 2012-10-02 18:01:25 +01:00