Commit Graph

58 Commits

Author SHA1 Message Date
David Teigland 98f176fb32 [DLM] don't accept replies to old recovery messages
We often abort a recovery after sending a status request to a remote node.
We want to ignore any potential status reply we get from the remote node.
If we get one of these unwanted replies, we've often moved on to the next
recovery message and incremented the message sequence counter, so the
reply will be ignored due to the seq number.  In some cases, we've not
moved on to the next message so the seq number of the reply we want to
ignore is still correct, causing the reply to be accepted.  The next
recovery message will then mistake this old reply as a new one.

To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
new reply.  We clear this flag if we abort recovery while waiting for a
reply.  Before the flag is set again (to allow new replies) we know that
any old replies will be rejected due to their sequence number.  We also
initialize the recovery-message sequence number to a random value when a
lockspace is first created.  This makes it clear when messages are being
rejected from an old instance of a lockspace that has since been
recreated.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:14 -05:00
David Teigland 1babdb4531 [DLM] fix size of STATUS_REPLY message
When the not_ready routine sends a "fake" status reply with blank status
flags, it needs to use the correct size for a normal STATUS_REPLY by
including the size of the would-be config parameters.  We also fill in the
non-existant config parameters with an invalid lvblen value so it's easier
to notice if these invalid paratmers are ever being used.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:08 -05:00
David Teigland 435618b75b [DLM] status messages ping-pong between unmounted nodes
Red Hat BZ 213682

If two nodes leave the lockspace (while unmounting the fs in the case of
gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY
messages will then ping-pong between the nodes when neither of them can
find the lockspace in question any longer.  We kill this by not sending
another STATUS message when we get a STATUS_REPLY for an unknown
lockspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:06 -05:00
David Teigland f5888750aa [DLM] sequence number missing in not_ready reply
When a status reply is sent for a lockspace that doesn't yet exist, the
message sequence number from the sender was not being copied into the
reply causing the sender to ignore the reply.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-24 09:37:43 -04:00
David Teigland 4a99c3d9d6 [DLM] reject replies to old requests
When recoveries are aborted by other recoveries we can get replies to
status or names requests that we've given up on.  This can cause problems
if we're making another request and receive an old reply.  Add a sequence
number to status/names requests and reject replies that don't match.  A
field already exists for the seq number that's used in other message
types.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-09 17:32:07 -04:00
David Teigland faa0f26772 [DLM] show nodeid for recovery message
To aid debugging, it's useful to be able to see what nodeid the dlm is
waiting on for a message reply.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-09 09:46:38 -04:00
David Teigland 3bcd3687f8 [DLM] Remove range locks from the DLM
This patch removes support for range locking from the DLM

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-23 09:56:38 +00:00
David Teigland e7fd41792f [DLM] The core of the DLM for GFS2/CLVM
This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independantly of either
of these two projects.

It implements VAX-style locking modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
2006-01-18 09:30:29 +00:00