How a commit is done in FDB

This doc describes how a commit is done in FDB 6.3+. The commit path in FDB 6.3 and before is documented in documentation/sphinx/source/read-write-path.rst.

Overall description

Legend:

  • alt marks alternative paths.
    • Text in [] is the condition for taking a path.
    • Text above an arrow is the message being sent.

The diagrams are generated using https://sequencediagram.org. The source of each diagram is in the corresponding *.sequence file.

[Diagram: CommitOverall]

Description of each section

Before issuing any of the RPCs described below, the client first checks whether the commit proxies and GRV proxies have changed, by comparing the client information ID it holds with the ID held by the cluster coordinator. If the two IDs differ, the proxies have changed and the client refreshes its proxy list.
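The check described above can be sketched as follows. This is a minimal illustration only: `ClientInfo`, `Client`, and `refresh_if_changed` are hypothetical names, not FDB identifiers, and the real client learns about changes through its own monitoring code rather than a direct call like this.

```python
from dataclasses import dataclass, field


@dataclass
class ClientInfo:
    """Hypothetical snapshot of the proxy lists, identified by an opaque ID."""
    info_id: str
    commit_proxies: list = field(default_factory=list)
    grv_proxies: list = field(default_factory=list)


class Client:
    def __init__(self, info):
        self.info = info

    def refresh_if_changed(self, coordinator_info):
        """Adopt the coordinator's proxy lists when the IDs differ.

        Returns True if the local proxy lists were replaced.
        """
        if coordinator_info.info_id == self.info.info_id:
            return False  # same ID: proxies unchanged, nothing to do
        self.info = coordinator_info
        return True
```

Comparing a single ID keeps the check cheap, since the client does not need to compare the proxy lists element by element before every RPC.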

GetReadVersion Section

  • The GRV Proxy sends a request to the master to retrieve the current commit version. This version is used as the read version of the request.

Preresolution Section

  • The commit proxy sends a request for a commit version, together with a request number.

    • The request number is a monotonically increasing number per commit proxy.
    • This ensures that, for each proxy, the master processes the requests in order.
  • The master server waits until the request number is current.

    • If the current request number is larger than the incoming request number:

      • If a commit version has already been assigned to the incoming request number, return that commit version and the previous commit version (i.e. prevVersion).

      • Otherwise, return Never.

    • Otherwise, increase the current commit version and return it to the commit proxy.

      • Only one process serves as master, so each commit version is unique within the cluster.

      • The monotonically increasing commit version ensures that transactions are processed in a strict serial order.
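The request-number handling above can be sketched as a small synchronous model. Everything here is an assumption made for illustration: the class and method names are hypothetical, "waiting" is modelled by returning a `WAIT` sentinel instead of blocking, the version is advanced by 1 for simplicity, and `forget_before` stands in for whatever cleanup of old replies the real master performs.

```python
WAIT = object()    # request is ahead of the current request number
NEVER = object()   # stale request with no recorded reply: never answered


class Master:
    """Toy model of commit-version assignment (not the real MasterServer)."""

    def __init__(self):
        self.version = 0
        self.latest = {}    # proxy id -> latest request number served
        self.replies = {}   # (proxy id, request number) -> (version, prev_version)

    def get_commit_version(self, proxy, request_num):
        latest = self.latest.get(proxy, 0)
        if request_num > latest + 1:
            return WAIT                      # earlier requests must be served first
        cached = self.replies.get((proxy, request_num))
        if cached is not None:
            return cached                    # duplicate request: same answer again
        if request_num <= latest:
            return NEVER                     # old request whose reply is gone
        prev_version = self.version
        self.version += 1                    # single master: versions are unique
        self.latest[proxy] = request_num
        self.replies[(proxy, request_num)] = (self.version, prev_version)
        return self.replies[(proxy, request_num)]

    def forget_before(self, proxy, request_num):
        """Drop recorded replies for request numbers below request_num."""
        stale = [k for k in self.replies if k[0] == proxy and k[1] < request_num]
        for key in stale:
            del self.replies[key]
```

Because each reply carries both the new version and the previous one, a proxy can tell downstream roles which version must be processed immediately before its own batch.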

Resolution section

  • The commit proxy sends the transaction to the resolver.
  • The resolver waits until its version reaches prevVersion.
    • This ensures that all transactions with a version smaller than this transaction's have been resolved.
    • The resolver then detects conflicts for the given transaction:
      • If there is no conflict, return TransactionCommitted as the status.
      • If there is any conflict, return TransactionConflict as the status.
      • If the read snapshot is outside the MVCC window, return TransactionTooOld as the status.
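The three outcomes above can be illustrated with a simplified conflict check. This sketch makes several assumptions that are not taken from the FDB source: it tracks individual keys rather than key ranges, it raises an error instead of waiting for prevVersion, and `Resolver`, `resolve`, and `mvcc_window` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    COMMITTED = "TransactionCommitted"
    CONFLICT = "TransactionConflict"
    TOO_OLD = "TransactionTooOld"


class Resolver:
    """Toy resolver over single keys (the real one works on key ranges)."""

    def __init__(self, mvcc_window):
        self.mvcc_window = mvcc_window
        self.version = 0
        self.last_write = {}  # key -> commit version of the latest write

    def resolve(self, prev_version, commit_version, read_version, reads, writes):
        # The real resolver waits here; the sketch just insists on ordering.
        if self.version != prev_version:
            raise RuntimeError("resolver has not reached prevVersion yet")
        self.version = commit_version
        # Read snapshot older than the window: history is no longer available.
        if read_version < commit_version - self.mvcc_window:
            return Status.TOO_OLD
        # Conflict: a key that was read has been written after the read version.
        if any(self.last_write.get(k, 0) > read_version for k in reads):
            return Status.CONFLICT
        for k in writes:
            self.last_write[k] = commit_version
        return Status.COMMITTED
```

Only committed transactions record their writes, so a conflicting transaction cannot itself cause later transactions to conflict.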

Post Resolution section

  • The proxy waits until the local batch number is current.
  • The proxy updates the metadata keys and attaches the corresponding storage servers' tags to all mutations.
  • The proxy then waits until the commit version is current, i.e. the proxy's committed version has caught up with the commit version of the batch so that the two versions are within the MVCC window.
  • The proxy pushes the commit data to the TLogs.
  • Each TLog waits for the commit version to be current, then persists the commit.
  • The proxy waits until all TLogs return the transaction result.
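As an illustration of the tagging step, the sketch below looks up the storage-server tags for each mutation's key in a sorted list of shard boundaries. The data layout and the names `ShardMap`, `tags_for`, and `tag_mutations` are assumptions made for this example, not the proxy's actual data structures.

```python
import bisect


class ShardMap:
    """Hypothetical key-range to storage-server-tag mapping."""

    def __init__(self, boundaries, tags):
        # boundaries[i] is the first key of shard i; boundaries[0] must be "".
        # tags[i] lists the tags of the storage servers holding shard i.
        self.boundaries = boundaries
        self.tags = tags

    def tags_for(self, key):
        # Index of the last boundary that is <= key.
        i = bisect.bisect_right(self.boundaries, key) - 1
        return self.tags[i]


def tag_mutations(shard_map, mutations):
    """Attach the responsible tags to each (key, value) mutation."""
    return [(key, value, shard_map.tags_for(key)) for key, value in mutations]
```

With tags attached, each mutation can be routed to the storage servers that own its key.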

Reply section

  • The proxy reports the committed version to the master, which uses it to serve subsequent GRV requests.
  • The proxy replies to the client, based on the result from the resolver.

Tracking the process using g_traceBatch

g_traceBatch events can be used to trace transactions and commits. A typical query against the trace logs is:

Type=type Location=location

The format of location is, in general, <source_file_name>.<function/actor name>.<log information>, e.g.

NativeAPI.getConsistentReadVersion.Before

means the event is logged in NativeAPI.actor.cpp, in ACTOR getConsistentReadVersion, before the read version is requested from the GRV Proxy.
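A location in this format can be split into its three parts with a few lines of code. The helper below is a hypothetical convenience, not part of FDB; it returns None for locations that do not have exactly three dot-separated parts.

```python
def parse_location(location):
    """Split '<file>.<function/actor>.<log information>' into its parts.

    Returns None when the location does not follow the three-part format.
    """
    parts = location.split(".")
    if len(parts) != 3:
        return None
    source_file, actor, info = parts
    return {"file": source_file, "actor": actor, "info": info}
```

For example, `parse_location("NativeAPI.getConsistentReadVersion.Before")` yields the file `NativeAPI`, the actor `getConsistentReadVersion`, and the log information `Before`, while a two-part location such as `getValueQ.DoRead` yields None.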

Some example queries are:

Type=TransactionDebug Location=NativeAPI*
LogGroup=loggroup Type=CommitDebug Location=Resolver.resolveBatch.*
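Queries of this shape can be reproduced offline with wildcard matching. The sketch below assumes the trace events have already been parsed into dictionaries of field name to value; the parsing itself, and the helper names `matches` and `select`, are not part of FDB.

```python
from fnmatch import fnmatchcase


def matches(event, **query):
    """True if every queried field matches its wildcard pattern.

    A field missing from the event is treated as an empty string.
    """
    return all(
        fnmatchcase(str(event.get(name, "")), pattern)
        for name, pattern in query.items()
    )


def select(events, **query):
    """Return the events that satisfy all field patterns, in input order."""
    return [event for event in events if matches(event, **query)]
```

With events loaded into a list, `select(events, Type="TransactionDebug", Location="NativeAPI*")` corresponds to the first example query, and adding `LogGroup="loggroup"` narrows the result in the same way as the second.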

In the following sections, a green tag indicates an attach. A blue tag indicates an event whose location follows the format described above; only the <log information> part is shown. A light-blue tag indicates an event whose location does not follow the format; the full location is shown. All the g_traceBatch events are listed in a table after each diagram.

contrib/commit_debug.py can be used to visualize the commit process.

Get Read Version

[Diagram: GetReadVersion]

| Role | File name | Function/Actor | Trace | Type | Location |
| --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::getReadVersion | | | |
| | | readVersionBatcher | | TransactionAttachID | |
| | | getConsistentReadVersion | Before | TransactionDebug | NativeAPI.getConsistentReadVersion.Before |
| GRVProxy | GrvProxyServer | queueGetReadVersionRequests | Before | TransactionDebug | GrvProxyServer.queueTransactionStartRequests.Before |
| | | transactionStarter | | TransactionAttachID | |
| | | | AskLiveCommittedVersionFromMaster | TransactionDebug | GrvProxyServer.transactionStarter.AskLiveCommittedVersionFromMaster |
| | | getLiveCommittedVersion | confirmEpochLive | TransactionDebug | GrvProxyServer.getLiveCommittedVersion.confirmEpochLive |
| Master | MasterServer | serveLiveCommittedVersion | GetRawCommittedVersion | TransactionDebug | MasterServer.serveLiveCommittedVersion.GetRawCommittedVersion |
| GRVProxy | GrvProxyServer | getLiveCommittedVersion | After | TransactionDebug | GrvProxyServer.getLiveCommittedVersion.After |
| Client | NativeAPI | getConsistentReadVersion | After | TransactionDebug | NativeAPI.getConsistentReadVersion.After |

Get

[Diagram: Get]

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::get | | | | |
| | | Transaction::getReadVersion | | | | (Refer to GetReadVersion) |
| | | getKeyLocation | Before | TransactionDebug | NativeAPI.getKeyLocation.Before | getKeyLocation is called by getValue; getKeyLocation actually calls getKeyLocation_internal |
| | | | After | TransactionDebug | NativeAPI.getKeyLocation.After | |
| | | getValue | | GetValueAttachID | | |
| | | | Before | GetValueDebug | NativeAPI.getValue.Before | |
| Storage Server | StorageServer | serveGetValueRequests | received | GetValueDebug | StorageServer.received | |
| | | getValueQ | DoRead | GetValueDebug | getValueQ.DoRead | |
| | | | AfterVersion | GetValueDebug | getValueQ.AfterVersion | |
| | KeyValueStoreSQLite | KeyValueStoreSQLite::Reader::action | Before | GetValueDebug | Reader.Before | |
| | | | After | GetValueDebug | Reader.After | |
| | StorageServer | | AfterRead | GetValueDebug | getValueQ.AfterRead | |
| Client | NativeAPI | getValue | After | GetValueDebug | NativeAPI.getValue.After | (When successful) |
| | | | Error | GetValueDebug | NativeAPI.getValue.Error | (On failure) |

Get Range

[Diagram: GetRange]

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::getRange | | | | |
| | | Transaction::getReadVersion | | | | (Refer to GetReadVersion) |
| | | getKeyLocation | Before | TransactionDebug | NativeAPI.getKeyLocation.Before | getKeyLocation is called by getRange |
| | | | After | TransactionDebug | NativeAPI.getKeyLocation.After | |
| | | getRange | Before | TransactionDebug | NativeAPI.getRange.Before | |
| Storage Server | storageserver | getKeyValuesQ | Before | TransactionDebug | storageserver.getKeyValues.Before | |
| | | | AfterVersion | TransactionDebug | storageserver.getKeyValues.AfterVersion | |
| | | | AfterKeys | TransactionDebug | storageserver.getKeyValues.AfterKeys | |
| | | | Send | TransactionDebug | storageserver.getKeyValues.Send | (When no keys are found) |
| | | | AfterReadRange | TransactionDebug | storageserver.getKeyValues.AfterReadRange | (When keys are found in this SS) |
| Client | NativeAPI | getRange | After | TransactionDebug | NativeAPI.getRange.After | (When successful) |
| | | | Error | TransactionDebug | NativeAPI.getRange.Error | (On failure) |

GetRange Fallback

[Diagram: GetRangeFallback]

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | getRangeFallback | | | | |
| | | getKey | | GetKeyAttachID | | |
| | | | AfterVersion | GetKeyDebug | NativeAPI.getKey.AfterVersion | |
| | | | Before | GetKeyDebug | NativeAPI.getKey.Before | |
| | | | After | GetKeyDebug | NativeAPI.getKey.After | Success |
| | | | Error | GetKeyDebug | NativeAPI.getKey.Error | Error |
| | | getReadVersion | | | | (Refer to GetReadVersion) |
| | | getKeyRangeLocations | Before | TransactionDebug | NativeAPI.getKeyLocations.Before | |
| | | | After | TransactionDebug | NativeAPI.getKeyLocations.After | |
| | | getExactRange | Before | TransactionDebug | NativeAPI.getExactRange.Before | getKeyRangeLocations is called by getExactRange |
| | | | After | TransactionDebug | NativeAPI.getExactRange.After | |

Commit

[Diagram: Commit]

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::commit | | | | |
| | | commitAndWatch | | | | |
| | | tryCommit | | commitAttachID | | |
| | | | Before | CommitDebug | NativeAPI.commit.Before | |
| Commit Proxy | CommitProxyServer | commitBatcher | batcher | CommitDebug | CommitProxyServer.batcher | |
| | | commitBatch | | | | |
| | | CommitBatchContext::setupTraceBatch | | CommitAttachID | | |
| | | | Before | CommitDebug | CommitProxyServer.commitBatch.Before | |
| | | CommitBatchContext::preresolutionProcessing | GettingCommitVersion | CommitDebug | CommitProxyServer.commitBatch.GettingCommitVersion | |
| | | | GotCommitVersion | CommitDebug | CommitProxyServer.commitBatch.GotCommitVersion | |
| Resolver | Resolver | resolveBatch | | CommitAttachID | | |
| | | | Before | CommitDebug | Resolver.resolveBatch.Before | |
| | | | AfterQueueSizeCheck | CommitDebug | Resolver.resolveBatch.AfterQueueSizeCheck | |
| | | | AfterOrderer | CommitDebug | Resolver.resolveBatch.AfterOrderer | |
| | | | After | CommitDebug | Resolver.resolveBatch.After | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::postResolution | ProcessingMutations | CommitDebug | CommitProxyServer.CommitBatch.ProcessingMutations | |
| | | | AfterStoreCommits | CommitDebug | CommitProxyServer.CommitBatch.AfterStoreCommits | |
| TLog | TLogServer | tLogCommit | | commitAttachID | | |
| | | | BeforeWaitForVersion | CommitDebug | TLogServer.tLogCommit.BeforeWaitForVersion | |
| | | | Before | CommitDebug | TLog.tLogCommit.Before | |
| | | | AfterTLogCommit | CommitDebug | TLog.tLogCommit.AfterTLogCommit | |
| | | | After | CommitDebug | TLog.tLogCommit.After | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::reply | AfterLogPush | CommitDebug | CommitProxyServer.CommitBatch.AfterLogPush | |
| Client | NativeAPI | tryCommit | After | CommitDebug | NativeAPI.commit.After | |
| | | commitAndWatch | | | | |
| | | watchValue | | WatchValueAttachID | | |
| | | | Before | WatchValueDebug | NativeAPI.watchValue.Before | |
| | | | After | WatchValueDebug | NativeAPI.watchValue.After | |