Commit Graph

344 Commits

Author SHA1 Message Date
Evan Tschannen 645dc5ead6 warmRange needs to get a read version occasionally to prevent it from overwhelming the proxy
quietDatabase waits for all data distribution to be completely finished so that databases are cached in a cleaner state
2018-01-14 12:50:52 -08:00
Evan Tschannen 721a891d1f fix: never request more than 100 shards from the proxy at a time to avoid large packets 2018-01-12 10:51:53 -08:00
Evan Tschannen 4e8bc273b3 added a version of getKeyRangeLocations that checks for endpoint failures
fix: did not add the cluster controller to id_used in all cases
removed obsolete fixmes
2018-01-07 15:32:43 -08:00
Evan Tschannen 3ec45d38a0 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen 14e956c4fc fix: getRange did not check for endpoint failures
refactored checking endpoint failures
2018-01-06 13:49:47 -08:00
Evan Tschannen 63751fb0e2 fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from 2018-01-05 14:15:25 -08:00
Evan Tschannen 2a869a4178 fixed merge errors 2018-01-05 12:17:17 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
A.J. Beamon 5015119115 Generalize the message that gets displayed in status if a cluster file's contents are incorrect. 2018-01-05 10:29:47 -08:00
Evan Tschannen 982f0dcb1e Merge pull request #222 from cie/alexmiller/drtimefix2
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Alex Miller b5a6bc0ab7 Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option.
Simulation identified the fact that we can violate the
VersionStamps-are-always-increasing promise via the following series of events:

1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream
2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0.
3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version.
4. The transaction from (3) is committed
5. Transactions from (1) are committed

This is possible because the dumpData transactions have no read conflict
ranges, and thus it's impossible to make them abort due to "conflicting"
transactions.  There's also no promise that if client C sends a commit to proxy
A, and later a client D sends a commit to proxy B, that B must log its commit
after A.  (We only promise that if C is told it was committed before D is told
it was committed, then A committed before B.)

There was a failed attempt to fix this problem.  We tried to add read conflict
ranges to dumpData transactions so that they could be aborted by "conflicting"
transactions.  However, this failed because this now means that dumpData
transactions require conflict resolution, and the stale read version that they
use can cause them to be aborted with a transaction_too_old error.
(Transactions that don't have read conflict ranges will never return
transaction_too_old, because with no reads, the read snapshot version is
effectively meaningless.)  This was never previously possible, so the existing
code doesn't retry commits, and to make things more complicated, the dumpData
commits must be applied in order.  This would require either adding
dependencies to transactions (if A is going to commit then B must also be/have
committed), which would be complicated, or submitting transactions with a fixed
read version, and replaying the failed commits with a higher read version once
we get a transaction_too_old error, which would unacceptably slow down the
maximum throughput of dumpData.

Thus, we've instead elected to add a special transaction option that bypasses
proxy load balancing for commits, and always commits against proxy 0.  We can
know for certain that after the transaction from (2) is committed, all of the
dumpData transactions that will be committed have been added to the commit
promise stream on proxy 0.  Thus, if we enqueue another transaction against
proxy 0, we can know that it will be placed into the promise stream after all
of the dumpData transactions, thus providing the semantics that we require:  no
dumpData transaction can commit after the destination version upgrade
transaction.
2017-12-20 15:04:04 -08:00
Evan Tschannen 1dc9eceb6d optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups 2017-12-15 20:13:44 -08:00
Evan Tschannen 73a0a07eac clients ask for key location information directly from the proxy, instead of reading it from the database 2017-12-09 16:10:22 -08:00
Alex Miller e900333dbf Fix a subtle valgrind error.
If there was an error in waiting for the read version, we would attempt to
serialize and eventually commit a CommitTransactionRef that had an
uninitialized read_snapshot.
2017-11-15 19:21:20 -08:00
A.J. Beamon cd085764f1 Do not automatically change a cluster file that does not match what you expect. 2017-11-10 14:12:45 -08:00
Evan Tschannen 33eef4c76b Merge branch 'master' into getrange-perf-improvements 2017-11-04 14:25:45 -07:00
Balachandar Namasivayam 33cc644ba8 Copy CommitTransactionRequest during tryCommit function call as it is anyway copied inside the function. 2017-11-02 17:00:44 -07:00
A.J. Beamon 3cad2676cc Collapse some of the getRange actors into a single actor. Avoid unnecessary comparisons. 2017-11-02 13:39:06 -07:00
A.J. Beamon 07c80484eb Merge branch 'master' into getrange-perf-improvements 2017-11-02 08:42:26 -07:00
Balachandar Namasivayam 3efaaec479 onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed. 2017-11-01 18:29:56 -07:00
Balachandar Namasivayam 1d95e1bfd6 Log actor cancellation error to Client Transaction sampling records. 2017-11-01 11:51:31 -07:00
A.J. Beamon 7cf17df821 Merge branch 'master' into log-group-for-unsupported-clients
# Conflicts:
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
A.J. Beamon e52f46064a Don't use the results arena for conflict ranges. Rearrange the code to avoid the copy unless necessary. 2017-10-26 09:35:34 -07:00
A.J. Beamon 0d68db0dac Merge branch 'master' into getrange-perf-improvements 2017-10-26 09:25:04 -07:00
Balachandar Namasivayam 9dd588dcce Addressed review comments.
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
A.J. Beamon a2e059be11 Eliminate some more copies 2017-10-20 09:27:17 -07:00
A.J. Beamon 5fcd58b637 Whitespace fix 2017-10-20 09:24:49 -07:00
A.J. Beamon 55bea14b9e Convert extraConflictRanges from KeyRange to std::pair<Key, Key> to avoid copies. 2017-10-20 09:17:47 -07:00
A.J. Beamon 7bab9a0276 Eliminate some copies 2017-10-20 09:11:26 -07:00
A.J. Beamon 986a99a39a Change the reply priority for NativeAPI requests to the cluster to TaskDefaultPromiseEndpoint. 2017-10-20 09:03:48 -07:00
Evan Tschannen e2c1e87df6 made a large number of fixes to make fearless DR correctness clean. 2017-10-19 15:36:32 -07:00
Balachandar Namasivayam eeebf10030 Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non -existing key.
Added new atomic ops ByteMin and ByteMax that does lexicographic comparison of byte strings.
2017-10-10 13:02:22 -07:00
Evan Tschannen ef41b07bb3 renamed past_version to transaction_too_old
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Yichi Chiang d4f75630de Support log group field in status json 2017-09-28 16:31:29 -07:00
Evan Tschannen 53a4a3280a fix: we cannot add to the trLog when cancelled 2017-09-20 14:47:57 -07:00
Balachandar Namasivayam 24aa616a7a Merge pull request #154 from cie/additional-client-profiling
Additional client profiling
2017-09-19 18:15:02 -07:00
Evan Tschannen dc1f7ca6b7 testers now use client locality load balancing 2017-09-01 12:53:01 -07:00
Yichi Chiang aac82074af Avoid calling setCachedLocation twice 2017-08-08 10:03:04 -07:00
Balachandar Namasivayam e767860010 Addressed review comments.
Changed current protocol version to match master
Added operation details for operations that failed.
2017-08-07 18:45:42 -07:00
Balachandar Namasivayam 3e90fdfae7 Added extra client transactional profiling info
1) Key has been added to GET
2) KeyRange has been added to GETRANGE
3) ReadConflict, WriteConflict, Mutation info has been added to COMMIT
2017-08-01 18:33:39 -07:00
Yichi Chiang 53e1ae9f60 shard system keyspace 2017-07-26 13:47:31 -07:00
Evan Tschannen 035efd79cf fix: if a database gets locked after an unknown result, the dummyTransaction will be stuck until the database is unlocked 2017-06-26 12:12:47 -07:00
Stephen Atherton 03d2b1787a Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbrpc/sim2.actor.cpp
2017-06-22 16:56:25 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00