156 Commits

Author SHA1 Message Date
Meng Xu 0507e9703c FastRestore:Disable abort fast restore test 2020-02-20 06:54:14 -08:00
Meng Xu e794003b16 FastRestore:AtomicOpTest:Enable buggify 2020-02-19 20:25:07 -08:00
Meng Xu 4d027428df FastRestore:Test:Increase txn load to 2500txn per second
More transactions per second create more data in a backup version,
which helps test whether fast restore handles large memory allocations correctly.
2019-12-16 12:13:29 -08:00
Meng Xu 64abaaf02d FastRestore:Reenable tests for nightly test 2019-12-12 07:57:52 -08:00
Meng Xu 04230b59bf Increase load a bit 2019-12-07 21:14:56 -08:00
Meng Xu 78b8961891 Move parallel restore tests to tests folder
Valgrind found errors on these two parallel restore tests,
although correctness test confirms these two tests have no correctness error.

To prevent these two parallel restore tests from spamming valgrind test results,
we exclude these two tests from our nightly tests for now.
2019-11-25 10:53:10 -08:00
Meng Xu 7cf87e9ae3 FastRestore:Add ParallelRestoreCorrectnessCycle.txt test 2019-11-03 17:31:54 -08:00
Meng Xu 63359bfc8b FastRestore:handleInitVersionBatchRequest:Ensure exact once execution
Also increase the test workload for BackupAndParallelRestoreWithAtomicOp test
2019-11-03 17:26:13 -08:00
Jingyu Zhou a30e6ec147
Merge pull request #2277 from xumengpanda/mengxu/fastrestore-atomicOpTest-increaseLoadAndBugFix-PR
Performant restore [7/XX]: Add tests for transactionBatchSizeThreshold when apply mutations
2019-10-24 21:21:14 -07:00
Meng Xu 7af3239ee7 FastRestore:AtomicOpTest:Debug:1 key per group for ops keyspace 2019-10-23 14:36:34 -07:00
Meng Xu ba7e499efe FastRestore:AtomicOpTest:Limit 1 actor per client 2019-10-23 14:04:14 -07:00
Meng Xu e676348710
Merge pull request #1955 from fzhjon/mark-ss-failed
Add fdbcli and API command to mark storage servers as permanently failed
2019-10-22 23:36:30 -07:00
Meng Xu 96d463bab6 FastRestore:Fix bug in applying mutations and increase atomicOp test workload
When Applier applies mutations to the destination cluster, it advances the
mutation cursor twice when it should only advance it once.
This makes restore miss some mutations when the applying txn includes
more than one mutation.
2019-10-22 23:24:23 -07:00
Meng Xu 970327b554 FastRestore:Add ParallelRestoreCorrectnessAtomicOpTinyData.txt 2019-10-21 14:42:11 -07:00
Jon Fu 450a09e117 Code Review Changes 2019-09-24 15:48:50 -07:00
Jon Fu 807b02551e updated help message and changed existing workload to use mark as failed feature 2019-08-27 14:39:43 -07:00
Trevor Clinkenbeard 8f06ccb065 Increased testDuration for DifferentClustersSameRV test
Unlocking the second database can take a while because the local
ratekeeper throttles reads until the updateStorage actor catches up
after advanceVersion
2019-06-25 16:56:49 -07:00
Trevor Clinkenbeard 148f922813 Revert "Add CycleTest (currently times out)"
This reverts commit f367fcaf25.
2019-06-20 18:34:41 -07:00
Andrew Noyes f367fcaf25 Add CycleTest (currently times out) 2019-06-11 14:10:34 -07:00
Andrew Noyes 02e173b601 Add changeConnectionFile method to Transaction
Also add tests
2019-06-11 13:58:22 -07:00
Alex Miller 6d23eb2d1a Implement log_version.
This mega-commit introduces a new configuration setting, `log_version`,
that controls the TLog implementations and features that are available
within FDB, so that users can opt in to new features if they're willing
to sacrifice backwards compatibility.
2019-02-22 12:15:23 -08:00
Evan Tschannen d8ea3dbf9a Added the ability to configure a cluster from a JSON file 2018-08-16 17:34:59 -07:00
Evan Tschannen 9c918a28f6 fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2 2018-08-09 13:16:09 -07:00
Evan Tschannen 6d76ff67a3 added the connection string to status 2018-07-09 22:11:58 -07:00
Evan Tschannen 507b3bacb0 fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen 866ccfe344 added the ability to allow the master to finish recovery before all storage servers in both regions have their mutations. This allows you to recover from scenarios where you lose all your tlogs in one dc. 2018-07-04 01:59:04 -04:00
Evan Tschannen 0123627d67 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen 8a8914f046 re-added the ability to configure the number of log routers. Many log routers are needed to get a sufficient number of sockets involved in copying data across the WAN 2018-06-22 00:04:00 -07:00
A.J. Beamon e8f66df001 Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag. 2018-06-21 15:59:43 -07:00
A.J. Beamon 62eeefcc8a Update status schema with new field 2018-06-20 12:09:28 -07:00
Evan Tschannen 1ccfb3a0f4 fix: log_anti_quorum was always 0 in simulation
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen e8c462882b re-added remote_logs as a parameter, because it could be useful to have a different number of logs between when recruited as primary and remote 2018-06-18 10:22:34 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen 246abd1207 added full_replication to status 2018-06-14 21:14:18 -07:00
Evan Tschannen 0103b6f5ed added datacenter_version_difference to status 2018-06-14 19:09:25 -07:00
Evan Tschannen 99e21c869c fixed a number of status calculations, and re-enabled the status workload 2018-06-14 17:58:57 -07:00
A.J. Beamon 432a295bc2 Add read bytes and read keys info to status. Collect this information directly from StorageMetrics rather than through ratekeeper. 2018-05-04 12:01:40 -07:00
yichic ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang 1f2602d2b3 Fix all review comments 2018-03-19 11:33:33 -07:00
Yichi Chiang d6559b144f Share log mutations between backups and DRs which have the same backup range 2018-03-19 11:32:50 -07:00
A.J. Beamon b2ef6e1358 Add missing available_bytes fields to test status schemas 2018-03-09 14:17:20 -08:00
Stephen Atherton b86f68ceb8 Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload. 2018-01-05 14:43:21 -08:00
A.J. Beamon 5015119115 Generalize the message that gets displayed in status if a cluster file's contents are incorrect. 2018-01-05 10:29:47 -08:00
A.J. Beamon 7cf17df821 Merge branch 'master' into log-group-for-unsupported-clients
# Conflicts:
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
A.J. Beamon fcbeea104f Update documentation and tests with new connected_clients status schema 2017-11-01 11:25:50 -07:00
Evan Tschannen 48901a9223 added a list of tlog IDs that are missing to status 2017-10-24 16:28:50 -07:00
Alex Miller 11668bb359 Fixing code review comments. 2017-09-29 15:58:36 -07:00
Alex Miller 9cb478be6d Add a VersionStamp+BackupToDBAbort test that fails. 2017-09-29 15:58:36 -07:00
Alex Miller 69523ce151 Hackish version of a test, but it does fail. 2017-09-29 15:58:36 -07:00
Yichi Chiang d4f75630de Support log group field in status json 2017-09-28 16:31:29 -07:00
Evan Tschannen e8b895c878 added the ability to disable connection failures for a period of time after one happens 2017-09-18 12:46:29 -07:00
Evan Tschannen 489332533c all timeouts longer than two minutes can be lowered to 60.0 with buggification
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
Evan Tschannen 2d0dbd57e8 randomized the delays in atomic switchover workload 2017-06-01 12:08:21 -07:00
Evan Tschannen 1626e16377 Merge branch 'release-4.6' into release-5.0 2017-05-31 16:23:37 -07:00
Stephen Atherton 98604d33a0 Merge branch 'fix-io-timeout-handling'
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	fdbserver/worker.actor.cpp
#	fdbserver/workloads/MachineAttrition.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00