Commit Graph

126 Commits

Author SHA1 Message Date
A.J. Beamon ebeca10bce Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients. 2020-04-09 16:55:56 -07:00
A.J. Beamon 36da61dd9c Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/vexillographer/fdb.options
2020-04-07 21:12:14 -07:00
A.J. Beamon 2336f073ad Checkpointing a bunch of work on throttles. Rudimentary implementation of auto-throttling. Support for manual throttling via fdbcli. Throttles are stored in the system keyspace. 2020-04-03 15:24:14 -07:00
Meng Xu 60f6edc3b5
Merge pull request #2860 from zjuLcg/report-conflicting-key-roll-forward
Report conflicting key roll forward
2020-03-30 17:33:56 -07:00
Meng Xu 113d0fb48b Remove incorrect assertion 2020-03-27 12:13:30 -07:00
Andrew Noyes 289487559d Revert "Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key""
This reverts commit 804fe1b22e.
2020-03-24 18:11:15 -07:00
Balachandar Namasivayam 804fe1b22e Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key"
This reverts commit 648dc4a933, reversing
changes made to 487d131b38.
2020-03-19 21:34:28 -07:00
chaoguang 0094293d50 add const vars 2020-03-11 23:11:49 -07:00
chaoguang d1c56d3b57 add constant KeyRefs in SystemData 2020-03-11 12:25:50 -07:00
chaoguang 7a76e9556d Merge remote-tracking branch 'upstream/master' into report-conflicting-key 2020-03-04 11:24:39 -08:00
Meng Xu a12a161fb3 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-02-18 14:49:52 -08:00
Meng Xu c603b20e7e FastRestore:Resolve review comments 2020-02-18 14:08:27 -08:00
Jingyu Zhou 5a602f58e8 Start backup with a wait on all backup workers running
This wait is to make sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for allWorkerStarted() key of the backup to be set.

Backup workers monitor the changes to the "backupStartedKey" and start logging
mutations. Additionally, backup worker for Tag(-2,0) monitors all other workers
have started (checking their saved progress version is larger than the backup's
start version), and then sets the allWorkerStarted() key for the backup.
2020-01-31 19:29:09 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Meng Xu 16f9ec45bd Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-23 20:15:21 -08:00
Jingyu Zhou 1eaea91cb3 Address review comments 2020-01-22 19:42:13 -08:00
Jingyu Zhou c08a192c75 Add a backup start key
If the backup key is not set, do not recruit backup workers for old epoches.
2020-01-22 19:42:13 -08:00
Jingyu Zhou 19d6a889ff Recruit backup workers for old epochs
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remains in
the cluster until the next epoch, when it will remove itself.
2020-01-22 19:38:45 -08:00
Jingyu Zhou 41f0cf2bb5 Add decode function for backup progress 2020-01-22 19:38:45 -08:00
Jingyu Zhou 7da9f47f26 Enable pop from backup workers
This is still WIP as some edge cases can trigger test failure, most likely due
to not popping mutations by backup workers when epoch ends.
2020-01-22 19:38:45 -08:00
Meng Xu bfbf2164c4 FastRestore:Applier buffer data for multiple batches 2020-01-17 17:01:01 -08:00
chaoguang 10719200c3 A hack way to call API through getRange("\xff\xff/conflicting_keys\<start_key>", "\xff\xff/conflicting_keys\<end_key>"). 2020-01-06 11:22:11 -08:00
negoyal d46c7ded59 Merge remote-tracking branch 'origin/master' into storage-cache-subfeature1 2019-11-14 17:52:22 -08:00
negoyal a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Meng Xu 630c29d160 FastRestore:resolve review comments
1) wait on whenAtLeast;
2) Put BigEndian64 into the function call and the decoder to prevent
future people from making the same mistake.
2019-11-11 17:00:16 -08:00
Meng Xu eb67886b75 FastRestore:Move comment to func definition
Resolve review comments.
2019-11-11 15:10:27 -08:00
Meng Xu 58aa6711e4 FastRestore:ApplyToDB:BugFix:Serialize integer as bigEndian to ensure lexico order 2019-11-03 17:26:07 -08:00
Andrew Noyes b7b5d2ead3 Remove several nonsensical const uses
These seem to be all the ones that clang's -Wignored-qualifiers
complains about
2019-10-26 14:30:34 -07:00
Jon Fu f4237ebfff Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-16 11:32:16 -07:00
Meng Xu 71509a5157 FastRestore:Applier:applyToDB:Clang format 2019-10-10 17:36:38 -07:00
Meng Xu 84b5a5525f FastRestore:Add restoreApplierKeys 2019-10-10 17:18:34 -07:00
Jon Fu 471e283128 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-09-18 11:49:07 -07:00
Evan Tschannen 8fbd90e2f6
Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2
Graceful storage engine migration
2019-09-09 13:51:53 -07:00
Meng Xu c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Jon Fu c908c6c1db added command to fdbcli and changes to SystemData and ManagementAPI 2019-08-27 14:39:43 -07:00
Meng Xu e6284684f0 StorageEngineSwitch:Always remove wrong storeType SS
In the old logic of switching storage engines, it marks a storage server
with wrong store type as undesired even though this can lead to no healthy team.

In the first version of the new storage engine switch, we mimic the same logic
of the old version.
2019-08-13 14:59:46 -07:00
Meng Xu a588710376 StorageEngineSwitch:Graceful switch
When fdbcli change storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause OOM error.
2019-08-12 17:37:52 -07:00
Jingyu Zhou 4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Meng Xu 7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
Evan Tschannen ba54508c47 code cleanup 2019-08-06 16:30:30 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu c9c50ceff8 Comments:Add comments to DiskQueue
No functional change.
2019-08-01 15:20:01 -07:00
Meng Xu 7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
Xin Dong 1922c39377 Resolve review comments. 100K run shows one suspecious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warns when necessary
2019-07-30 22:20:45 -07:00
Xin Dong 4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' key to disable/enable DD for all rebalance reasons(MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Meng Xu b0c31f28af FastRestore:Fix bug that blocks restore
1) Should recruit only configured number of roles;
2) Should never register a restore master interface as a restore worker (loader or applier) interface.
2019-07-25 17:55:37 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Evan Tschannen 94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00