8243 Commits

Author SHA1 Message Date
Meng Xu 7f37a90c48 FastRestore:Introduce FASTRESTORE_VB_PARALLELISM
for controlling the number of concurrently running version batches.
2020-01-28 10:39:57 -08:00
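A knob that caps the number of concurrently running version batches can be pictured as a counting semaphore around each batch. The following is only a minimal sketch of that idea, not the FastRestore implementation: `run_version_batches`, `process`, and `demo_peak_concurrency` are hypothetical names, and `parallelism` stands in for the value of FASTRESTORE_VB_PARALLELISM.

```python
import asyncio


async def run_version_batches(batches, parallelism, process):
    """Run process(batch) for every batch, with at most `parallelism` in flight.

    `parallelism` plays the role of a knob such as FASTRESTORE_VB_PARALLELISM
    (assumption: the knob is a simple upper bound on concurrent batches).
    """
    sem = asyncio.Semaphore(parallelism)

    async def guarded(batch):
        async with sem:  # blocks while `parallelism` batches are already running
            await process(batch)

    await asyncio.gather(*(guarded(b) for b in batches))


def demo_peak_concurrency(num_batches, parallelism):
    """Return the highest number of batches observed running at the same time."""
    running = 0
    peak = 0

    async def process(batch):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.001)  # stand-in for real restore work
        running -= 1

    asyncio.run(run_version_batches(range(num_batches), parallelism, process))
    return peak
```

Calling `demo_peak_concurrency(6, 2)` runs six dummy batches while never letting more than two overlap.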
Meng Xu 5330dd1937 FastRestore:Randomize running order of version batches 2020-01-27 22:39:25 -08:00
Meng Xu 754e2cb023 FastRestore:Fix:number of workers should be calculated
based on number of appliers and loaders.
2020-01-27 21:14:00 -08:00
Meng Xu e2cb13c6ff FastRestore:Fix false positive in splitKeyRangeForAppliers 2020-01-27 20:44:23 -08:00
Meng Xu 0f4dfeda5b FastRestore:Test random number of appliers and loaders 2020-01-27 20:19:58 -08:00
Meng Xu 3cfe1f031d FastRestore:Use set instead of vector for keysplitter
and disable testing for random number of appliers and loaders
2020-01-27 20:14:33 -08:00
Meng Xu bc4508e1c5 FastRestore:Fix ambiguous reverse_iterator 2020-01-27 19:54:56 -08:00
Meng Xu cab9d51e06 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-27 18:16:26 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Meng Xu 75dc34f775 FastRestore:A single version may be larger than version batch threshold
In case data at a single version is larger than FASTRESTORE_VERSIONBATCH_MAX_BYTES,
we should allow a version batch to include the version and ignore the
FASTRESTORE_VERSIONBATCH_MAX_BYTES limit to avoid a false positive in simulation.

In a real environment, this situation will report a SevError to ask the DBA to
increase the memory limit for a version batch.
2020-01-27 12:17:20 -08:00
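The batching rule described in this commit can be sketched as a greedy grouping that closes a batch when the next version would push it over the limit, but still accepts a single version that is itself larger than the limit. This is an illustration under assumptions, not the FastRestore code: `split_into_version_batches` is a hypothetical helper, and `max_bytes` stands in for FASTRESTORE_VERSIONBATCH_MAX_BYTES.

```python
def split_into_version_batches(version_sizes, max_bytes):
    """Group (version, size_in_bytes) pairs, given in version order, into batches.

    A batch is closed when adding the next version would exceed max_bytes.
    A single version larger than max_bytes is still placed in a batch of its
    own instead of being rejected.
    """
    batches = []
    current = []
    current_bytes = 0
    for version, size in version_sizes:
        # Only close the batch if it already holds something; an empty batch
        # must accept the version even when size > max_bytes.
        if current and current_bytes + size > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(version)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


def oversized_versions(version_sizes, max_bytes):
    """Versions whose data alone exceeds the limit (candidates for an error report)."""
    return [version for version, size in version_sizes if size > max_bytes]
```

With a 100-byte limit and sizes `[(1, 40), (2, 40), (3, 150), (4, 10)]`, version 3 ends up alone in its batch, and `oversized_versions` flags it for reporting.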
Alex Miller 6945a6ea01
Merge pull request #2345 from zjuLcg/add-consistency-verification-in-mako-workload
Add consistency verification in mako workload
2020-01-24 17:07:49 -08:00
Alex Miller d06d664ed7
Merge pull request #2149 from tapaswenipathak/ticket-2135
Add comments to explain functions in ReplicationUtils.cpp
2020-01-24 16:35:11 -08:00
Meng Xu b04e98771e FastRestore:Replace FastRestoreOpConfig with Knobs
And randomize values for the rest of the knobs
2020-01-24 14:24:34 -08:00
Meng Xu 11f3dca122 FastRestore:Balance load on loaders across version batches 2020-01-23 23:21:06 -08:00
Meng Xu cfdcddd90e FastRestore:Loader:Pipeline sendMutationsToApplier actors 2020-01-23 20:22:05 -08:00
Alex Miller 2bc5b2cf8a
Merge pull request #2585 from Ma27/fix-glibc230-build
Fix build with glibc 2.30
2020-01-23 20:21:32 -08:00
Meng Xu 16f9ec45bd Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-23 20:15:21 -08:00
Evan Tschannen 3453d5563d
Merge pull request #2591 from etschannen/master
fix: backupWorker would crash when run outside of simulation
2020-01-23 19:08:08 -08:00
Evan Tschannen 8f599e9d15 fix: backupWorker would crash when run outside of simulation 2020-01-23 19:06:39 -08:00
Evan Tschannen 76e192d490
Merge pull request #2538 from alexmiller-apple/hashlittle2-to-crc32c
Convert more hashlittle{,2} uses to crc32c_append
2020-01-23 17:54:38 -08:00
Evan Tschannen 6c0b934dda
Merge pull request #2242 from alexmiller-apple/fix-10min-stall-again
Fix the 10min multi-region recovery stall again
2020-01-23 17:53:02 -08:00
A.J. Beamon b2c8a4a34c
Merge pull request #2519 from xumengpanda/mengxu/fast-restore-versionBatch-fixSize-PR
Performant restore [14/XX]: Ensure each version-batch not exceed a configured size
2020-01-23 16:49:01 -08:00
Meng Xu 4bf579a6d5 FastRestore:Fix race condition in pipeline
Master should not start asking appliers to apply mutations at batchIndex
until all appliers have applied mutations at (batchIndex - 1).
Otherwise, mutations may not be applied in increasing order of versions,
because appliers at different batch indices can have overlapping key ranges.
2020-01-23 16:34:45 -08:00
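The ordering constraint in this fix amounts to a barrier between consecutive batch indices: nothing at batchIndex starts applying until every applier has finished batchIndex - 1. A minimal sketch of that barrier, assuming asyncio coroutines in place of the real actors (`apply_batches_in_order`, `apply_batch`, and `run_demo` are hypothetical names):

```python
import asyncio


async def apply_batches_in_order(num_batches, appliers, apply_batch):
    """Ask every applier to apply batch i only after all appliers finished batch i - 1."""
    for batch_index in range(num_batches):
        # gather() returns only once every applier has completed this batch,
        # so it acts as the barrier between batch_index and batch_index + 1.
        await asyncio.gather(*(apply_batch(a, batch_index) for a in appliers))


def run_demo():
    """Return the (batch_index, applier) completion log for 3 batches and 3 appliers."""
    log = []

    async def apply_batch(applier, batch_index):
        # Appliers take different amounts of time, so completion order within
        # one batch varies, but batches themselves never interleave.
        await asyncio.sleep(0.001 * (3 - applier))
        log.append((batch_index, applier))

    asyncio.run(apply_batches_in_order(3, [0, 1, 2], apply_batch))
    return log
```

In the resulting log, the batch indices appear in non-decreasing order even though appliers within a batch finish at different times.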
Meng Xu e011f39829 FastRestore:Add sanity check and trace events 2020-01-23 16:03:41 -08:00
A.J. Beamon 8a065b9da4
Merge pull request #2557 from alexmiller-apple/reduce-versionstamp-conflictranges
Narrow the unreadable range of keys after a versionstamped key operation
2020-01-23 11:14:47 -08:00
Evan Tschannen fe0e10c312
Merge pull request #1625 from jzhou77/backup-worker
Create a backup role in the cluster
2020-01-23 10:06:35 -08:00
Vishesh Yadav b5c1c8cdd0
Merge pull request #2560 from ajbeamon/java-bindingtester-error-retry
Add retry loop to Java binding tester instruction reader
2020-01-23 09:54:43 -08:00
A.J. Beamon 297a56c219
Merge pull request #2582 from mpilman/features/documentation-server
Add `docpreview` target
2020-01-23 09:43:18 -08:00
Maximilian Bosch e133cb974b
Fix build with glibc 2.30
The `gettid()` function is part of glibc 2.30[1]. I decided to keep the
`gettid` implementation here under a different name to remain compatible
with older glibc versions.

[1] https://sourceware.org/ml/libc-alpha/2019-08/msg00029.html
2020-01-23 09:28:18 +01:00
Jingyu Zhou 6ddf73e26a Remove code introduced when resolving merge conflicts 2020-01-22 21:23:38 -08:00
Jingyu Zhou 39fbacbc4f Address review comments 2020-01-22 19:43:40 -08:00
Jingyu Zhou acebfdc67b Restore storage queue limit to 0 in consistency check
The storage queue will no longer cause test failures. The backup worker life
cycle is now tied to the backup, so the consistency check only happens after
the backup workload is done. Thus, we no longer need to save backup progress
while the consistency check is running.
2020-01-22 19:43:40 -08:00
Jingyu Zhou c6c39ca99d Update the better-master-exists check with backup workers
During recruitment, if there is no desired log router count, use tlog size
instead, because the number of backup workers has to be larger than 0.
2020-01-22 19:43:40 -08:00
Jingyu Zhou 8b67a89eed More review comments fixed. 2020-01-22 19:42:13 -08:00
Jingyu Zhou 1eaea91cb3 Address review comments 2020-01-22 19:42:13 -08:00
Jingyu Zhou 1311fec45a Add an option to get minKnownCommittedVersion from Proxies
The backup worker needs to use this version for popping when running in a NOOP
mode. This option is added to GetReadVersionRequest and proxies will send back
minKnownCommittedVersion if the option is set.

Also add a couple of knobs for backup workers.
2020-01-22 19:42:13 -08:00
Jingyu Zhou 7989f3f015 Add NOOP to backup worker
The backup worker just blindly pops tags if the "backupStartedKey" is not set.
Note the commit version from TLog cannot be used as the pop version, because
for a single region, during a recovery the log router tags are used to recover
mutations. The backup worker can potentially pop mutations that are needed for
recovery, causing consistency errors. So the solution for now is to use commit
version - 5,000,000, which is a version guaranteed to be persisted on all
replicas.
2020-01-22 19:42:13 -08:00
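The pop-version rule for NOOP mode is plain arithmetic on the commit version. A small sketch, where `noop_pop_version` is a hypothetical helper and clamping the result at zero is an assumption not stated in the commit message:

```python
# Lag taken from the commit message: commit version - 5,000,000.
NOOP_POP_LAG_VERSIONS = 5_000_000


def noop_pop_version(commit_version):
    """Version up to which a backup worker in NOOP mode may pop its tags.

    Popping at the commit version itself could discard mutations still needed
    for recovery, so the pop version trails it by a fixed lag.
    Clamping at 0 is an assumption made for this sketch.
    """
    return max(0, commit_version - NOOP_POP_LAG_VERSIONS)
```

For example, a commit version of 12,000,000 yields a pop version of 7,000,000.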
Jingyu Zhou c08a192c75 Add a backup start key
If the backup key is not set, do not recruit backup workers for old epochs.
2020-01-22 19:42:13 -08:00
Jingyu Zhou e14246ac16 Add more information for trace events 2020-01-22 19:42:13 -08:00
Jingyu Zhou 4bed33031f Set backup worker start version to be savedVersion + 1
If no progress is found, the start version is set to epochBegin. So the start version
is the one after the last saved (or from last epoch's saved) version.
2020-01-22 19:42:13 -08:00
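The start-version rule reads naturally as a two-branch function. This is a sketch only: `backup_start_version` is a hypothetical name, and representing "no progress found" as `None` is an assumption.

```python
from typing import Optional


def backup_start_version(saved_version: Optional[int], epoch_begin: int) -> int:
    """Pick the version a backup worker starts from.

    saved_version is the last version already saved (possibly by the previous
    epoch's worker), or None when no progress was found.
    """
    if saved_version is None:
        return epoch_begin
    # Everything up to and including saved_version is already persisted.
    return saved_version + 1
```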
Jingyu Zhou dcd0a46bc6 Fix a rare remote recovery bug
This bug was introduced when I added log router tags unconditionally to all
configurations. In newEpoch(), the wait for remote recovery is conditioned on
"logRouterTags == 0", which is now always false. Thus remote recovery was not
performed and remote TLogs did not copy data from the previous epoch's TLogs
(the previous epoch is a single region configuration). As a result, storage
servers could not peek/get the data and did not pop tags. Thus,
waitForFullReplication() became stuck and the test eventually timed out.
2020-01-22 19:42:13 -08:00
Jingyu Zhou 56a2c37071 Recruit backup workers for single region
Enable log router tags for single region, which are popped by backup workers.
Need to add a noop for backup workers if there are no active backups.
2020-01-22 19:42:13 -08:00
Jingyu Zhou 0e5f5b50f0 Remove unused backup worker knobs 2020-01-22 19:38:46 -08:00
Jingyu Zhou 60f360c954 Log oldest backup epoch in the backup worker 2020-01-22 19:38:46 -08:00
Jingyu Zhou 690e93145e Fix some comments 2020-01-22 19:38:46 -08:00
Jingyu Zhou 06fb45f32a FileConverter skips mutation files without tag ID
FileConverter doesn't know the format of old mutation logs.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 3854d9a49a Fix decoder to handle multi-part values
When a value (i.e., mutations for a version) is large, it will be split into
multiple key-value pairs. This was not handled previously, and fixing it also
consolidates the interface of DecodeProgress.
2020-01-22 19:38:46 -08:00
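Handling multi-part values means stitching the pieces of one version back together before decoding. The sketch below assumes a hypothetical key layout of `(version, part_number)` with parts numbered from 0; the real key encoding is not given in the commit message, and `reassemble_values` is an illustrative name.

```python
from itertools import groupby


def reassemble_values(pairs):
    """Join multi-part values back into one byte string per version.

    pairs: iterable of ((version, part_number), value_bytes), sorted by key.
    The (version, part_number) key shape is an assumption for this sketch.
    """
    result = []
    for version, group in groupby(pairs, key=lambda kv: kv[0][0]):
        parts = list(group)
        # Parts must be contiguous and start at 0, otherwise data is missing.
        for expected, ((_, part), _) in enumerate(parts):
            if part != expected:
                raise ValueError(f"version {version}: missing part {expected}")
        result.append((version, b"".join(value for _, value in parts)))
    return result
```

Given parts `b"ab"` and `b"cd"` for version 10 and a single part `b"ef"` for version 11, the function returns one joined value per version.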
Jingyu Zhou 7d1b9fe6d3 Add mutation file decoder 2020-01-22 19:38:46 -08:00
Jingyu Zhou 568a8a8e77 Use big endian for mutation log files
Each mutation is prefixed with its version, sub-version, and size in big-endian
representation. This is required, especially for the leading version field,
because we use 0xFF for padding. A little-endian version number can easily
collide with 0xFF, while a big-endian one is guaranteed to have 0x00 as the
first byte.
2020-01-22 19:38:46 -08:00
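The collision argument is easy to see on a concrete value. The sketch below packs a header with Python's `struct` module; the field widths (64-bit version, 32-bit sub-version, 32-bit size) are assumptions chosen for illustration, and `encode_header` / `decode_header` are hypothetical names. Note that the leading byte of a big-endian 64-bit integer is 0x00 only while the value is below 2^56.

```python
import struct

# Assumed layout: 64-bit version, 32-bit sub-version, 32-bit size, all big endian.
HEADER = struct.Struct(">QII")


def encode_header(version, sub_version, size):
    """Pack the per-mutation prefix in big-endian byte order."""
    return HEADER.pack(version, sub_version, size)


def decode_header(buf):
    """Unpack (version, sub_version, size) from the start of buf."""
    return HEADER.unpack_from(buf)


def first_byte_little_endian(version):
    """First byte the same version would have if written little endian."""
    return struct.pack("<Q", version)[0]
```

For version 255, the little-endian encoding begins with 0xFF and would be indistinguishable from padding, whereas `encode_header(255, 1, 16)` begins with 0x00.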
Jingyu Zhou 114e153bc8 Use block size encoded in file names
The log files have block size encoded in their names and the converter should
use these sizes.
2020-01-22 19:38:46 -08:00