foundationdb

Commit Graph

Author	SHA1	Message	Date
Meng Xu	505997ba0a	FastRestore:Switch to new sendBatchRequests that tracks performance and stragglers	2020-02-21 15:45:32 -08:00
Meng Xu	03f699f2f9	Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR	2020-02-19 15:22:33 -08:00
Meng Xu	94d799552e	FastRestore:Apply clang-format against master	2020-02-18 16:41:59 -08:00
Meng Xu	132f5aa9ba	FastRestore:Improve trace name and cosmetic change	2020-02-18 16:41:19 -08:00
Meng Xu	b5e60585aa	FastRestore:Applier:Fix precompute mutation result	2020-02-13 12:57:47 -08:00
Meng Xu	cda8fc189e	FastRestore:AtomicOp:Intro weighted size for atomicOp atomicOp has an amplified performance overhead to the cluster, for example, an ADD operation can be small, but SS has to load the value to do the operation and the value can be large.	2020-02-11 12:48:05 -08:00
Meng Xu	e76b6d824a	FastRestore:Assign priority to actors to prioritize vb work When we pipeline multiple version batches, we should prevent a later version batch from blocking the earlier version batch by consuming CPU resources. To achieve the above, we should assign higher priority to actors in later phases in a version batch. Because restore master will not invoke an actor at a later phase unless the actors at the earlier phases have been finished, this priority assignment will not cause deadlock.	2020-02-10 20:29:23 -08:00
Meng Xu	dbce1e9974	FastRestore:Applier:Add metrics counter and proc counter	2020-02-10 16:38:26 -08:00
Meng Xu	1fc793d6a7	FastRestore:Loader:Add metrics counter	2020-02-09 22:06:14 -08:00
Meng Xu	72110de7e2	FastRestore:Add trace for quick perf. measurement	2020-02-06 19:48:26 -08:00
Meng Xu	cab9d51e06	Merge branch 'master' into mengxu/fast-restore-pipeline-PR	2020-01-27 18:16:26 -08:00
Meng Xu	141609e80a	FastRestore:Improve code style and fix typos	2020-01-27 18:13:14 -08:00
Meng Xu	cfdcddd90e	FastRestore:Loader:Pipeline sendMutationsToApplier actors	2020-01-23 20:22:05 -08:00
Meng Xu	e011f39829	FastRestore:Add sanity check and trace events	2020-01-23 16:03:41 -08:00
Meng Xu	009fcdeb16	FastRestore:Sanity check each restore asset is processed exactly once	2020-01-21 17:17:45 -08:00
Meng Xu	022783b449	Start batches in reverse order for testing and code cleanup	2020-01-21 14:49:40 -08:00
Meng Xu	4ac92d223b	Cleanup batch buffer for each restore request	2020-01-21 14:49:36 -08:00
Meng Xu	d69bd2f661	FastRestore:Loader buffer data for multiple batches	2020-01-17 17:01:06 -08:00
Meng Xu	bfbf2164c4	FastRestore:Applier buffer data for multiple batches	2020-01-17 17:01:01 -08:00
Meng Xu	f436ea806e	FastRestore:Resolve review comment 1) Sort logfiles by endVersion 2) Exit program early when restore will not succeed 3) Do not increase nextVersion unnecessarily when calculating version batches. 4) Change assert condition that ensures progress in calculating version batches.	2020-01-13 14:08:27 -08:00
Meng Xu	c29e380076	FastRestore:Remove prevVersion from LoadingParam	2020-01-07 14:59:17 -08:00
Meng Xu	9df02512ab	FastRestore:Apply clang-format	2020-01-07 11:50:32 -08:00
Meng Xu	67e913c3d5	Change LoadingParam struct and endVersion definition 1) Remove endVersion field because it has been included in RestoreAsset; 2) Ensure endVersion in VersionBatch and RestoreAsset is always exclusive; 3) Revise ASSERT in loader and applier in situations when the dummy commit version is endVersion, to avoid false positive ASSERT failure.	2020-01-07 11:48:03 -08:00
Meng Xu	c3f8f3b445	FastRestore:Build VersionBatch less than threshold size	2020-01-07 11:46:56 -08:00
Meng Xu	c10035ba54	FastRestore:Use isInVersionRange based on code review	2019-12-23 15:01:27 -08:00
Meng Xu	8d6f511816	FastRestore:Resolve review comment Filter out range mutations that do not overlap with the restore range. Small changes on format.	2019-12-22 20:09:10 -08:00
Meng Xu	61b29de3ce	FastRestore:Self code review Clean up commented code; Add sanity check.	2019-12-20 22:24:34 -08:00
Meng Xu	ddcf3fdd80	FastRestore:Apply clang format	2019-12-20 22:00:36 -08:00
Meng Xu	2cd1f0780a	FastRestore:Split asset to subasset for async parsing files	2019-12-20 21:44:40 -08:00
Meng Xu	e98b2a0d1c	FastRestore:Introduce RestoreAsset	2019-12-20 18:00:10 -08:00
Meng Xu	ffc8f76710	FastRestore:Rename StringRefReaderMX to BackupStringRefReader	2019-12-19 11:49:37 -08:00
Meng Xu	b5d7890ce0	FastRestore:Resolve review comments	2019-12-12 07:45:30 -08:00
Meng Xu	9670d64fbd	FastRestore:Remove commented code	2019-12-11 16:48:40 -08:00
Meng Xu	1371db4cdc	FastRestore:Self code review and cleanup 1. Review memory use cases and improve: Ensure state variable is initialized and change unnecessary state variable to variable. 2. Remove debug code that is no longer useful; 3. Mute verbose debug.	2019-12-11 16:37:33 -08:00
Meng Xu	9a6dabe47e	Merge branch 'mengxu/fastrestore-code-cleanup-PR' into mengxu/fast-restore-fix-valgrind-PR	2019-12-10 20:05:35 -08:00
Meng Xu	feb2a8c70c	FastRestore Change RestoreSendMutationVectorVersionedRequest name Change RestoreSendMutationVectorVersionedRequest to RestoreSendVersionedMutationsRequest for better naming	2019-12-10 17:23:40 -08:00
Meng Xu	20a19978f9	FastRestore:LoadingParam cleanup	2019-12-10 17:20:44 -08:00
Meng Xu	e8dfc1c187	Replace pop_front(size) with new empty standalone obj	2019-12-06 23:16:49 -08:00
Meng Xu	4a66366a05	Use MutationsVec instead of VectorRef	2019-12-06 22:00:40 -08:00
Meng Xu	39a4f2372f	Change FASTRESTORE_SAMPLING_PERCENT to 0 to 100	2019-12-04 21:26:27 -08:00
Meng Xu	c6b36dbffb	FastRestore:Sampling:Resolve review comments	2019-12-04 17:35:11 -08:00
Meng Xu	dd91d26dfa	FastRestore:Sampling:Add FASTRESTORE_SAMPLING_RATE knob	2019-12-04 11:46:29 -08:00
Meng Xu	2b987d1945	FastRestore:typedef Standalone<VectorRef<MutationRef>> MutationsVec	2019-12-04 11:39:55 -08:00
Meng Xu	9383c3f0a6	FastRestore:Sampling:Apply clang format	2019-12-03 21:27:06 -08:00
Meng Xu	3310f67e9e	Merge branch 'mengxu/fast-restore-fix-valgrind-PR' into mengxu/fast-restore-sampling-PR	2019-12-03 16:24:40 -08:00
Meng Xu	153b713b53	FastRestore:Add sampling on parsed mutations	2019-12-03 12:52:17 -08:00
Meng Xu	474f0067c4	Remove unneeded state	2019-11-25 23:10:14 -08:00
Meng Xu	a04f314b1b	Merge pull request #2383 from jzhou77/restore Use sizeof() to replace constant numbers	2019-11-22 16:14:44 -08:00
Jingyu Zhou	037e808253	Address review comments by changing variable names	2019-11-22 13:12:04 -08:00
Jingyu Zhou	9927a9013f	Use sizeof() to replace constant numbers	2019-11-22 11:47:25 -08:00
Meng Xu	78f10f15b3	FastRestore:replace insert with emplace for map and vector This resolves the review suggestions.	2019-11-21 22:47:04 -08:00
Meng Xu	343bcd104a	FastRestore:Apply Clang format	2019-11-20 21:04:18 -08:00
Meng Xu	3f5491318d	FastRestore:Fix bug that causes nondeterminism 1) Use map iterator instead of pointer to maintain stability when map is inserted or deleted 2) dummySampleWorkload: clear rangeToApplier data in each sampling phase. otherwise, we can have an increasing number of keys assigned to the applier.	2019-11-15 11:30:09 -08:00
Meng Xu	9e36b897e6	FastRestore:Loaders must send to appliers log files data before range files	2019-11-12 21:43:12 -08:00
Meng Xu	592f4c0fc4	FastRestore:Remove RestoreSetApplierKeyRangeVectorRequest	2019-11-12 17:59:11 -08:00
Meng Xu	7e4c4ea98e	FastRestore:Load mutations before assign ranges to appliers	2019-11-12 17:14:17 -08:00
Jingyu Zhou	ae7e42face	Merge pull request #2313 from xumengpanda/mengxu/fastrestore-applyToDB-bugfix-PR Performant restore [8/XX]: Fix bugs in applyToDB logic and add more tests	2019-11-12 08:50:23 -08:00
Meng Xu	630c29d160	FastRestore:resolve review comments 1) wait on whenAtLeast; 2) Put BigEndian64 into the function call and the decoder to prevent future people from making the same mistake.	2019-11-11 17:00:16 -08:00
A.J. Beamon	cf2ec3418c	Merge pull request #2317 from xumengpanda/mengxu/fastrestore-extend-atomicOpTest-PR AtomicOps Test: Add more detailed debug information when test fails with opType = AddValue	2019-11-11 15:03:10 -08:00
Meng Xu	0ccded1929	AtomicOps:Resolve review comments	2019-11-05 19:27:49 -08:00
Meng Xu	c4d1e6e1a9	Trace:Severity:Include SevNoInfo to mute trace Define SevFRMutationInfo to trace mutations in restore.	2019-11-04 16:18:40 -08:00
Meng Xu	e345c9061f	FastRestore:Refine debug messages	2019-11-04 11:47:38 -08:00
Meng Xu	7903b47b82	FastRestore:Remove unnecessary return	2019-10-24 13:09:24 -07:00
Meng Xu	c53f817c5e	FastRestore:Convert handleInitVersionBatchRequest to plain func	2019-10-24 13:06:50 -07:00
Meng Xu	60d26ff5d7	FastRestore:Resolve review comments	2019-10-24 12:52:12 -07:00
Meng Xu	bae0c907a6	FastRestore:Convert unnecessary actor function to plain function	2019-10-23 15:10:34 -07:00
Meng Xu	ab4a375b95	FastRestore:RestoreLoader:Define SerializedMutationPartMap type	2019-10-17 10:12:38 -07:00
Meng Xu	78b1ebc7c2	FastRestore:Loader:Handle multiple mutations at the same version in multiple files	2019-10-16 20:57:16 -07:00
Meng Xu	d160810662	FastRestore:Resolve review comments	2019-09-04 16:48:43 -07:00
Meng Xu	9cc832cfd6	FastRestore:Fix Mac and Windows compilation error	2019-08-02 14:33:08 -07:00
Meng Xu	3b54363780	FastRestore:Apply Clang-format	2019-08-01 18:09:12 -07:00
Meng Xu	45083edf74	Merge branch 'master' into mengxu/performant-restore-PR Fix conflicts as well.	2019-07-25 10:46:11 -07:00
Meng Xu	f1741aa90d	FastRestore: Resolve review comments 1) Do not keep restore role data (e.g., masterData) in restore worker; 2) Change function parameter list by only passing in the needed variables in role data; 3) Remove unnecessary files vector from masterData; 4) Fix typos in comments and some function names.	2019-07-24 17:51:53 -07:00
Meng Xu	701676dbd2	FastRestore:Refactor code and add missing files Add RestoreWorker.actor.cpp and RestoreWorkerInterface.actor.h back.	2019-06-18 09:54:27 -07:00
Meng Xu	022b555b69	FastRestore:Fix bug in finish restore RestoreMaster may not receive all acks for the last command, i.e., finishRestore, because RestoreLoaders and RestoreAppliers exit immediately after sending the ack. If the ack is lost, it will not be resent. This commit also removes some unneeded code. This commit passes 50k random tests without errors.	2019-06-05 20:07:18 -07:00
Meng Xu	3fcb6ec0a1	FastRestore:Refactor RestoreLoader and fix bugs Refactor RestoreLoader code and Fix a bug in notifying restore finish.	2019-06-04 21:53:31 -07:00
Meng Xu	477fd152c0	FastRestore:Refactor code 1) Use the runRYWTransaction for simple DB access 2) Replace some printf with TraceEvent 3) Remove printf not used in debugging 4) Avoid wait inside the condition in loop-choose-when for the core routine of restore worker, loader and applier. 5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since the file only has functionalities related to restore worker. Passed correctness test	2019-06-04 11:22:47 -07:00
Meng Xu	a372c82db2	FastRestore:BugFix:Loader must distinguish range and log mutations sent to appliers	2019-05-30 21:22:33 -07:00
Meng Xu	450bda9a01	FastRestore:Refactor parsing backup file code Refactor _parseRangeFileToMutationsOnLoader and _parseLogFileToMutationsOnLoader functions and their callees	2019-05-30 14:01:48 -07:00
Meng Xu	a3f61e6df7	FastRestore:Refactor:Reduce code size 1) Use runRYWTransaction to replace the loop-try style; 2) Remove unnecessary printf 3) Do not mistakenly send reply twice.	2019-05-29 17:03:50 -07:00
Meng Xu	9e1216af1c	FastRestore:Remove CMDUID	2019-05-29 13:48:04 -07:00
Meng Xu	4f484a2a5d	FastRestore:Refactor out the use of cmdID and other non-must functions	2019-05-29 13:26:17 -07:00
Meng Xu	d56837ba16	FastRestore:Refactor LoadFileRequest 1) Remove global map to buffer the parsed mutations on loader. Use local map instead to increase parallelism. 2) Use std::map<LoadingParam, Future<Void>> to hold the actor that parses a backup file and to de-duplicate requests. 3) Remove unused code.	2019-05-28 18:39:00 -07:00
Meng Xu	fe2624fc22	FastRestore:Remove sampling phase Remove the sampling phase to make the PR easier to review. The sampling design and implementation may be changed and added in next PR.	2019-05-26 21:34:58 -07:00
Meng Xu	3eadb31798	FastRestore:Resolve two major review comments 1) Add sendBatchRequests and getBatchReplies sendBatchRequests is a generic actor to send requests without processing replies. getBatchReplies is similar to sendBatchRequests except that it returns the reply to caller. 2) Share applier interface to loaders by using RequestStream, instead of using DB. Create RestoreSysInfo struct, similar purpose as DBInfo, for the restore system information that is shared among restore workers.	2019-05-24 21:53:21 -07:00
Meng Xu	fac63a83c4	FastRestore:Use NotifiedVersion to deduplicate requests Add a NotifiedVersion into an applier data which represents the smallest version the applier is at. When a loader sends mutation vector to appliers, it sends the request that contains prevVersion and commitVersion. This commit also puts actors into an actorCollector for loop-choose-when situation.	2019-05-22 22:09:54 -07:00
Meng Xu	12817af03f	FastRestore:Fix CMake compiling errors	2019-05-16 20:01:43 -07:00
Meng Xu	35b169fd2d	FastRestore:Fix bug in registerMutationsToApplier We forgot to update the applierInterface reference to the iterated applyID	2019-05-14 22:10:09 -07:00
Meng Xu	d9c97b5e5f	FastRestore:Fix bug in sending a vector of mutations When mutationVectorThreshold is not 1, a loader sends a vector of mutations to an applier. We should never mix mutations at different versions into the same vector. The code in the previous commit may mix mutations at different versions. This commit resolves the bug.	2019-05-14 21:04:36 -07:00
Meng Xu	f33e3bf8bc	FastRestore:bugFix:loader must clear kvOps after use it In the sampling phase, a loader will cache the mutations into kvOps map; In the loading log file phase, the loader will do the same thing. The loader must clear the kvOps map once the loader use it; otherwise, it will cache the sampled mutations twice, which leads to an inconsistent restored DB.	2019-05-14 20:28:32 -07:00
Meng Xu	f54a1e1463	FastRestore:Fix bug in deciding applierID in splitMutation	2019-05-14 17:39:44 -07:00
Meng Xu	f8c654cd86	FastRestore:Fix splitMutation bug The split range mutation had a wrong param1 for the first produced mutation	2019-05-14 17:05:50 -07:00
Meng Xu	1f159113e6	FastRestore:Test multiple appliers Loaders will split a range mutation for multiple appliers when needed.	2019-05-14 16:41:04 -07:00
Meng Xu	6c4c807801	FastRestore:fix bug due to non-unique cmdid This commit identifies the bug why DB may be restored to an inconsistent state. The cmdid is used to achieve exactly-once delivery even when the network can deliver a request twice. This is under the assumption that cmdid is unique for each request! However, this assumption may not hold for the phase Loader_Send_Mutations_To_Applier, when loaders send parsed mutations to appliers: 1) When the same loader loads multiple files, we reset the cmdid for the phase; 2) When different loaders load files, each loader's cmdid starts from 0 for the phase. Both situations can break the assumption, which causes appliers to miss some mutations to apply. This breaks the cycle test.	2019-05-14 01:49:49 -07:00
Meng Xu	c115e3ceb1	FastRestore: Remove handleSampleLogFileRequest handleSampleLogFileRequest is replaced by handleLoadLogFileRequest	2019-05-13 18:49:13 -07:00
Meng Xu	730142d532	FastRestore: Mark sampled file as processed files This commit should pass correctness test, but it does not mean the fast restore logic is correct. We should NOT mark sampled file as processed files.	2019-05-13 17:53:11 -07:00
Meng Xu	76dd8dc8a8	FastRestore: Fix splitMutation bug	2019-05-13 17:24:57 -07:00
Meng Xu	c7cd758e01	FastRestore:Do not mark log file as processed in sampling This commit will expose a potential bug in fast restore. We may need to parse range file before log file.	2019-05-13 11:37:20 -07:00
Meng Xu	26b224cddc	FastRestore:RestoreLoader: Unify parsing log file Use a generic actor to parse log files for sampling phase and load phase.	2019-05-13 11:36:29 -07:00
Meng Xu	a2fef23678	FastRestore: Remove handleSampleRangeFile actor	2019-05-13 10:36:44 -07:00
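Commit fac63a83c4 deduplicates loader-to-applier requests with a NotifiedVersion: each request carries prevVersion and commitVersion, and 630c29d160 mentions waiting on whenAtLeast. The following is a minimal C++ sketch of that idea, not the FoundationDB implementation. It assumes a single sender whose requests form a chain (each prevVersion equals the previous commitVersion) and replaces Flow's NotifiedVersion with a plain integer. Where the real actor would wait, it returns NotReady. The names ApplierState, VersionedRequest and ApplyResult are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Version = int64_t;

enum class ApplyResult { Applied, Duplicate, NotReady };

// Hypothetical stand-in for a versioned mutation request sent by a loader.
struct VersionedRequest {
    Version prevVersion;    // version the applier must have reached before applying
    Version commitVersion;  // version the applier moves to after applying
    std::vector<std::string> mutations;
};

// Hypothetical applier state; `version` plays the role of the NotifiedVersion.
struct ApplierState {
    Version version = 0;                // the first request in the chain has prevVersion == 0
    std::vector<std::string> applied;   // mutations accepted so far

    ApplyResult handle(const VersionedRequest& req) {
        assert(req.prevVersion < req.commitVersion);
        // Earlier requests in the chain have not arrived yet; the real actor
        // would wait on whenAtLeast(prevVersion) here instead of returning.
        if (version < req.prevVersion) return ApplyResult::NotReady;
        // The applier has already moved past this request: it is a resend.
        if (version > req.prevVersion) return ApplyResult::Duplicate;
        applied.insert(applied.end(), req.mutations.begin(), req.mutations.end());
        version = req.commitVersion;
        return ApplyResult::Applied;
    }
};
```

A resent request is recognized because the applier's version has already advanced past its prevVersion, so no per-request command ID is needed. The cmdid approach that 6c4c807801 found to be non-unique relied on such an ID.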


157 Commits
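Commit d9c97b5e5f requires that when mutationVectorThreshold is not 1, a loader never mixes mutations at different versions in the vector it sends to an applier. Below is a minimal sketch of that rule. It assumes mutations are already grouped by version in a std::map and represented as plain strings. The names buildBatches and MutationBatch are hypothetical, and the actual loader code is structured differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Version = int64_t;

// Hypothetical batch type: every mutation in a batch shares one version.
struct MutationBatch {
    Version version;
    std::vector<std::string> mutations;
};

// Split mutations into batches of at most `threshold` entries,
// starting a new batch whenever the version changes.
std::vector<MutationBatch> buildBatches(const std::map<Version, std::vector<std::string>>& byVersion,
                                        std::size_t threshold) {
    assert(threshold > 0);
    std::vector<MutationBatch> batches;
    for (const auto& [version, mutations] : byVersion) {
        for (std::size_t i = 0; i < mutations.size(); i += threshold) {
            std::size_t stop = std::min(i + threshold, mutations.size());
            auto first = mutations.begin() + static_cast<std::ptrdiff_t>(i);
            auto last = mutations.begin() + static_cast<std::ptrdiff_t>(stop);
            batches.push_back({version, std::vector<std::string>(first, last)});
        }
    }
    return batches;
}
```

With a threshold of 2, three mutations at version 5 followed by one at version 7 yield three batches. The version-5 mutations fill two batches, and the version-7 mutation gets its own batch, so no batch straddles a version boundary.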