Commit Graph

177 Commits

Author SHA1 Message Date
Xiaoxi Wang bbcb3cc018 extract KeyBackedConfig, StorageWiggleData class; solve template resolution problem; solve MV txn and native api conflict by splitting RunTransaction file 2023-01-02 23:34:39 -08:00
Xiaoxi Wang f13453fe63 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/wiggleDelay 2022-12-20 17:21:19 -08:00
Meng Xu e6b2254726 Resolve review comments: No functional change 2022-12-19 15:28:01 -08:00
Meng Xu a1d513b355 Fix: Exclusion stuck because DD cannot build new teams
Bug behavior:
When DD has zero healthy machine teams but more unhealthy machine teams
than the max machine teams DD plans to build, DD will stop building
new machine teams. With zero healthy machine teams (and zero healthy
server teams), DD cannot find a healthy destination team to relocate data.
When data relocation stops, exclusion stops progressing and gets stuck.

The bug happens when we *shrink* a k-host cluster by
first adding k/2 new hosts;
then quickly excluding all old hosts.

Fix:
Let DD build temporary extra teams to relocate data.
The extra teams will be cleaned up later by DD's remove extra teams logic.

Simulation test:
There is no simulation test that covers the cluster expansion scenario.
To most closely simulate this behavior, we intentionally overbuild all possible
machine teams to trigger the condition that the number of unhealthy teams is
larger than the maximum number of teams DD wants to build later.
2022-12-19 15:28:01 -08:00
Xiaoxi Wang a33b366f19 merge feature/main/ppwLoadBalance 2022-12-15 13:27:44 -08:00
Xiaoxi Wang 919c512cdc fix wiggler state setting 2022-12-15 12:14:40 -08:00
Xiaoxi Wang ab4778bd19 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ppwLoadBalance 2022-12-15 11:36:20 -08:00
Xiaoxi Wang c12de23824 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/wiggleDelay 2022-12-11 14:27:22 -08:00
Xiaoxi Wang 3e4966d5bd persistent perpetual wiggle delay 2022-12-08 23:46:26 -05:00
sfc-gh-tclinkenbeard 68f14f017c Fix clang 15 compiler warnings 2022-12-08 13:59:37 -08:00
Xiaoxi Wang ccc494319c perpetual wiggle key functions 2022-12-08 16:46:05 -05:00
FoundationDB CI 86d6106dc1 format source code after switch to clang 15 2022-12-08 17:26:45 +00:00
Xiaoxi Wang 16d11143fa add smallLoadThreshold logic and change knobs 2022-12-07 11:45:49 -05:00
Xiaoxi Wang aae89c863d DDTeamCollection.getAverageShardBytes 2022-12-07 10:08:22 -05:00
Xiaoxi Wang 5d01d33531 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ppwLoadBalance 2022-12-07 09:11:55 -05:00
Xiaoxi Wang 73a72d70fd consider the overall load in the cluster 2022-12-07 08:58:52 -05:00
Xiaoxi Wang c89d74fa1b rewrite loadBytesBalanceRatio; rename knobs; update comments 2022-11-16 12:52:25 -08:00
Xiaoxi Wang ac923cfbcd add knobs; make ppw wait for byte load balance 2022-11-10 12:25:51 -08:00
Xiaoxi Wang 7a5f2973c5 move stopWiggleSignal to StorageWiggler; update workload 2022-11-08 23:02:35 -08:00
Xiaoxi Wang 4727449ef0 Merge branch 'main' of https://github.com/apple/foundationdb into fix/main/restoreStats 2022-11-08 15:35:15 -08:00
Xiaoxi Wang 4971976a61 make trackExcludedServers PRIORITY_SYSTEM_IMMEDIATE 2022-11-07 14:38:04 -08:00
Zhe Wu 56001de2d4 More nit changes around DD 2022-11-07 09:11:16 -08:00
Zhe Wu 3a02f919b9 Add some comments in DD and fix some nits 2022-11-07 09:11:16 -08:00
Xiaoxi Wang 03a9dd009a fix compilation errors 2022-11-06 22:46:54 -08:00
Xiaoxi Wang 8e6a9730ea move StorageWiggleMetrics out; add workload; try to fix the restore/reset bug (not tested) 2022-11-03 23:42:44 -07:00
Jingyu Zhou c127bb1c30 Fix some clang warnings on unused variables 2022-11-01 15:38:47 -07:00
Lukas Joswiak 9d3c3b1efe Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-27 13:56:13 -07:00
Xiaoxi Wang 5d90703dc8 finish getKeysLocations etc, and unit test pass. 2022-10-24 09:58:41 -07:00
Jingyu Zhou a8391caf23 Revert "Data loss protection v2" 2022-10-20 18:09:58 -05:00
Lukas Joswiak 72bc89cf39 Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-18 21:37:42 -07:00
Jingyu Zhou 63f5705560 Merge pull request #8414 from sfc-gh-xwang/feature/main/txnProcessor_team
Replace self->cx with self->dbContext() method and trivial renaming
2022-10-10 10:09:36 -07:00
Markus Pilman ea1325a552 Merge pull request #8319 from sfc-gh-tclinkenbeard/add-rare-code-probe-annotation
Add `rare` code probe decoration
2022-10-07 09:39:00 -06:00
Xiaoxi Wang 2ad4f29539 dbContext() replace self->cx, remove cx member 2022-10-04 16:39:22 -07:00
Xiaoxi Wang 21b2e11bc4 getWorkers from IDDTxnProcessor 2022-10-04 14:57:04 -07:00
Xiaoxi Wang 28e170ca69 remove unnecessary Database cx parameters 2022-10-04 14:29:45 -07:00
Xiaoxi Wang 4cf4ccc089 correct getServerListAndProcessClasses implementation (100k pass) 2022-10-03 22:24:35 -07:00
Xiaoxi Wang 76f2dc8ce0 merge upstream/main 2022-10-02 22:07:42 -07:00
Xiaoxi Wang df9b21169d change shared_ptr to Reference 2022-09-27 11:22:47 -07:00
Xiaoxi Wang 14d73193d5 waitDDTeamInfoPrintSignal, getClusterId, tryUpdateReplicasKeyForDc in IDDTxnProcessor 2022-09-26 23:00:31 -07:00
sfc-gh-tclinkenbeard 985958c260 Add rare code probe decoration 2022-09-25 15:28:32 -07:00
Xiaoxi Wang 1194774d54 rename dbProcessor to db; rename getDb() to context() 2022-09-23 15:35:39 -07:00
Xiaoxi Wang 11a6cba2c6 rename dbProcessor to db; readability improvement 2022-09-22 17:11:07 -07:00
Xiaoxi Wang e7a280ec03 format code 2022-09-21 20:49:39 -07:00
Xiaoxi Wang 97fd5878d9 change DDTeamCollection constructor 2022-09-20 13:00:28 -07:00
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Xiaoxi Wang dab8bcd109 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/wiggler-tss 2022-08-12 15:27:50 -07:00
Xiaoxi Wang 9133d4e16d Merge pull request #7803 from sfc-gh-xwang/feature/main/ddvisibility
Add server selection counter in DDQueue
2022-08-12 15:10:25 -07:00
Xiaoxi Wang b131dc9692 make getNextWigglingServerID() consider TSS recruitment; add unittest 2022-08-12 13:44:29 -07:00
Xiaoxi Wang 8b9684ae40 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/wiggler-tss 2022-08-12 10:48:37 -07:00
Xiaoxi Wang 860ffbc51e getTargetTSSInDC() method 2022-08-12 10:47:39 -07:00