foundationdb

Author	SHA1	Message	Date
Jingyu Zhou	8b67a89eed	More review comments fixed.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	85c4a4e422	Address review comments for PR #1625	2020-01-22 19:38:45 -08:00
Jingyu Zhou	6c6a553dcc	Fix hang due to distributor death in QuietDatabase It's possible that after obtaining data distributor, the distributor then dies and a new one is recruited. Because the tester is still contacting the old one, it becomes stuck.	2020-01-22 19:38:45 -08:00
Jon Fu	471e283128	Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed	2019-09-18 11:49:07 -07:00
Evan Tschannen	8fbd90e2f6	Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2 Graceful storage engine migration	2019-09-09 13:51:53 -07:00
Meng Xu	879dec1a5d	ConsistencyCheck:Check teamCollectionValid for data_hall mode	2019-09-05 10:34:57 -07:00
Jon Fu	00c2025d4b	fixed removeKeys impl, adjusted test workload, and introduced extra safety checks to NativeAPI and proxy	2019-08-27 14:39:44 -07:00
Meng Xu	a588710376	StorageEngineSwitch:Graceful switch When fdbcli change storeType for storage engines, we switch the store type of storage servers one by one gracefully. This avoids recruiting multiple storage servers on the same process, which can cause OOM error.	2019-08-12 17:37:52 -07:00
Evan Tschannen	9f11f2ec53	Merge branch 'master' of github.com:apple/foundationdb	2019-07-30 16:55:56 -07:00
Evan Tschannen	2d7ec54d3e	fix: some exclude workloads would cause both the primary and remote datacenter to be considered dead	2019-07-30 16:35:52 -07:00
sramamoorthy	5a56f6b456	minor snap create client improvement and bug fixes	2019-07-29 20:28:22 -07:00
Balachandar Namasivayam	bf87d906f6	Fix a crash.	2019-07-25 16:15:28 -07:00
sramamoorthy	31a1e6858b	remove un-necessary state variables in getCoord	2019-07-24 15:36:28 -07:00
sramamoorthy	a65c9f92ed	get rid of all timeouts and other changes	2019-07-24 15:36:28 -07:00
sramamoorthy	a2f2ad96ff	code review comments and merge to master changes	2019-07-24 15:36:28 -07:00
sramamoorthy	d90b678f6f	storage worker to throw in case of failures	2019-07-24 15:36:28 -07:00
sramamoorthy	7ec8fe6e74	snap v2: implement get only local storage workers	2019-07-24 15:36:28 -07:00
sramamoorthy	8f1f0c0435	snap v2: worker and other helper related changes	2019-07-24 15:36:28 -07:00
Meng Xu	64bee63dbc	Resolve two review comments 1) No need to check server with only one team when teamRemover finds a server team or machine team to remove 2) Fix optimalTeamCount counting in teamTracker	2019-07-18 18:46:31 -07:00
Meng Xu	80ed39c189	QuietDB:Disable check for too many teams Because team remover does not remove a team if it causes 0 team per server. So we currently disable the check until we have a better strategy to enforce the desired number of teams. This will not cause much problem in real situation, while having 0 team on a server will make the server unable to host data, which is bad.	2019-07-16 12:38:55 -07:00
Meng Xu	20f067e794	Merge with master:Resolve conflict with PR#1797	2019-07-16 10:52:28 -07:00
Meng Xu	243504b125	DD:Clang format changes	2019-07-15 18:40:14 -07:00
Meng Xu	94e9b8a3b4	Do not remove a team whose min team number is less than target If the minimum number of teams of servers in a team is less than the target value (desired_team_number_per_server * (teamSize + 1) / 2), the team remover should not remove it. Otherwise, DD will oscillate in building more teams and removing redundant teams. Do not do consistency check for three_data_hall mode because when machines are not evenly distributed across data halls, we will need to build more teams than the total desired number to make sure the number of teams per server is no less than the target value.	2019-07-15 18:30:13 -07:00
Meng Xu	cafe9b9412	TC:Target team num per server is desired number Do not overbuild teams because we may oscillate between building more teams and removing the redundant teams. The oscillation happens when the machines are not evenly distributed across availability zones. For example, in three_data_hall mode, we have 1 machine in 1 data hall for 2 data halls. We have 3 machines in the 3rd data hall. To build enough (and more teams) for servers in the 3rd data hall, we will overbuild teams. However, the teamRemover will remove those newly teams.	2019-07-15 17:32:51 -07:00
Meng Xu	cf935ff9e6	Remove debug message and format code	2019-07-11 22:05:20 -07:00
Meng Xu	cd28a0b604	Reenable check each server must have at least 1 team	2019-07-11 17:58:14 -07:00
Meng Xu	221e6945db	TeamTracker:Fix bug in counting optimalTeamCount When a teamTracker is cancelled, e.g, by redundant teamRemover or badTeamRemover, we should decrease the optimalTeamCount if the team is considered as an optimal team, i.e., all members' machine fitness is no worse than unset, and the team is healthy.	2019-07-11 17:22:41 -07:00
Meng Xu	4c32593f59	QuietDB:Do not check when machineId is not zoneID	2019-07-11 10:37:16 -07:00
Meng Xu	4fae510633	AddBestMachineTeams:BugFix:Must build team when it has remainingMachineTeamBudget	2019-07-10 11:55:06 -07:00
Meng Xu	9816fb6aca	ConsistencyCheck:Check minServerTeamOnServer larger than 0	2019-07-10 11:53:47 -07:00
Meng Xu	522230f050	ConsistencyCheck:getTeamCollectionValid tries 10 times before return false Because serverTeamRemover takes time to remove teams, getTeamCollectionValid() need to wait for a while before concluding that the number of server teams is larger than the desired number.	2019-07-09 11:46:57 -07:00
Meng Xu	cf03b274a2	TeamTracker:Add traceTeamCollectionInfo	2019-07-08 23:01:25 -07:00
Meng Xu	08d76a7bbe	ServerTeamRemover:Bug fix and clang-format	2019-07-08 17:08:32 -07:00
Meng Xu	9cc11e88c5	TeamBuilder:Reduce unnecessary calculation of remainingTeamBudget	2019-07-08 16:56:06 -07:00
Meng Xu	08a721b320	Merge branch 'master' into mengxu/server-team-remover-PR	2019-07-08 16:30:32 -07:00
A.J. Beamon	0a5c7608df	Remove "Number" suffix from newly added events (and variables that feed the events).	2019-07-08 15:45:28 -07:00
A.J. Beamon	f52c239ef8	Merge branch 'master' into trace-event-rename # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/QuietDatabase.actor.cpp	2019-07-08 15:37:00 -07:00
A.J. Beamon	2a56e011ea	Merge branch 'release-6.1' into merge-release-6.1-into-master # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbserver/DataDistribution.actor.cpp	2019-07-05 13:52:29 -07:00
A.J. Beamon	2a709ee5d0	Rename event details that use the suffix "Number" to indicate a count, as number could also imply an index. Rename a few other trace events and details that e.g. needed to be pluralized.	2019-07-05 08:54:21 -07:00
Meng Xu	599fcb2e6d	Add serverTeamRemover to remove redundant server teams	2019-07-02 17:40:37 -07:00
Meng Xu	716494ed9f	ConsistencyCheck:Check serverTeamNumber larger than desired number	2019-07-02 17:40:37 -07:00
Meng Xu	875cb877ac	TeamCollection: Apply clang-format	2019-06-28 16:01:05 -07:00
Meng Xu	0baae134f6	TeamCollectionInfo: Resolve review comments	2019-06-28 15:59:47 -07:00
Meng Xu	4da345f7d2	TeamCollectionTest:Remove test on minTeamOnServer	2019-06-27 19:05:10 -07:00
Meng Xu	f889843332	Change traceTeamCollectionInfo to actor There are cases where traceTeamCollectionInfo was called within the same execution block, i.e., no wait between the two traceTeamCollectionInfo calls. Because simulation uses the same time for all execution instructions in the same execution block, having more than one traceTeamCollectionInfo at the same time will mess up the trackLatest semantics. When one of them is always chosen by simulator, simulation test will report false positive error. Changing this function to actor and adding a small delay inside the function can solve this problem.	2019-06-27 18:24:20 -07:00
Meng Xu	4fe3c7f749	TeamCollectionInfo:Revert to original version where it is	2019-06-27 17:09:21 -07:00
Meng Xu	42620e4831	TeamCollectionTest:GetTeamCollectionValid wait until values are correct	2019-06-27 16:52:36 -07:00
Meng Xu	8d5e848808	QuitDatabase test: Check each server has at least 1 team	2019-06-27 14:22:41 -07:00
Meng Xu	53324e4db7	TeamCollectionInfo: clang format	2019-06-27 11:27:29 -07:00
Meng Xu	cc6a0e9bcd	TeamCollectionTest:Do not enforce minServerTeamOnServer larger than 0 In ConfigureTest, one server may be left with 0 server teams, even if we call buildTeams in the storageServerTracker.	2019-06-27 11:27:29 -07:00
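Commit 94e9b8a3b4 states a concrete rule for the team remover. The target is desired_team_number_per_server * (teamSize + 1) / 2. A team whose members' minimum team count is below that target is not removed, because removing it would make data distribution oscillate between building and removing teams. The sketch below illustrates only that rule. It is not FoundationDB's C++/Flow code. The function names are hypothetical, teams are modeled as tuples of server IDs, and the division is assumed to be integer (floor) division.

```python
from collections import Counter
from typing import Sequence, Tuple

Team = Tuple[str, ...]  # hypothetical model: a team is a tuple of server IDs


def target_teams_per_server(desired_teams_per_server: int, team_size: int) -> int:
    # Formula quoted in commit 94e9b8a3b4; floor division is an assumption.
    return desired_teams_per_server * (team_size + 1) // 2


def may_remove_team(team: Team, all_teams: Sequence[Team],
                    desired_teams_per_server: int, team_size: int) -> bool:
    # Count how many teams each server currently belongs to.
    teams_per_server = Counter(server for t in all_teams for server in t)
    # The smallest team count among this team's members.
    min_teams_on_member = min(teams_per_server[s] for s in team)
    # Keep the team if any member is already below the target; removing it
    # would only prompt the team builder to create more teams again.
    return min_teams_on_member >= target_teams_per_server(
        desired_teams_per_server, team_size)


# Illustrative usage with made-up servers a..g and team size 3.
teams = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("e", "f", "g")]
print(may_remove_team(teams[0], teams, desired_teams_per_server=1, team_size=3))
print(may_remove_team(teams[3], teams, desired_teams_per_server=1, team_size=3))
```

With a desired count of 1 and a team size of 3, the target is 2. The first team's members each belong to at least 2 teams, so it may be removed. The last team's members each belong to only 1 team, so it is kept.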

120 Commits
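Commit 522230f050 makes the consistency check retry getTeamCollectionValid up to 10 times before it returns false. The reason is that serverTeamRemover needs time to bring the server-team count down to the desired number. The sketch below shows the general bounded-retry pattern. It is an assumption-laden illustration, not the actual test code. The helper name and the one-second delay are hypothetical, and only the limit of 10 attempts comes from the commit message.

```python
import time
from typing import Callable


def check_with_retries(check: Callable[[], bool], attempts: int = 10,
                       delay_seconds: float = 1.0) -> bool:
    # Re-evaluate the check a bounded number of times so a background process
    # (here, the server team remover) has time to converge.
    for attempt in range(attempts):
        if check():
            return True
        # Sleep between attempts, but not after the final failed one.
        if attempt + 1 < attempts:
            time.sleep(delay_seconds)
    # Only conclude failure once every attempt has failed.
    return False
```

Using a bounded number of retries keeps a genuinely invalid team collection from hanging the test, while still tolerating the remover's lag.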