foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	c05c95cbe8	forgot to rename the knob	2020-02-25 15:47:39 -08:00
Evan Tschannen	aa4d1357b3	handle the case that there is only one healthy team	2020-02-21 15:41:01 -08:00
Evan Tschannen	457dbc5215	Update fdbserver/DataDistribution.actor.cpp Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2020-02-21 15:39:17 -08:00
Evan Tschannen	6a634652c4	Update fdbserver/DataDistribution.actor.cpp Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2020-02-21 15:39:06 -08:00
Evan Tschannen	08914a2acd	Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team	2020-02-21 15:14:32 -08:00
Evan Tschannen	819c55556c	More aggressively attempt to find teams that do not have low disk space	2020-02-20 16:47:50 -08:00
A.J. Beamon	e1fb568fd1	Merge branch 'release-6.2' into dd-use-available-space. Conflicts: fdbserver/DataDistribution.actor.cpp, fdbserver/DataDistribution.actor.h, fdbserver/DataDistributionQueue.actor.cpp	2020-02-20 16:12:42 -08:00
A.J. Beamon	e4b483796d	Combine some logic that was doing similar computations for free space ratio.	2020-02-20 14:52:08 -08:00
A.J. Beamon	4c9c736253	Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.	2020-02-20 11:21:03 -08:00
A.J. Beamon	3a1ba5a077	Rename variable for clarity	2020-02-20 10:59:52 -08:00
A.J. Beamon	c164acb88d	Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.	2020-02-20 09:32:00 -08:00
A.J. Beamon	b8a252da40	Clarify the names of a couple trace fields	2020-02-10 08:15:00 -08:00
Evan Tschannen	9b80498180	Added a trace event to warn if a shard is merged before enough time has elapsed since it became low bandwidth	2020-01-10 14:58:38 -08:00
Evan Tschannen	c2608f0af9	fix: completeSources could be larger than the teamSize, so we need to check all completeSources. We do not need to track bestSize, since all teams in the list will be the same size	2020-01-10 14:46:40 -08:00
Evan Tschannen	ab7071932f	Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly	2020-01-09 16:59:37 -08:00
Evan Tschannen	3a3ab5664b	fix: team trackers for bad teams that contain a removed server must be cancelled, or the cluster will falsely report those teams as failed	2019-11-22 10:20:13 -08:00
Evan Tschannen	f8e44d2f71	fix: If a storage server was offline, it would not be checked for being in an undesired dc	2019-10-23 23:04:39 -07:00
Evan Tschannen	86bcb84b45	Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards	2019-10-11 17:50:43 -07:00
Evan Tschannen	4b5080fbea	added a few more missing data distribution priorities	2019-09-27 19:39:53 -07:00
Evan Tschannen	324d0bd3b0	Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-cleanup-mutations	2019-09-27 19:15:14 -07:00
Evan Tschannen	3bb62e008c	lowered the priority of some delays in data distribution so that the process will prefer other work	2019-09-27 18:33:13 -07:00
Meng Xu	32ebd08f9f	DD:Trigger storage recruitment when an invalid address locality is corrected	2019-09-24 13:35:38 -07:00
Meng Xu	515689d07b	Update fdbserver/DataDistribution.actor.cpp Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2019-09-18 14:45:18 -07:00
Meng Xu	d175b61a62	DD:Trace when invalid locality is corrected. Change getWorkers param from txn handler to db	2019-09-18 13:52:54 -07:00
Meng Xu	93bbc26a35	set:erase:Use return value of erase iterator as next iterator	2019-09-17 15:28:30 -07:00
Meng Xu	d2fd1f4931	DD:MisconfiguredLocality:Fix review comments	2019-09-17 13:04:21 -07:00
Meng Xu	37d2318eed	DD:Handle worker with incorrect locality. When a worker has incorrect locality, the worker will be excluded from storage recruitment. When the worker has its locality corrected by system operators, the worker will be reincluded for storage recruitment.	2019-09-14 12:12:56 -07:00
Meng Xu	c3960aba17	DD:initializeStorage:Exclude worker with invalid locality	2019-09-13 22:05:41 -07:00
Meng Xu	75460089e1	DD_VALIDATE_LOCALITY:Add comment for our future selves. When we add simulation tests that misconfigure a cluster by not setting some locality entries, we should always set DD_VALIDATE_LOCALITY to true. Otherwise, simulation tests may fail.	2019-09-13 16:26:54 -07:00
Meng Xu	78b8e48cef	DD:ValidLocality:Resolve review comment	2019-09-13 15:35:16 -07:00
Meng Xu	e1dcdbf3d2	LocalityData:Remove verbose check for valid locality	2019-09-13 15:11:13 -07:00
Meng Xu	8970d9858b	DD:isValidLocality:A generic way to check any replicationPolicy	2019-09-13 14:55:51 -07:00
Meng Xu	1196841b3d	DD:IsValidLocality:Clang format	2019-09-13 13:56:43 -07:00
Meng Xu	e8878b16d4	DD:Valid locality includes an empty but set locality entry	2019-09-13 13:55:46 -07:00
Meng Xu	1596e2e4a5	DD:TCMachine:Use processID as machineID if zoneID is unset	2019-09-13 13:43:41 -07:00
Meng Xu	3ad7e3adb3	DD:DD_VALIDATE_LOCALITY:Guard the checking of locality validity	2019-09-13 13:19:35 -07:00
Meng Xu	90d6a27a0d	DD:IsValidLocality:Consider configured replica policy	2019-09-13 12:04:49 -07:00
Meng Xu	52f6297b52	DD:Introduce isValidLocality. A server or machine has a valid locality only if it sets correct locality entries. Team building should only use servers or machines with valid locality	2019-09-13 11:30:26 -07:00
Evan Tschannen	cc41f3e2fc	fix: an unhealthy server with a low number of teams could cause data distribution to build every possible team	2019-09-12 14:18:10 -07:00
sramamoorthy	5d87443323	improved error msgs for snapshot cmd	2019-08-27 16:43:52 -07:00
Evan Tschannen	297b65236f	added additional trace events to warn when different parts of shard relocations take more than 10 minutes	2019-08-16 14:56:58 -07:00
Evan Tschannen	ba54508c47	code cleanup	2019-08-06 16:30:30 -07:00
Evan Tschannen	5dc4c80d44	fix: the machineAttrition workload did not ensure that healthyZone was always cleared; fix: an assert could trigger spuriously	2019-08-05 15:00:17 -07:00
Evan Tschannen	7d7aa27c2d	Merge pull request #1814 from dongxinEric/feature/1508/finer-grained-dd-controls. Added finer grained controls to DataDistribution in fdbcli.	2019-07-31 17:36:20 -07:00
Evan Tschannen	bba01c6531	fix: add subsetOfEmergencyTeam could add an unsorted team	2019-07-31 16:02:08 -07:00
Xin Dong	b653ddb30d	Final clean ups after rebasing master	2019-07-30 22:35:34 -07:00
Xin Dong	5d20364423	Address review comments	2019-07-30 22:24:30 -07:00
Xin Dong	1922c39377	Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure, which I think could be a race.	2019-07-30 22:24:30 -07:00
Xin Dong	c6e5472d8d	Apply suggestions from code review Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2019-07-30 22:20:45 -07:00
Xin Dong	f5d6e3a5b3	Addressed review comments; added test for the storage server failure disable switch	2019-07-30 22:20:45 -07:00


384 Commits