foundationdb

Commit Graph

Author	SHA1	Message	Date
Xin Dong	b653ddb30d	Final clean ups after rebasing master	2019-07-30 22:35:34 -07:00
Xin Dong	cda70700cc	Address review comments. 50K correctness with no failures.	2019-07-30 22:24:30 -07:00
Xin Dong	5d20364423	Address review comments	2019-07-30 22:24:30 -07:00
Xin Dong	1922c39377	Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race.	2019-07-30 22:24:30 -07:00
Xin Dong	f5d6e3a5b3	- Addressed review comments - Added test for the storage server failure disable switch	2019-07-30 22:20:45 -07:00
Xin Dong	4ecfc9830f	Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is: - Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures - Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller) Kicked off two 200K correctness and showed no related errors.	2019-07-30 22:17:21 -07:00
Evan Tschannen	a78a97f186	Merge pull request #1908 from etschannen/feature-better-dd A few data distribution improvements	2019-07-30 17:34:50 -07:00
sramamoorthy	63941e0d96	disable DD with a in-memory flag and use in snapv2	2019-07-30 17:04:51 -07:00
Evan Tschannen	5dd9043fd3	addressed review comments	2019-07-30 17:04:41 -07:00
Evan Tschannen	481642fbd4	Merge branch 'master' into feature-better-dd	2019-07-30 16:56:27 -07:00
A.J. Beamon	41605735f5	Merge pull request #1916 from ajbeamon/merge-onto-new-servers Add knob to control whether merges request new servers or not.	2019-07-30 15:04:37 -07:00
A.J. Beamon	14648e20f9	Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate Send bytes input rate to data distribution	2019-07-30 15:01:36 -07:00
A.J. Beamon	bc536757df	Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space.	2019-07-29 15:47:34 -07:00
Evan Tschannen	6b5e683de5	The mountainChopper and valleyFiller only move larger than average shards, to avoid moving high bandwidth shards which are generally smaller.	2019-07-28 23:50:42 -07:00
Evan Tschannen	04dd293af0	Merge pull request #1874 from xumengpanda/mengxu/DD-code-read DataDistribution:Add comments to help understand the code	2019-07-26 13:30:44 -07:00
A.J. Beamon	b91795d288	Send bytes input rate to DD.	2019-07-25 16:27:32 -07:00
Meng Xu	e582219ec5	Remove unnecessary condition in DDQueue Resolve the review comment.	2019-07-22 17:00:37 -07:00
Meng Xu	b7478f5dd3	DD:Add comments to help understand code Add comments to explain the functionalities of some code.	2019-07-22 11:23:16 -07:00
Meng Xu	612a51fe00	Apply Clang format to PRIORITY_TEAM_REDUNDANT	2019-07-19 18:32:22 -07:00
Meng Xu	ea76451f15	Count PRIORITY_TEAM_REDUNDANT as count PRIORITY_TEAM_UNHEALTHY	2019-07-19 18:30:01 -07:00
Alex Miller	7a500cd37f	A giant translation of TaskFooPriority -> TaskPriority::Foo This is so that APIs that take priorities don't take ints, which are common and easy to accidentally pass the wrong thing.	2019-06-25 02:47:35 -07:00
A.J. Beamon	5f55f3f613	Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.	2019-05-10 14:01:52 -07:00
Evan Tschannen	2d5043c665	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-04-30 18:27:04 -07:00
Evan Tschannen	e0f7ec96aa	Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers	2019-04-22 17:29:46 -07:00
mpilman	d01cbf3455	Addressed code review comments	2019-04-05 13:12:20 -07:00
mpilman	1c16f87a4e	Remove trace-calls to printable (in non-workloads)	2019-04-05 13:12:19 -07:00
anoyes	981426bac9	More ide fixes	2019-03-05 18:03:57 -08:00
Evan Tschannen	d008de576e	Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR Add background actor to remove redundant teams	2019-02-22 14:22:07 -08:00
Meng Xu	9445ac0b0c	Status: Use new data distributor worker to publish status After we add a new data distributor role, we publish the data related to data distributor and rate keeper through the new role (and new worker). So the status needs to contact the data distributor, instead of master, to get the status information.	2019-02-21 18:05:50 -08:00
Meng Xu	7cca439e00	TeamRemover: Add status to show redundant team removing Distinguish the removal of unhealthy team and redundant team. Change status report to include redundant team removal report.	2019-02-21 14:16:46 -08:00
mpilman	27a3153719	Use ACTOR forward declarations in MoveKeys Also MoveKeys.h -> MoveKeys.actor.h	2019-02-19 15:16:59 -08:00
mpilman	3a0f9839b9	Fix minor IDE build errors	2019-02-19 15:16:59 -08:00
Meng Xu	6d09ac483c	Merge with master	2019-02-15 17:03:40 -08:00
Jingyu Zhou	bf6da81bf9	Remove recovery version from data distribution queue This parameter is no longer used/needed.	2019-02-14 16:37:16 -08:00
Jingyu Zhou	07dab56133	Fix a data movement stuck bug When moving keys to a team, if one of the servers in the target team died, then the move can become stuck. This is because the DDTeamCollection waits for all the data movement of the failed server to be completed. However, in this case, because the movement has not finished yet, checking the database tells us there is no key associated with this server and it is safe to go ahead. In reality, only the in-memory structure knows there is pending movement, i.e., unfinished move causes some keys to be attributed to the failed server. Thus, the server can't be removed yet. Fix by adding a check with in-memory structure in waitForAllDataRemoved(). Use const& to optimize a few function parameters.	2019-02-14 16:37:16 -08:00
Jingyu Zhou	886e7ab2ba	Add a new DataDistributor role. Let cluster controller to start a new data distributor role by sending a message to a chosen worker. Change MasterInterface usage in DataDistribution to masterId Add DataDistributor rejoin handling. This allows the data distributor to tell the new cluster controller of its existence so that the controller doesn't spawn a new one. I.e., there should be only ONE data distributor in the cluster. If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries to recruit one as DD. CC also monitors DD and restarts one if it failed. The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for the new DD. Add GetRecoveryInfo RPC to master server, which is called by data distributor to obtain the recovery Transaction version from the master server.	2019-02-14 16:30:13 -08:00
Meng Xu	fe4f43203d	TeamCollection: getTeam may add a new team getTeam function may add a new team for the GetTeamRequest. We need to check if the number of teams is larger than the desired team number.	2019-02-12 14:57:35 -08:00
Meng Xu	8de031f9a6	TeamCollection: clang-format Format the changes with git clang-format. No functional changes. Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-21 11:18:26 -08:00
Meng Xu	f7a7e069f0	TeamCollection: Remove unnecessary comments Pass 41806 tests with no failure Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-16 15:56:35 -08:00
Meng Xu	73c58852f0	TeamCollection: Resolve code review comments Resolve code review comments: 1) Improve the code efficiency by avoiding unnecessary map search and avoiding unnecessary checking 2) Remove or comment out trace events when they can be spammy 3) Improve coding style Tested for 1 hour and no error was found. KillRegionCycle.txt test was excluded from the test because existing code cannot pass that test either Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-16 15:55:33 -08:00
Meng Xu	5051b35c61	TeamCollection: Use machine team to create server team Current server team collection logic does not consider the fact that multiple storage servers can run on the same machine. When multiple machines fail, all servers on the machines will fail, and the possibility of having one process team fail and lose data is very high. To reduce the possibility of losing data when multiple machines fail, we first create machine teams which span across different fault zones; we then create server teams based on machine teams by first picking 1 machine team, and then picking 1 server from each machine in the machine team. Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-16 15:53:22 -08:00
Evan Tschannen	4e54690005	Merge branch 'release-6.0' # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/MoveKeys.actor.cpp	2018-11-12 20:26:58 -08:00
Evan Tschannen	cd188a351e	fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy fix: badTeams were not accounted for when checking priorities	2018-11-11 12:33:31 -08:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	90301f497f	Merge branch 'release-6.0' # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbrpc/TLSConnection.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/Status.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/StatusWorkload.actor.cpp # versions.target	2018-09-05 16:06:33 -07:00
Evan Tschannen	d8659a5822	fix: bytesWritten would overflow and go negative	2018-08-31 12:46:57 -07:00
A.J. Beamon	2a97139d5d	This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.	2018-08-16 10:24:12 -07:00
Alex Miller	fb31a6999f	Rewrite all files to have #include actorcompiler.h as the last include.	2018-08-14 15:50:26 -07:00
Alex Miller	535b5701e5	Rewrite all `Void _ = wait(...)` -> `wait(...)`. This takes advantage of the new actorcompiler functionality to avoid having duplicate definitions of `Void _` when trying to feed the un-actorcompiled source through clang.	2018-08-14 15:50:26 -07:00
A.J. Beamon	574c5576a2	Merge branch 'release-6.0' of github.com:apple/foundationdb # Conflicts: # fdbrpc/TLSConnection.actor.cpp # versions.target	2018-08-10 14:31:58 -07:00

79 Commits