foundationdb

Commit Graph

Author	SHA1	Message	Date
Jingyu Zhou	de8d953865	Add backup role, class, and worker skeleton	2020-01-22 19:35:30 -08:00
negoyal	a4a0bf18f9	Merging with Master.	2019-11-12 13:01:29 -08:00
Meng Xu	d160810662	FastRestore:Resolve review comments	2019-09-04 16:48:43 -07:00
Meng Xu	3b54363780	FastRestore:Apply Clang-format	2019-08-01 18:09:12 -07:00
Meng Xu	45083edf74	Merge branch 'master' into mengxu/performant-restore-PR Fix conflicts as well.	2019-07-25 10:46:11 -07:00
mpilman	68ce9a5e75	ProtocolVersion type - second try	2019-06-18 17:55:27 -07:00
mpilman	6afce01744	Implementation complete (not yet working)	2019-05-13 14:15:22 -07:00
mpilman	ba83c458a6	types implemented	2019-05-13 14:15:22 -07:00
Meng Xu	70d7c289f4	Merge branch 'master' into mengxu/restore/parallel-v7	2019-03-30 22:13:10 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
Jingyu Zhou	3c86643822	Separate Ratekeeper from data distribution. Add a new role for ratekeeper. Remove StorageServerChanges from data distribution. Ratekeeper monitors storage servers, which borrows the idea from DataDistribution.	2019-03-07 13:16:20 -08:00
Trevor Clinkenbeard	1bb384db4d	Merge branch 'master' of https://github.com/apple/foundationdb into add-no-assign-class	2019-02-20 13:13:12 -08:00
Jingyu Zhou	886e7ab2ba	Add a new DataDistributor role. Let cluster controller to start a new data distributor role by sending a message to a chosen worker. Change MasterInterface usage in DataDistribution to masterId Add DataDistributor rejoin handling. This allows the data distributor to tell the new cluster controller of its existence so that the controller doesn't spawn a new one. I.e., there should be only ONE data distributor in the cluster. If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries to recruit one as DD. CC also monitors DD and restarts one if it failed. The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for the new DD. Add GetRecoveryInfo RPC to master server, which is called by data distributor to obtain the recovery Transaction version from the master server.	2019-02-14 16:30:13 -08:00
Meng Xu	550f2e2682	Merge with master to use the latest backup container In fdb 6.0.15, backup container is changed on how to organize the backup data. The backup made by fdb >6.0.15 has to be restored with fdb > 6.0.15. Merge with master so that the fast restore uses fdb > 6.0.15	2019-01-30 12:05:15 -08:00
Meng Xu	76e1ba2934	add blob_credential_file option	2019-01-29 16:00:52 -08:00
Trevor Clinkenbeard	2e0b3a7f1d	Added ProcessClass::CoordinatorClass, which can be used by coordinators, so that coordinators do not have to take on other roles if desired	2019-01-25 11:03:13 -08:00
Balachandar Namasivayam	a8e2e75cd5	Re-enable CheckDesiredClasses after making necessary changes for multi-region setup. Fixed a couple of bugs 1) A rare race condition where a worker is being roles even after it died. 2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.	2019-01-10 10:28:32 -08:00
anoyes	6a4d87802b	Replace & operator with variadic function	2018-12-28 11:33:42 -08:00
Meng Xu	8de031f9a6	TeamCollection: clang-format Format the changes with git clang-format. No functional changes. Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-21 11:18:26 -08:00
Meng Xu	5051b35c61	TeamCollection: Use machine team to create server team Current server team collection logic does not consider the fact that multipe storage servers can run on the same machine. When multiple machines fail, all servers on the machines will fail, and the possibility of having one process team fail and lose data is very high. To reduce the possibility of losing data when multiple machine fails, we first create machine teams which span across different fault zones; we then create server teams based on machine teams by first picking 1 machine team, and then picking 1 server from each machine in the machine team. Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-16 15:53:22 -08:00
Evan Tschannen	bf6545a9cf	clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary reduced unnecessary state variables in waitMetrics requests	2018-11-02 13:15:09 -07:00
Evan Tschannen	5fc8199abc	Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic wait_for_good_recruitment now requires that you have the desired count of each roll remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered	2018-06-22 10:15:24 -07:00
Alex Miller	6c2cb25c53	Rename BestOtherFit -> OkayFit. The previous order of fitness was BestFit > GoodFit > BestOtherFit > ... which is baffling. It's now: BestFit > GoodFit > OkayFit > ... which won't break anyone's expectations.	2018-06-12 16:50:25 -07:00
Evan Tschannen	37a6a81634	Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs # Conflicts: # fdbserver/workloads/RestartRecovery.actor.cpp	2018-02-23 12:33:28 -08:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Evan Tschannen	c7b3be5b19	re-enabled better master exists the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited	2018-02-09 16:48:55 -08:00
Evan Tschannen	5ac4f73978	Merge branch 'release-5.1' into feature-remote-logs # Conflicts: # fdbclient/NativeAPI.actor.cpp # fdbrpc/Locality.h # fdbrpc/simulator.h # fdbserver/ApplyMetadataMutation.h # fdbserver/ClusterController.actor.cpp # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/masterserver.actor.cpp # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-05 11:33:42 -08:00
Evan Tschannen	57aba0b3bc	fix: excluded servers were the same fitness as storage servers for the master role fix: better master exists did not considers exclusion for master fitness	2017-11-03 17:09:14 -07:00
Yichi Chiang	76c5488421	Add cluster controller process class	2017-10-16 16:21:25 -07:00
Evan Tschannen	15962cf079	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbrpc/Locality.cpp # fdbrpc/Locality.h # fdbserver/ClusterController.actor.cpp # fdbserver/ClusterRecruitmentInterface.h # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/fdbserver.vcxproj.filters # fdbserver/masterserver.actor.cpp # fdbserver/worker.actor.cpp # flow/error_definitions.h	2017-10-05 17:09:44 -07:00
Evan Tschannen	ea26bc1c43	passed first tests which kill entire datacenters added configuration options for the remote data center and satellite data centers updated cluster controller recruitment logic refactors how master writes core state updated log recovery, and log system peeking	2017-09-07 15:32:08 -07:00
Yichi Chiang	9fe927127f	choose leader on the perferred process class	2017-08-28 14:41:04 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00

33 Commits