Commit Graph

1158 Commits

Author SHA1 Message Date
Evan Tschannen e45952bc53 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	tests/BlobStore.txt
#	versions.target
2018-11-13 16:06:39 -08:00
Evan Tschannen 1bd615f954 fix: remoteDcIds will not actually have transaction logs unless usable regions is > 1 2018-11-13 12:36:04 -08:00
Evan Tschannen 4e54690005 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen 3f3a562f75 updated resolution balancing knobs to be a little more aggressive 2018-11-12 19:11:28 -08:00
Evan Tschannen 239bf882d8 Merge branch 'release-6.0' into feature-resolution-balancing-fix 2018-11-12 18:43:20 -08:00
Evan Tschannen 3f461f3706 updated comments 2018-11-12 18:42:29 -08:00
Evan Tschannen 6353a6724b strengthened the protections related to changing regions 2018-11-12 17:40:40 -08:00
Evan Tschannen 26c49f21be fix: we do not know a region is fully replicated until all the initial storage servers have either been heard from or have been removed 2018-11-12 17:39:40 -08:00
Evan Tschannen 3f39024640 buggify resolution balancing so that it still happens in simulation 2018-11-12 00:03:07 -08:00
Evan Tschannen 536ee826da tuned resolver balancing to keep the resolvers within 5MB per second of each other 2018-11-11 23:42:45 -08:00
Evan Tschannen 50f481b149 fix: peek local should not call peek all, because it is possible to still peek from remote log sets after a special tag 2018-11-11 19:16:25 -08:00
Evan Tschannen 7892da032f fix: Do not remove the locality entry for the current transaction logs when removing storage servers
fix: dcId_locality map could be incorrect after restarting recruitEverything
2018-11-11 12:37:53 -08:00
Evan Tschannen cd188a351e fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy
fix: badTeams were not accounted for when checking priorities
2018-11-11 12:33:31 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen a654183f63
Merge pull request #791 from ajbeamon/remove-cluster-from-iclientapi
Remove cluster from IClientApi (phase 2 of removing DB names)
2018-11-10 10:16:18 -08:00
Evan Tschannen 6a406bae72
Merge pull request #896 from ajbeamon/downgrade-incorrect-cluster-file-event
Downgrade the severity of IncorrectClusterFileContents the first time…
2018-11-10 10:06:36 -08:00
Evan Tschannen 6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
Evan Tschannen 7c23b68501 fix: we need to build teams if a server becomes healthy and it is not already on any teams 2018-11-09 18:06:00 -08:00
A.J. Beamon c3a06aa6f1 Fix indentation 2018-11-09 14:25:40 -08:00
A.J. Beamon 67a152ae9f Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes. 2018-11-09 14:19:18 -08:00
Evan Tschannen 3e2484baf7 fix: a team tracker could downgrade the priority of a relocation issued by the team tracker for the other region 2018-11-09 10:07:55 -08:00
Evan Tschannen 6874e379fc fix: set the simulator’s view of usable regions to one during configure tests which can disable usable regions 2018-11-09 10:06:03 -08:00
Evan Tschannen 19ae063b66 fix: storage servers need to be rebooted when increasing replication so that clients become aware that new options are available 2018-11-08 15:44:03 -08:00
Evan Tschannen 1cf5689d62 fix: workers could only create a shared transaction log for one store type. This resulted in the old store type being used for new transaction logs after configuration changes which changed the store type 2018-11-07 21:09:51 -08:00
Evan Tschannen 599cc6260e fix: data distribution would not always add all subsets of emergency teams
fix: data distribution would not stop tracking bad teams after all their data was moved to other teams
fix: data distribution did not properly handle a server changing locality such that the teams it used to be on no longer satisfy the policy
2018-11-07 21:05:31 -08:00
Stephen Atherton ade75ac692 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-11-07 11:43:54 -08:00
Stephen Atherton 9d73166b3b Many bug fixes related to concurrent page operations and pager shutdown. 2018-11-06 19:31:16 -08:00
Evan Tschannen 6bb283aebc fix: dcId to Locality changes could be lost if an emergency transaction happened that did not change the configuration
fix: master proxy was starting dcIds at a value one too large
2018-11-05 11:12:43 -08:00
Evan Tschannen 04fa2a7202 fix: we could recover in a region with priority < 0 2018-11-05 10:14:26 -08:00
A.J. Beamon 187e507e53 Downgrade the severity of IncorrectClusterFileContents the first time it is logged to avoid transient issues that appear like the cluster file hasn't been updated (e.g. the cluster file is shared between multiple processes). 2018-11-05 09:28:08 -08:00
Evan Tschannen 87295cc263 suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created 2018-11-04 23:07:56 -08:00
Evan Tschannen 87d0b4c294 fix: the remote region does not have a full replica if usable_regions==1 2018-11-04 22:05:37 -08:00
Evan Tschannen c1bd279a4e addressed review comments 2018-11-04 20:26:23 -08:00
Evan Tschannen bd60027544 test region priority changes 2018-11-04 20:11:23 -08:00
Evan Tschannen c02690471d added protection against configuration changes which cannot be immediately reverted
the configure database workload tests region configurations
2018-11-04 19:53:55 -08:00
Evan Tschannen 3304c83229 added additional checks in peek which determine when a tag will never get additional versions 2018-11-04 19:28:15 -08:00
Evan Tschannen accba4fa1d keep track of the last time a process became available to set a better starting value for remoteStartTime 2018-11-04 14:33:03 -08:00
Evan Tschannen 45c8f2dfcb restarting tests will sometimes configure to a fearless configuration on startup if possible 2018-11-02 14:16:47 -07:00
Evan Tschannen 2a8c628d82 fix: even if a peek cursor cannot find a local set for the most recent data, it still may be able to find data from older log sets 2018-11-02 14:13:57 -07:00
Evan Tschannen f045c041eb fix: if a storage server already exists in a remote region after converting to fearless, it did not receive mutations between the known committed version and the recovery version 2018-11-02 14:11:39 -07:00
Evan Tschannen bf6545a9cf clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Evan Tschannen 3b97f5a899 fix: the storage server still has to pop old tags, even if it does not need any data from them 2018-11-02 13:10:14 -07:00
Evan Tschannen 979597a2ca fix: upgraded tags must be popped from all log sets 2018-11-02 13:09:18 -07:00
Evan Tschannen 1b5d28386a fix: the Tlog would not update the durable version properly when version_sizes was empty 2018-11-02 13:05:54 -07:00
Evan Tschannen 2d9a670774 fix: nested multCursors would improperly hang on getMore, because an inner pop of cursors would not be detected by the outer instance 2018-11-02 13:04:09 -07:00
Evan Tschannen e68c07ae35 fix: trackShardBytes was called with the incorrect range, resulting in incorrect shard sizes
reduced the size of shard tracker actors by removing unnecessary state variables. Because there is a large number of these actors, the extra state variables add up to a lot of memory
2018-11-02 13:03:01 -07:00
Evan Tschannen ad98acf795 fix: if the team started unhealthy and initialFailureReactionDelay was ready, we would not send relocations to the queue
print wrong shard size team messages in simulation
2018-11-02 13:00:15 -07:00
Evan Tschannen 1d591acd0a removed the countHealthyTeams check, because it was incorrect if it triggered during the wait(yield()) at the top of team tracker 2018-11-02 12:58:16 -07:00
Evan Tschannen 30fbc29af1 Renamed TimeKeeperStarted to TimeKeeperCommit 2018-11-02 12:57:03 -07:00
Evan Tschannen 278dbd5096 call debug transaction on timekeeper 2018-11-02 12:56:29 -07:00