Evan Tschannen
e45952bc53
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/BackupContainer.actor.cpp
# fdbclient/BlobStore.actor.cpp
# fdbclient/HTTP.actor.cpp
# tests/BlobStore.txt
# versions.target
2018-11-13 16:06:39 -08:00
Evan Tschannen
1bd615f954
fix: remoteDcIds will not actually have transaction logs unless usable regions is > 1
2018-11-13 12:36:04 -08:00
Evan Tschannen
4e54690005
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen
3f3a562f75
updated resolution balancing knobs to be a little more aggressive
2018-11-12 19:11:28 -08:00
Evan Tschannen
239bf882d8
Merge branch 'release-6.0' into feature-resolution-balancing-fix
2018-11-12 18:43:20 -08:00
Evan Tschannen
3f461f3706
updated comments
2018-11-12 18:42:29 -08:00
Evan Tschannen
6353a6724b
strengthened the protections related to changing regions
2018-11-12 17:40:40 -08:00
Evan Tschannen
26c49f21be
fix: we do not know a region is fully replicated until all the initial storage servers have either been heard from or have been removed
2018-11-12 17:39:40 -08:00
Evan Tschannen
3f39024640
buggify resolution balancing so that it still happens in simulation
2018-11-12 00:03:07 -08:00
Evan Tschannen
536ee826da
tuned resolver balancing to keep the resolvers within 5MB per second of each other
2018-11-11 23:42:45 -08:00
Evan Tschannen
50f481b149
fix: peek local should not call peek all, because it is possible to still peek from remote log sets after a special tag
2018-11-11 19:16:25 -08:00
Evan Tschannen
7892da032f
fix: Do not remove the locality entry for the current transaction logs when removing storage servers
...
fix: dcId_locality map could be incorrect after restarting recruitEverything
2018-11-11 12:37:53 -08:00
Evan Tschannen
cd188a351e
fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy
...
fix: badTeams were not accounted for when checking priorities
2018-11-11 12:33:31 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
a654183f63
Merge pull request #791 from ajbeamon/remove-cluster-from-iclientapi
...
Remove cluster from IClientApi (phase 2 of removing DB names)
2018-11-10 10:16:18 -08:00
Evan Tschannen
6a406bae72
Merge pull request #896 from ajbeamon/downgrade-incorrect-cluster-file-event
...
Downgrade the severity of IncorrectClusterFileContents the first time…
2018-11-10 10:06:36 -08:00
Evan Tschannen
6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
...
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
Evan Tschannen
7c23b68501
fix: we need to build teams if a server becomes healthy and it is not already on any teams
2018-11-09 18:06:00 -08:00
A.J. Beamon
c3a06aa6f1
Fix indentation
2018-11-09 14:25:40 -08:00
A.J. Beamon
67a152ae9f
Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes.
2018-11-09 14:19:18 -08:00
Evan Tschannen
3e2484baf7
fix: a team tracker could downgrade the priority of a relocation issued by the team tracker for the other region
2018-11-09 10:07:55 -08:00
Evan Tschannen
6874e379fc
fix: set the simulator’s view of usable regions to one during configure tests which can disable usable regions
2018-11-09 10:06:03 -08:00
Evan Tschannen
19ae063b66
fix: storage servers need to be rebooted when increasing replication so that clients become aware that new options are available
2018-11-08 15:44:03 -08:00
Evan Tschannen
1cf5689d62
fix: workers could only create a shared transaction log for one store type. This resulted in the old store type being used for new transaction logs after configuration changes which changed the store type
2018-11-07 21:09:51 -08:00
Evan Tschannen
599cc6260e
fix: data distribution would not always add all subsets of emergency teams
...
fix: data distribution would not stop tracking bad teams after all their data was moved to other teams
fix: data distribution did not properly handle a server changing locality such that the teams it used to be on no longer satisfy the policy
2018-11-07 21:05:31 -08:00
Stephen Atherton
ade75ac692
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-11-07 11:43:54 -08:00
Stephen Atherton
9d73166b3b
Many bug fixes related to concurrent page operations and pager shutdown.
2018-11-06 19:31:16 -08:00
Evan Tschannen
6bb283aebc
fix: dcId to Locality changes could be lost if an emergency transaction happened that did not change the configuration
...
fix: master proxy was starting dcIds one number too large
2018-11-05 11:12:43 -08:00
Evan Tschannen
04fa2a7202
fix: we could recover in a region with priority < 0
2018-11-05 10:14:26 -08:00
A.J. Beamon
187e507e53
Downgrade the severity of IncorrectClusterFileContents the first time it is logged to avoid transient issues that appear like the cluster file hasn't been updated (e.g. the cluster file is shared between multiple processes).
2018-11-05 09:28:08 -08:00
Evan Tschannen
87295cc263
suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created
2018-11-04 23:07:56 -08:00
Evan Tschannen
87d0b4c294
fix: the remote region does not have a full replica if usable_regions==1
2018-11-04 22:05:37 -08:00
Evan Tschannen
c1bd279a4e
addressed review comments
2018-11-04 20:26:23 -08:00
Evan Tschannen
bd60027544
test region priority changes
2018-11-04 20:11:23 -08:00
Evan Tschannen
c02690471d
added protection against configuration changes which cannot be immediately reverted
...
the configure database workload tests region configurations
2018-11-04 19:53:55 -08:00
Evan Tschannen
3304c83229
added additional checks in peek which determine when a tag will never get additional versions
2018-11-04 19:28:15 -08:00
Evan Tschannen
accba4fa1d
keep track of the last time a process became available to set a better starting value for remoteStartTime
2018-11-04 14:33:03 -08:00
Evan Tschannen
45c8f2dfcb
restarting tests will sometimes configure to a fearless configuration on startup if possible
2018-11-02 14:16:47 -07:00
Evan Tschannen
2a8c628d82
fix: even if a peek cursor cannot find a local set for the most recent data, it still may be able to find data from older log sets
2018-11-02 14:13:57 -07:00
Evan Tschannen
f045c041eb
fix: if a storage server already exists in a remote region after converting to fearless, it did not receive mutations between the known committed version and the recovery version
2018-11-02 14:11:39 -07:00
Evan Tschannen
bf6545a9cf
clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
...
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Evan Tschannen
3b97f5a899
fix: the storage server still has to pop old tags, even if it does not need any data from them
2018-11-02 13:10:14 -07:00
Evan Tschannen
979597a2ca
fix: upgraded tags must be popped from all log sets
2018-11-02 13:09:18 -07:00
Evan Tschannen
1b5d28386a
fix: the TLog would not update the durable version properly when version_sizes was empty
2018-11-02 13:05:54 -07:00
Evan Tschannen
2d9a670774
fix: nested multCursors would improperly hang on getMore, because an inner pop of cursors would not be detected by the outer instance
2018-11-02 13:04:09 -07:00
Evan Tschannen
e68c07ae35
fix: trackShardBytes was called with the incorrect range, resulting in incorrect shard sizes
...
reduced the size of shard tracker actors by removing unnecessary state variables. Because we have a large number of these actors, these extra state variables add up to a lot of memory
2018-11-02 13:03:01 -07:00
Evan Tschannen
ad98acf795
fix: if the team started unhealthy and initialFailureReactionDelay was ready, we would not send relocations to the queue
...
print wrong shard size team messages in simulation
2018-11-02 13:00:15 -07:00
Evan Tschannen
1d591acd0a
removed the countHealthyTeams check, because it was incorrect if it triggered during the wait(yield()) at the top of team tracker
2018-11-02 12:58:16 -07:00
Evan Tschannen
30fbc29af1
Renamed TimeKeeperStarted to TimeKeeperCommit
2018-11-02 12:57:03 -07:00
Evan Tschannen
278dbd5096
call debug transaction on timekeeper
2018-11-02 12:56:29 -07:00