Evan Tschannen
aed2d34bcb
Merge branch 'master' into feature-proxy-load-balance
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# flow/Knobs.cpp
2020-05-01 09:19:39 -07:00
Evan Tschannen
b7f5f3be48
merge in master
2020-04-28 13:11:47 -07:00
Evan Tschannen
0c84ad4bc6
Merge pull request #2917 from bnamasivayam/fail-slow-ss
...
Mark the storage servers that are continually lagging as unhealthy
2020-04-22 23:18:35 -07:00
Balachandar Namasivayam
d5bef6fc32
Update fdbserver/DataDistribution.actor.cpp
2020-04-22 09:45:56 -07:00
Evan Tschannen
ba3e2af473
Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast
2020-04-17 15:17:37 -07:00
Evan Tschannen
33efb9ec97
code cleanup based on review comments
2020-04-17 15:05:01 -07:00
Alex Miller
1439de37b5
Convert GetRangeLimits() -> TOO_MANY + ASSERT().
2020-04-12 18:23:14 -07:00
Evan Tschannen
ce4493f679
many bug fixes
2020-04-10 13:45:16 -07:00
Balachandar Namasivayam
6916434f7d
Addressed review comments
2020-04-08 10:48:32 -07:00
Balachandar Namasivayam
69ef8a127b
Add a backstop mechanism to stop failing too many storage servers when they fall behind.
2020-04-06 23:37:11 -07:00
Alex Miller
6078fd1b18
Convert UID to Tag in keyServers to reduce txnStateStore size
2020-04-05 14:30:09 -07:00
Balachandar Namasivayam
73272fc72e
Version difference is now the diff between TLog versions and SS version.
2020-04-03 19:04:43 -07:00
Balachandar Namasivayam
a70bfcc3c8
Remove unnecessary comment.
2020-03-31 18:33:12 -07:00
Balachandar Namasivayam
b1c3893d40
Fix some corner case bugs exposed by simulation.
...
In one case, when a SS joins the cluster and DD cannot find any healthy server to form a team with it, the SS is never added to a team, even after the other servers become healthy.
The other is an extreme case where a data center is down and a SS in the active DC joins, then dies immediately, but not before DD adds it to a destination team for a relocating shard. DD then waits indefinitely for the dead data center to come back up before the cluster can fully recover.
2020-03-31 18:33:12 -07:00
Balachandar Namasivayam
ad1dd4fd9b
Mark storage servers that are continually lagging as unhealthy, giving the Data Distributor the chance to move data off those servers.
2020-03-31 18:25:39 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
7adc916e18
Merge pull request #2806 from ajbeamon/improve-team-request-performance
...
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
A.J. Beamon
700b13e5f8
Remember the best team from team requests, which will likely be the best again and can save us some computation.
2020-03-13 15:21:33 -07:00
Evan Tschannen
12f2b32770
added additional logging in data distribution
2020-03-13 15:19:33 -07:00
Evan Tschannen
9e99a00c8f
fix: do not use priority 0 left when calculating priorities for empty teams
2020-03-13 13:56:46 -07:00
A.J. Beamon
555db50cd1
Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist.
2020-03-12 11:22:03 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Evan Tschannen
e219c1671f
Merge branch 'release-6.2' into feature-dd-region-queue
...
# Conflicts:
# fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen
125bd13198
fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload
2020-03-04 14:17:17 -08:00
Evan Tschannen
6296465e07
Make the DD priority associated with populating a remote region lower than machine failures
2020-03-04 14:07:32 -08:00
Evan Tschannen
924d335aa7
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Knobs.cpp
# flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen
c05c95cbe8
forgot to rename the knob
2020-02-25 15:47:39 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
aa4d1357b3
handle the case that there is only one healthy team
2020-02-21 15:41:01 -08:00
Evan Tschannen
457dbc5215
Update fdbserver/DataDistribution.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:17 -08:00
Evan Tschannen
6a634652c4
Update fdbserver/DataDistribution.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:06 -08:00
Evan Tschannen
08914a2acd
Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
Evan Tschannen
819c55556c
More aggressively attempt to find teams that do not have low disk space
2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d
Combine some logic that was doing similar computations for free space ratio.
2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
A.J. Beamon
3a1ba5a077
Rename variable for clarity
2020-02-20 10:59:52 -08:00
A.J. Beamon
c164acb88d
Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.
2020-02-20 09:32:00 -08:00
mpilman
5a9d420cb7
Merge remote-tracking branch 'upstream/release-6.2' into release-merges/20200210
2020-02-10 10:02:05 -08:00
A.J. Beamon
b8a252da40
Clarify the names of a couple trace fields
2020-02-10 08:15:00 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
9b80498180
Added a trace event to warn if a shard is merged before enough time has elapsed since it became low bandwidth
2020-01-10 14:58:38 -08:00
Evan Tschannen
c2608f0af9
fix: completeSources could be larger than the teamSize, so we need to check all completeSources
...
we do not need to track bestSize, since all teams in the list will be the same size
2020-01-10 14:46:40 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Evan Tschannen
83ad9caf54
implemented a load balancing algorithm which evens out the number of requests processed by each proxy
2020-01-08 01:59:01 -08:00
Evan Tschannen
59738e8ef1
fixed compiler error
2019-11-22 16:19:34 -08:00
Evan Tschannen
3c769fcf60
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen
3a3ab5664b
fix: team trackers for bad teams that contain removed servers must be cancelled or the cluster will falsely report those teams as failed
2019-11-22 10:20:13 -08:00
Andrew Noyes
d4de608bb6
Fix OPEN_FOR_IDE build
2019-10-25 10:42:22 -07:00