Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
A.J. Beamon
555db50cd1
Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist.
2020-03-12 11:22:03 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
125bd13198
fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload
2020-03-04 14:17:17 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
08914a2acd
Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d
Combine some logic that was doing similar computations for free space ratio.
2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
A.J. Beamon
3a1ba5a077
Rename variable for clarity
2020-02-20 10:59:52 -08:00
A.J. Beamon
c164acb88d
Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.
2020-02-20 09:32:00 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Jon Fu
d2b6626d5c
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-21 13:47:06 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Jon Fu
d146f1d636
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-07 11:27:15 -07:00
Jon Fu
450a09e117
Code Review Changes
2019-09-24 15:48:50 -07:00
Meng Xu
3c4dc1003d
DD:clang-format the PR
2019-09-11 11:16:29 -07:00
Meng Xu
0b785e5c1c
DD:getTeam may fail to get a team when it can
...
Due to randomness, when unhealthy teams are majority while there still
exists healthy teams, getTeam function may be unlucky to find
any feasible (ok) team, which leads to BestTeamStuck situation.
This commit increases the tries from 10 to 20.
A long-term solution may first find all feasible teams and choose a random
one from them. Since This can affect the statistics of which team is picked.
So it is not included in this commit.
Non-functional change: This commit removes unneeded printf introduced by
fast restore PR 1404.
2019-09-07 20:08:58 -07:00
Jon Fu
66bba51988
Implemented direct removal of failed storage server from system keyspace
2019-08-27 14:39:43 -07:00
Xin Dong
4ecfc9830f
Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
...
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' key to disable/enable DD for all rebalance reasons(MountainChopper and ValleyFiller)
Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
...
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
b91795d288
Send bytes input rate to DD.
2019-07-25 16:27:32 -07:00
Meng Xu
378db79441
Resolve conflict when merge with master
2019-07-22 10:56:20 -07:00
Meng Xu
f243e77afc
Increase merge and split shard priority by 100
...
PRIORITY_TEAM_REDUNDANT should be in a different priority band from
PRIORITY_MERGE_SHARD and PRIORITY_SPLIT_SHARD, because
priority inversion happens within priorities in the same band.
2019-07-19 13:55:38 -07:00
Meng Xu
6df93173ca
TC:Lower priority for removing redundant teams
2019-07-16 18:02:36 -07:00
Meng Xu
c7a996267c
TeamRemover: Remove unused declaration
...
Also change state variable to variable.
2019-07-05 16:54:06 -07:00
Andrew Noyes
d7612a4426
Fix OPEN_FOR_IDE build errors
2019-04-05 16:30:42 -07:00
Jingyu Zhou
c0b58080ee
Fix type name warning for DDTeamCollection
...
Seen using 'class' now seen using 'struct' in DataDistribution.actor.cpp
2019-03-26 14:18:25 -07:00
Jingyu Zhou
cdfe906c30
Data distributor pulls batch limited info from proxy
...
Add a flag in HealthMetrics to indicate that batch priority is rate limited.
Data distributor pulls this flag from proxy to know roughly when rate limiting
happens.
DD uses this information to determine when to do the rebalance in the background,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster is
currently rate limited for batch commits, then the rebalance will use longer
time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
b2ee41ba33
Remove lastLimited from data distribution
...
Fix a serialization bug in ServerDBInfo, which causes test failures.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
3c86643822
Separate Ratekeeper from data distribution.
...
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00