2964 Commits

Author SHA1 Message Date
Evan Tschannen 04052226df reverting a change which causes data inconsistency between the primary and secondary 2020-03-17 09:41:44 -07:00
Evan Tschannen ed4d02a3e4
Merge pull request #2812 from etschannen/feature-proxy-mem-limit
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
Evan Tschannen 2038a56ff4
Merge pull request #2819 from etschannen/feature-first-proxy
A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes
2020-03-16 13:53:28 -07:00
A.J. Beamon ee3cde0b0d
Merge pull request #2815 from etschannen/feature-timeout-tlog-create
Treat a tlog which takes a long time to create its disk queue as failed
2020-03-16 12:49:33 -07:00
Evan Tschannen a068d4063f renamed ProxyGetConsistentReadVersion 2020-03-16 12:11:32 -07:00
Evan Tschannen 7adc916e18
Merge pull request #2806 from ajbeamon/improve-team-request-performance
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
A.J. Beamon fe19f30999
Merge pull request #2813 from etschannen/feature-satellite-usable-regions
do not recruit satellite tlogs when usable regions=1
2020-03-16 11:54:42 -07:00
Evan Tschannen 012344e297 refactor getWorkersForRoleInDatacenter 2020-03-16 11:50:17 -07:00
A.J. Beamon f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
Evan Tschannen ea98c7a40a added additional timeout on initPersistentState 2020-03-16 11:38:14 -07:00
A.J. Beamon 682b9faa1a
Merge pull request #2817 from etschannen/feature-fix-0-left
fix: do not use priority 0 left when calculating priorities for empty teams
2020-03-16 11:15:12 -07:00
Evan Tschannen 56dee89e6e active generations should include the current one 2020-03-16 11:09:42 -07:00
Evan Tschannen e5d53c863b report in status the number of active generations 2020-03-16 10:29:17 -07:00
Evan Tschannen 818537ed2d
Update fdbserver/masterserver.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-14 15:04:46 -07:00
Evan Tschannen 0ca89547a5 make sure the number of logRouterTags is larger than the number of satelliteTLogs to avoid having satellites with no data. 2020-03-14 15:02:19 -07:00
Evan Tschannen 04b752b40a Added additional logging related to memory errors (including in status) 2020-03-13 18:31:22 -07:00
Evan Tschannen a71e61f57b fixed compiler issue 2020-03-13 18:22:38 -07:00
Evan Tschannen ebbf4490b3 use a Deque for each priority instead of a priority queue to improve CPU with large numbers of outstanding requests 2020-03-13 18:07:48 -07:00
Evan Tschannen 79d5511149 A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes 2020-03-13 17:49:02 -07:00
Evan Tschannen 2f2f56020f
Update fdbserver/masterserver.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-13 15:54:13 -07:00
A.J. Beamon 700b13e5f8 Remember the best team from team requests, which will likely be the best again and can save us some computation. 2020-03-13 15:21:33 -07:00
Evan Tschannen 12f2b32770 added additional logging in data distribution 2020-03-13 15:19:33 -07:00
Evan Tschannen 9e99a00c8f fix: do not use priority 0 left when calculating priorities for empty teams 2020-03-13 13:56:46 -07:00
Evan Tschannen d6d347f665 treat a tlog which takes a long time to create its disk queue as failed 2020-03-13 10:31:59 -07:00
Evan Tschannen a39effa57d delay recoveries after 70 outstanding generations, and stop recoveries after 100 outstanding generations to prevent a death spiral from filling up the coordinated state 2020-03-13 10:28:32 -07:00
Evan Tschannen 4640edf5d6 do not recruit satellite tlogs when usable regions=1 2020-03-13 10:24:52 -07:00
Evan Tschannen 243c268d9d Limit the amount of requests the proxy can queue up in memory 2020-03-13 10:17:49 -07:00
Alex Miller d86a601b84 Add cluster.processes.id.network.tls_policy.hz to status.
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
A.J. Beamon 555db50cd1 Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist. 2020-03-12 11:22:03 -07:00
Evan Tschannen dbfc0cbcc0
Merge pull request #2781 from alexmiller-apple/certificate-refresh
Refresh certificates used for handshaking when they change on disk
2020-03-06 11:12:04 -08:00
Evan Tschannen 98647a61fc
Merge pull request #2784 from ajbeamon/add-resolver-metrics
Add ResolverMetrics trace event
2020-03-06 09:38:30 -08:00
A.J. Beamon faf9101ad4
Update fdbserver/Resolver.actor.cpp
Co-Authored-By: Evan Tschannen <36455792+etschannen@users.noreply.github.com>
2020-03-06 09:20:38 -08:00
Evan Tschannen 1076abdee5 fixed crash when interf was not created 2020-03-05 19:09:08 -08:00
Evan Tschannen 1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
A.J. Beamon 7fb8c3c080 Remove unused variable. 2020-03-05 11:38:30 -08:00
A.J. Beamon effb6d2d49 Add ResolverMetrics trace event 2020-03-05 10:49:21 -08:00
Alex Miller 595dd77ed1 Merge remote-tracking branch 'upstream/release-6.2' into certificate-refresh 2020-03-04 20:25:42 -08:00
Alex Miller 9b5ef3416e Refactor TLSParams into TLSConfig + LoadedTLSConfig
The idea is that we keep around a TLSConfig holding the configuration
that the user has provided, and then when we want to initialize an SSL
context, we ask the TLSConfig to load all certificates and return us a
LoadedTLSConfig that is a concrete set of certificate bytes in memory.

initTLS now just takes the in-memory bytes and applies them to the SSL
context.

This is a large refactor leading up to certificate refreshing, where we
will periodically check for changes to the certificates, and then
re-load them and apply them to a new SSL context.
2020-03-04 20:14:47 -08:00
Evan Tschannen f3ac2c9180 renamed a variable 2020-03-04 18:49:21 -08:00
Evan Tschannen b3ea9d5896 Do not allow the cluster controller to mark any process as failed within 30 seconds of startup 2020-03-04 18:45:26 -08:00
Evan Tschannen e219c1671f Merge branch 'release-6.2' into feature-dd-region-queue
# Conflicts:
#	fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen 6d6f184e2f added a knob which reverts the new queue behavior 2020-03-04 16:23:49 -08:00
Evan Tschannen b7834b2995
Merge pull request #2774 from etschannen/feature-dd-repopulate-priority
Make the DD priority of populating a region lower than machine failures
2020-03-04 16:15:18 -08:00
A.J. Beamon 58e621eca1 Invalid knobs or knob values are treated as warnings rather than errors. Apply this change to backup as well. 2020-03-04 15:50:04 -08:00
Evan Tschannen 125bd13198 fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload 2020-03-04 14:17:17 -08:00
Evan Tschannen 6296465e07 Make the DD priority associated with populating a remote region lower than machine failures 2020-03-04 14:07:32 -08:00
Meng Xu ad9b3fb4a8 DD: Add trace for detailed relocate shard info 2020-02-29 13:45:10 -08:00
Evan Tschannen c3299b8ebe if tls cannot be initialized, throw an error from createDatabase 2020-02-26 18:53:06 -08:00
Evan Tschannen bf5a95e6df Merge commit 'dc39bdfbbf94a7f470386f439df08c044d08d90c' into feature-tls-environment-vars
# Conflicts:
#	flow/Net2.actor.cpp
2020-02-26 18:02:56 -08:00
Evan Tschannen d1598e7c99 set_verify_peers throws an error instead of returning a value 2020-02-26 16:06:16 -08:00
Evan Tschannen 2586bade68 re-added support for configuring TLS options with environment variables 2020-02-26 15:33:48 -08:00
A.J. Beamon 0f5c999d4b Better containment of boost errors related to TLS. 2020-02-26 12:26:43 -08:00
Evan Tschannen c05c95cbe8 forgot to rename the knob 2020-02-25 15:47:39 -08:00
Evan Tschannen 12b5064041 a high free_space_ratio_cutoff is not needed anymore because avoiding teams with low disk space is no longer the responsibility of getLoadBytes() 2020-02-25 15:47:10 -08:00
Evan Tschannen 6e7d2ff7dd prevent the proxy from delaying too long based on an incorrect estimate of the compute time 2020-02-25 15:46:13 -08:00
Evan Tschannen 65fbe0d0bc revert AcceptSocket priority change because of bad performance results 2020-02-21 19:22:14 -08:00
A.J. Beamon 4c696d5bf2 Merge branch 'release-6.2' into dd-better-rebalance-logging
# Conflicts:
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-21 17:41:00 -08:00
A.J. Beamon dfa5f76c01 Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK. 2020-02-21 16:28:03 -08:00
Evan Tschannen aa4d1357b3 handle the case that there is only one healthy team 2020-02-21 15:41:01 -08:00
Evan Tschannen 457dbc5215
Update fdbserver/DataDistribution.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:17 -08:00
Evan Tschannen 6a634652c4
Update fdbserver/DataDistribution.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:06 -08:00
Evan Tschannen 08914a2acd Once the available space ratio falls below 0.3, avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
Evan Tschannen e422874758 fix: reboot does not work on unreliable processes 2020-02-21 14:29:42 -08:00
A.J. Beamon 2e699fef55 Don't suppress actor cancellation because we've already initialized the trace event by adding details. 2020-02-21 11:28:59 -08:00
A.J. Beamon 6810a03283 Add more logging to valley filler and mountain chopper 2020-02-21 10:55:14 -08:00
Alvin Moore 9042cab7bc Changed ordering of link libraries 2020-02-21 08:56:52 -08:00
Alvin Moore 87751df40a Fixed problem with linking pthread 2020-02-21 08:45:39 -08:00
Evan Tschannen a27ea63500 Merge branch 'release-6.2' into feature-boost-ssl 2020-02-20 23:06:22 -08:00
Evan Tschannen 59ff782927 fix: only delete the processId file on binaryReader errors 2020-02-20 23:04:39 -08:00
Evan Tschannen 6f1d3ccd35 Merge branch 'release-6.2' into feature-boost-ssl 2020-02-20 20:03:40 -08:00
Evan Tschannen f04e311a1e Merge commit 'b46d6e25e24993ab5a5f04091fd3235050b7cd09' into feature-boost-ssl
# Conflicts:
#	fdbserver/SimulatedCluster.actor.cpp
#	flow/Net2.actor.cpp
2020-02-20 17:36:38 -08:00
Alex Miller 927cff3317 Report errors on TLS misconfigurations ... or at least try to. 2020-02-20 16:57:29 -08:00
Evan Tschannen 819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
A.J. Beamon e1fb568fd1 Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon 9e84fa965d
Merge pull request #2703 from ajbeamon/fix-stuck-dd-rebalancing
Fix issue with rebalancing data movement doing no work
2020-02-20 15:56:04 -08:00
Evan Tschannen d7c841a28a
Merge pull request #2589 from etschannen/feature-proxy-delay
Improve version pipelining on the proxy
2020-02-20 15:23:30 -08:00
A.J. Beamon e4b483796d Combine some logic that was doing similar computations for free space ratio. 2020-02-20 14:52:08 -08:00
Evan Tschannen 8b768e66df
Merge pull request #2694 from dongxinEric/feature/2663/specialize-policy-for-zoneid-in-cc
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId…
2020-02-20 14:46:23 -08:00
Evan Tschannen 8129f74a10
Merge pull request #2698 from etschannen/feature-recruit-delay
The CC waits until no new workers register before starting a bad recruitment
2020-02-20 14:42:37 -08:00
Evan Tschannen 7d54acf4ca removed an unnecessary yield 2020-02-20 14:41:49 -08:00
Evan Tschannen 574e88ba8e updateGoodRemoteRecruitmentTime was unnecessary because the only way findRemoteWorkers would return would be after a new server has joined which already resets goodRemoteRecruitmentTime 2020-02-20 13:46:22 -08:00
A.J. Beamon 5586e6f6d8
Merge pull request #2697 from etschannen/feature-correctness-fixes
A variety of correctness fixes
2020-02-20 13:32:18 -08:00
A.J. Beamon 4f1301b2dd
Merge pull request #2583 from etschannen/feature-keep-status-connected
Clients should not disconnect from the CC after fetching status
2020-02-20 13:12:30 -08:00
A.J. Beamon fcbdcda490
Merge pull request #2650 from ajbeamon/fix-reverse-range-read-byte-limit-bug
Fix reverse range read performance bug
2020-02-20 12:47:17 -08:00
A.J. Beamon 6d9decdf59
Merge pull request #2672 from ajbeamon/improve-tlog-role-event
Improve TLog "Role" event
2020-02-20 12:45:25 -08:00
A.J. Beamon 4c9c736253 Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them. 2020-02-20 11:21:03 -08:00
A.J. Beamon 3a1ba5a077 Rename variable for clarity 2020-02-20 10:59:52 -08:00
Evan Tschannen 08c318d28a re-added the connect lock in the fdbcli so that the timeout is not spent before a connection has been initiated (because of the handshake lock) 2020-02-20 10:43:34 -08:00
Xin Dong 99095c9224 Again make Clang happy. 2020-02-20 09:50:22 -08:00
Xin Dong 298d6cb3d7 Address review comments. 2020-02-20 09:34:01 -08:00
A.J. Beamon c164acb88d Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck. 2020-02-20 09:32:00 -08:00
Evan Tschannen 761da5a059 code cleanup 2020-02-19 17:59:45 -08:00
Evan Tschannen fbd45963d8 The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment 2020-02-19 16:48:30 -08:00
Evan Tschannen cf4efca852 fix: buffered cursor should always make sure all of the sub-cursors are completely exhausted before calculating minVersion. It is not legal to advance a cursor version past an epochEnd (+100 million versions) without also returning the epochEnd mutation, or the storage servers might not be able to roll back far enough because the end of the previous epoch will be made durable 2020-02-19 15:24:32 -08:00
Evan Tschannen 9b3254d5f4 A corrupted processId file should be deleted in simulation, as that is the manual operation that would fix the problem in the real world 2020-02-19 15:21:42 -08:00
Evan Tschannen 4326984b1d fix: wait metrics can take a really long time to detect that two shards have been merged into one if both shards are assigned to the same team. Additional information should be added to the request to improve this. 2020-02-19 15:20:38 -08:00
Xin Dong 89fcbb2055 Make clang happy 2020-02-19 09:44:15 -08:00
Xin Dong efc0d7f9d5 Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId',PolicyOne()) to find a set of TLog servers which will be able to fulfill the policy later. 2020-02-19 09:25:57 -08:00
Alex Miller 88d36af9c7 Fix --tls_password and add better error logging
This refactors all tls settings into a TLSParams object so that we can
set the password before loading any certificates.

It turns out that the FDBLibTLS code did really nice things with error
logging, but I just didn't understand openssl enough before to realize
what pieces I should be copying.
2020-02-19 00:57:05 -08:00
A.J. Beamon 1d9140d874 Removed TLogVersion logging.
Added logging of SharedTLog ID for each TLog.
Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog.
Made some parameters to startRole passed by reference.
2020-02-14 12:33:43 -08:00
A.J. Beamon a41aa41816
Merge pull request #2670 from Daniel-B-Smith/skip-memcmp
Revert to memcmp comparison in SkipList
2020-02-14 09:03:08 -08:00
Alex Miller c859f859bc Remove certBytes. 2020-02-13 21:34:23 -08:00
Alex Miller f2d30a9954 comment out certBytes to fix cmake builds 2020-02-13 21:31:36 -08:00
Andrew Noyes 68a6f59830 Set options _within_ the retry loop 2020-02-13 16:15:41 -08:00
Meng Xu 5e78d0ad1c
Merge pull request #2641 from atn34/atn34/configure-locked
Allow a new database to be configured locked
2020-02-13 09:47:46 -08:00
Evan Tschannen 96eec756b3 more simulation fixes 2020-02-12 15:12:43 -08:00
A.J. Beamon 56053c565b Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered).
The SharedTLog role was being started and stopped twice, so remove one instance of it.
2020-02-12 15:11:38 -08:00
Daniel Smith 011e181183 Revert to memcmp comparison in SkipList 2020-02-12 17:33:33 -05:00
A.J. Beamon d2b7f92b49 Merge branch 'release-6.2' into fix-status-proxy-list
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-02-12 14:14:18 -08:00
A.J. Beamon 60f6b928f6 Slight reorganization of code to make it clearer. 2020-02-12 14:07:02 -08:00
Evan Tschannen 38a5511b96 additional simulation fixes 2020-02-11 15:52:06 -08:00
Andrew Noyes 86089fdc1b
Merge branch 'release-6.2' into atn34/configure-locked 2020-02-11 13:51:41 -08:00
Andrew Noyes 17660fb18d Fix simulation test 2020-02-11 13:49:19 -08:00
Andrew Noyes 1e1e75123f Add simulation testing 2020-02-11 11:10:22 -08:00
A.J. Beamon 962749a609 Several boolean knobs could not be set at runtime (TR_FLAG_DISABLE_MACHINE_TEAM_REMOVER, TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS, TR_FLAG_DISABLE_SERVER_TEAM_REMOVER, BUGGIFY_ALL_COORDINATION) 2020-02-10 21:49:31 -08:00
Evan Tschannen dcbce3593e fixed TLS in simulation 2020-02-10 14:00:21 -08:00
A.J. Beamon 529200018a Merge branch 'release-6.2' into fix-reverse-range-read-byte-limit-bug
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-02-10 12:23:52 -08:00
A.J. Beamon b8a252da40 Clarify the names of a couple trace fields 2020-02-10 08:15:00 -08:00
A.J. Beamon 78cb1071dc Status should use the full list of proxies 2020-02-07 15:44:02 -08:00
A.J. Beamon fa920a6cef Step 5 of fixing storage server range reads: update the logic of reverse range reads to match forward range reads 2020-02-07 10:02:52 -08:00
Alex Miller e390dbd36c Add a non-FDBLibTLS verify peers framework to new TLS impl 2020-02-06 21:06:52 -08:00
Evan Tschannen 38d8d0d675 fixed simulation 2020-02-06 19:29:31 -08:00
A.J. Beamon 16167b07d5 Step 4 of fixing storage server range reads: remove another unneeded iteration case in the forward direction when we don't exhaust our limits in the disk read. This also hopefully makes the code a bit clearer. 2020-02-06 13:27:04 -08:00
A.J. Beamon df2b0452b4 Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef. 2020-02-06 13:19:24 -08:00
A.J. Beamon 1c61957ca1 Step 2 of fixing storage server range reads: eliminate some unnecessary iterations in the forward case 2020-02-06 12:58:59 -08:00
A.J. Beamon 7037edc3f8 Step 1 of fixing storage server range reads: cleanup of the forward direction. This should not change any behavior. 2020-02-06 12:49:02 -08:00
A.J. Beamon f32d515fda Reverse range reads on the storage server would not pass the specified byte limit to the storage engine but would apply it to the results returned, causing a potentially significant amount of wasted reading. 2020-02-05 11:16:40 -08:00
Evan Tschannen 84853dd1fd switched SSL implementation to use boost ssl 2020-02-04 14:56:40 -08:00
A.J. Beamon d1b87f8b7f The storage server could fail to update its version to the latest processed if the peeked data contained a non-empty commit and ended with an empty commit. 2020-01-29 13:17:58 -08:00
Evan Tschannen 231d7830a0 more accurate calculation of the amount of time that the proxy should wait before getting a version from the master 2020-01-26 19:47:12 -08:00
Evan Tschannen e167e63eaf Add delays between proxy batches that roughly correspond to the amount of work the proxy needs to do. This will help avoid getting a version from the master and then waiting a long time before committing it. 2020-01-23 18:31:51 -08:00
Evan Tschannen 73ad702d14 Clients which fetch status should not disconnect from the coordinators and cluster controller between each retrieval 2020-01-22 15:41:22 -08:00
Evan Tschannen 8197f0562f merge priority did not need to be raised, because we no longer merge shards until they are untrackable
max_commit_updates was too large, and could cause proxies to run out of memory
2020-01-17 14:24:58 -08:00
Evan Tschannen 827cea74b5 fix: tlogs must send a recruitment reply even when actor cancelled or the recruitment endpoint will be marked as permanently failed 2020-01-16 17:37:17 -08:00
Evan Tschannen 4b90487b90 occasionally throw wrong_shard_server when waitMetrics times out so that the waitMetrics request can get the correct number of shards if two shards have been merged or split and the same storage server owns all the chunks 2020-01-15 13:22:18 -08:00
Evan Tschannen fd5705a451 fixed capitalization 2020-01-15 09:35:57 -08:00
Evan Tschannen c93ca04ea6 Do not merge more than 100 shards together to avoid creating untrackable shards 2020-01-15 09:33:27 -08:00
Evan Tschannen e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen 17e97f24e4
Merge pull request #2526 from etschannen/feature-dd-improvements
Data distribution improvements
2020-01-10 17:53:22 -08:00
Evan Tschannen 8475da359c
Merge pull request #2527 from etschannen/feature-region-fixes
A database could perform poorly while a remote region catches up to the primary
2020-01-10 17:26:43 -08:00
Evan Tschannen b331c5dafe wantsToMerge was created before the shardEvaluator has a chance to update it based on shardSize changes 2020-01-10 17:23:56 -08:00
Evan Tschannen fde53cbeef HasBeenTrueFor was ready immediately after a previous shard merge 2020-01-10 16:28:56 -08:00
Evan Tschannen 855f03a41f ratekeeper needed to check remoteDC in another location
the storage server scoped a transaction incorrectly
2020-01-10 15:58:36 -08:00
Evan Tschannen 9b80498180 Added a trace event to warn if a shard is merged before enough time has elapsed since it became low bandwidth 2020-01-10 14:58:38 -08:00
Evan Tschannen c2608f0af9 fix: completeSources could be larger than the teamSize, so we need to check all completeSources
we do not need to track bestSize, since all teams in the list will be the same size
2020-01-10 14:46:40 -08:00
Evan Tschannen a5f544818c
Merge pull request #2420 from ajbeamon/trace-clock-source-fix
Revert change to make g_trace_clock thread_local, ...
2020-01-10 12:36:38 -08:00
Evan Tschannen 4aab9b7bc8 fix: clients would waste time attempting to read from a remote region when it was in the process of catching up 2020-01-10 12:23:59 -08:00
Evan Tschannen d55e56993d fix: the cluster controller would not recruit more remote logs before the database became fully_recovered 2020-01-10 12:21:48 -08:00
Evan Tschannen 7898f4425f fix: ratekeeper could limit based on remote storage servers 2020-01-10 12:21:08 -08:00
Evan Tschannen da1be272cb fix: servers which opened the database would use the full list of proxies 2020-01-10 12:20:30 -08:00
Evan Tschannen 02a8e8d1e9 batch priority must be heavily throttled before stopping data distribution rebalancing 2020-01-09 17:05:22 -08:00
Evan Tschannen 9842272ced raised the priority of shard merges, because the tracker cannot track an unmerged shard 2020-01-09 17:04:17 -08:00
Evan Tschannen e4fa4ad0c9 Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes 2020-01-09 17:02:49 -08:00
Evan Tschannen ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Evan Tschannen 032797ca5c
Merge pull request #2430 from etschannen/release-6.2
Reduce recovery times caused by saturating the cluster controller
2020-01-02 17:35:59 -08:00
A.J. Beamon 3dd3ac3cfd Merge branch 'release-6.2' into trace-clock-source-fix
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-01-02 15:14:12 -08:00
Evan Tschannen 3eae401886 fix: we were recruiting one too few oldLogRouters
code cleanup
2020-01-02 15:05:44 -08:00
Evan Tschannen 3157d8a375 fixed typo 2019-12-18 16:57:39 -08:00
Evan Tschannen d8c3c2fda4 Improved prioritization of commit path on the proxies 2019-12-18 16:56:35 -08:00
Evan Tschannen 3c30215662 Merge branch 'release-6.2' of github.com:apple/foundationdb into release-6.2 2019-12-09 13:18:07 -08:00
Evan Tschannen 5e5e618da0 during recovery, only send the full serverDBInfo to processes that are part of the new generation 2019-12-09 13:17:49 -08:00
Evan Tschannen bcce5968a4 recruit oldLogRouters on TLogs, do not recruit oldLogRouters on the cluster controller if possible 2019-12-09 13:12:13 -08:00
Andrew Noyes 56f1ff7ff6 Test client-side buggify in simulation 2019-12-09 12:55:23 -08:00
A.J. Beamon 9866d1ce27 Revert change to make g_trace_clock thread_local, instead checking we are on the correct thread when getting the time. 2019-12-06 10:15:49 -08:00
Andrew Noyes 9188344d7b Update fdbserver/SkipList.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-05 15:44:43 -08:00
Andrew Noyes 46b675a719 Update fdbserver/SkipList.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-05 15:44:43 -08:00
Andrew Noyes 4263a17188 Change bitMask return type to wordType 2019-12-05 15:44:43 -08:00
Andrew Noyes 604351680b Corresponding fix for lowBits 2019-12-05 15:44:43 -08:00
Andrew Noyes 485dc5d5bc Define highBits behavior
I looked at usages of highBits and it looks like this behavior makes
sense. It's also the behavior described in the nearby comment, and the
same behavior we happened to be getting, except now it's defined.
2019-12-05 15:44:43 -08:00
Andrew Noyes 55916534fe Make orImpl private
As far as I can tell MiniConflictSet2 is meant to be a reference
implementation for MiniConflictSet, so let's give them the same API
2019-12-05 15:44:43 -08:00
Evan Tschannen 5a6bc2aa71 increase the priority of cluster controller recruitment to prefer recruitment over sending serverDBInfo 2019-12-04 16:28:41 -08:00
Evan Tschannen 5f1ef53f62 increase the priority at which the cluster controller registers workers to avoid having a saturated cluster controller recruit a master without all available workers 2019-12-04 16:17:41 -08:00
Andrew Noyes e6678573db Fix load of bool which is not 0 or 1 2019-12-04 09:42:35 -08:00
Andrew Noyes 560c1da805 Fix UBSAN report
/home/anoyes/workspace/foundationdb/fdbserver/VersionedBTree.actor.cpp:1606:10: runtime error: null pointer passed as argument 2, which is declared to never be null
2019-12-04 09:42:34 -08:00
Andrew Noyes 9ef1f4da5c
Update fdbserver/storageserver.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-04 09:21:05 -08:00
Andrew Noyes 41583aa576 Guard unlikely indexCode calculation per Steve 2019-12-03 21:48:30 -08:00
Andrew Noyes 854c94c5ad Fix another "binding reference to nullptr" 2019-12-03 17:39:17 -08:00
Andrew Noyes 2aeb9e0cbf Fix UBSAN report 2019-12-03 16:20:39 -08:00
Andrew Noyes 46d10dc7dc Fix "null passed as argument declared not null"
Fix several such reports from ubsan

E.g.

/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
2019-12-03 14:46:53 -08:00
Andrew Noyes 47de9d9d6e Fix UBSAN report
/Users/anoyes/workspace/foundationdb/fdbserver/TLogInterface.h:149:8: runtime error: load of value 232, which is not a valid value for type 'bool'
    #0 0xc608fb in TLogPeekReply::TLogPeekReply(TLogPeekReply const&) /Users/anoyes/workspace/foundationdb/fdbserver/TLogInterface.h:149
    #1 0x242bf87 in ILogSystem::ServerPeekCursor::ServerPeekCursor(TLogPeekReply const&, LogMessageVersion const&, LogMessageVersion const&, int, int, bool, long, Tag) /Users/anoyes/workspace/foundationdb/fdbserver/LogSystemPeekCursor.actor.cpp:35
    #2 0x242da77 in ILogSystem::ServerPeekCursor::cloneNoMore() /Users/anoyes/workspace/foundationdb/fdbserver/LogSystemPeekCursor.actor.cpp:47
    #3 0x24362d5 in ILogSystem::MergedPeekCursor::cloneNoMore() /Users/anoyes/workspace/foundationdb/fdbserver/LogSystemPeekCursor.actor.cpp:325
    #4 0x244bf45 in ILogSystem::MultiCursor::cloneNoMore() /Users/anoyes/workspace/foundationdb/fdbserver/LogSystemPeekCursor.actor.cpp:838
    #5 0x36b5a36 in a_body1cont5loopBody1 /Users/anoyes/workspace/foundationdb/fdbserver/storageserver.actor.cpp:2621
    #6 0x36b3110 in a_body1cont5loopHead1 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8664
    #7 0x36b07fe in a_body1cont5 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8576
    #8 0x36abda8 in a_body1cont4when1 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8582
    #9 0x36a8dc2 in a_body1cont4 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8454
    #10 0x36a4bf6 in a_body1cont3break1 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8489
    #11 0x36a2c01 in a_body1cont3loopBody1cont1 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8505
    #12 0x369fd36 in a_body1cont3loopBody1when1 /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8513
    #13 0x3700dcb in a_callback_fire /Users/anoyes/build/foundationdb/fdbserver/storageserver.actor.g.cpp:8528
    #14 0x36e5210 in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #15 0x4dfb2a in SAV<Void>::finishSendAndDelPromiseRef() /Users/anoyes/workspace/foundationdb/flow/flow.h:479
    #16 0x2484b07 in a_body1loopBody1cont1 /Users/anoyes/build/foundationdb/fdbserver/LogSystemPeekCursor.actor.g.cpp:1526
    #17 0x24822cf in a_body1loopBody1cont2 /Users/anoyes/build/foundationdb/fdbserver/LogSystemPeekCursor.actor.g.cpp:1535
    #18 0x247e228 in a_body1loopBody1when1 /Users/anoyes/build/foundationdb/fdbserver/LogSystemPeekCursor.actor.g.cpp:1541
    #19 0x249be87 in a_callback_fire /Users/anoyes/build/foundationdb/fdbserver/LogSystemPeekCursor.actor.g.cpp:1556
    #20 0x249668f in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #21 0x4dfb2a in SAV<Void>::finishSendAndDelPromiseRef() /Users/anoyes/workspace/foundationdb/flow/flow.h:479
    #22 0x80557e in a_body1when1 /Users/anoyes/build/foundationdb/flow/genericactors.actor.g.h:11591
    #23 0x8916ef in a_callback_fire /Users/anoyes/build/foundationdb/flow/genericactors.actor.g.h:11620
    #24 0x8735f5 in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #25 0x4dfb2a in SAV<Void>::finishSendAndDelPromiseRef() /Users/anoyes/workspace/foundationdb/flow/flow.h:479
    #26 0x24820f8 in a_body1cont1loopBody1when1 /Users/anoyes/build/foundationdb/fdbserver/LogSystemPeekCursor.actor.g.cpp:860
    #27 0x249c852 in a_callback_fire /Users/anoyes/build/foundationdb/fdbserver/LogSystemPeekCursor.actor.g.cpp:886
    #28 0x249786c in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #29 0xc9d2dc in SAV<TLogPeekReply>::finishSendAndDelPromiseRef() /Users/anoyes/workspace/foundationdb/flow/flow.h:479
    #30 0x248b39f in a_body1cont2 /Users/anoyes/build/foundationdb/flow/genericactors.actor.g.h:11858
    #31 0x2489d02 in a_body1when1 /Users/anoyes/build/foundationdb/flow/genericactors.actor.g.h:11865
    #32 0x249a150 in a_callback_fire /Users/anoyes/build/foundationdb/flow/genericactors.actor.g.h:11880
    #33 0x2492a4f in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #34 0xc9d2dc in SAV<TLogPeekReply>::finishSendAndDelPromiseRef() /Users/anoyes/workspace/foundationdb/flow/flow.h:479
    #35 0x248df9b in a_body1cont2 /Users/anoyes/build/foundationdb/fdbrpc/genericactors.actor.g.h:2762
    #36 0x248b7da in a_body1when1 /Users/anoyes/build/foundationdb/fdbrpc/genericactors.actor.g.h:2769
    #37 0x2499c88 in a_callback_fire /Users/anoyes/build/foundationdb/fdbrpc/genericactors.actor.g.h:2784
    #38 0x2492371 in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #39 0xc9d2dc in SAV<TLogPeekReply>::finishSendAndDelPromiseRef() /Users/anoyes/workspace/foundationdb/flow/flow.h:479
    #40 0xc60fb3 in void SAV<TLogPeekReply>::sendAndDelPromiseRef<TLogPeekReply&>(TLogPeekReply&) /Users/anoyes/workspace/foundationdb/flow/flow.h:472
    #41 0xc1137a in NetSAV<TLogPeekReply>::receive(ArenaObjectReader&) /Users/anoyes/workspace/foundationdb/fdbrpc/fdbrpc.h:111
    #42 0x78eda75 in a_body1cont1 /Users/anoyes/workspace/foundationdb/fdbrpc/FlowTransport.actor.cpp:652
    #43 0x78f7967 in a_body1cont2 /Users/anoyes/build/foundationdb/fdbrpc/FlowTransport.actor.g.cpp:2369
    #44 0x78ed4d8 in a_body1when1 /Users/anoyes/build/foundationdb/fdbrpc/FlowTransport.actor.g.cpp:2375
    #45 0x791af45 in a_callback_fire /Users/anoyes/build/foundationdb/fdbrpc/FlowTransport.actor.g.cpp:2390
    #46 0x7914670 in fire /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #47 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /Users/anoyes/workspace/foundationdb/flow/flow.h:447
    #48 0x959891 in void Promise<Void>::send<Void>(Void&&) const /Users/anoyes/workspace/foundationdb/flow/flow.h:778
    #49 0x7b4b022 in Sim2::execTask(Sim2::Task&) (/Users/anoyes/build/foundationdb/bin/fdbserver+0x7b4b022)
    #50 0x7bf9172 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) /Users/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:979
    #51 0x7be7b72 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1when1(Void const&, int) /Users/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5391
    #52 0x7c32a09 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_callback_fire(ActorCallback<Sim2::RunLoopActor, 0, Void>*, Void) /Users/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5406
    #53 0x7c1fc7d in ActorCallback<Sim2::RunLoopActor, 0, Void>::fire(Void const&) /Users/anoyes/workspace/foundationdb/flow/flow.h:998
    #54 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /Users/anoyes/workspace/foundationdb/flow/flow.h:447
    #55 0x959891 in void Promise<Void>::send<Void>(Void&&) const /Users/anoyes/workspace/foundationdb/flow/flow.h:778
    #56 0x7fe74ae in N2::PromiseTask::operator()() /Users/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:481
    #57 0x7fb7001 in N2::Net2::run() /Users/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:657
    #58 0x7b71bdd in Sim2::_runActorState<Sim2::_runActor>::a_body1(int) /Users/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:989
    #59 0x7b2ee5b in Sim2::_runActor::_runActor(Sim2* const&) /Users/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5608
    #60 0x7b2f272 in Sim2::_run(Sim2* const&) /Users/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:987
    #61 0x7b2f2d2 in Sim2::run() /Users/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:996
    #62 0x2104064 in main /Users/anoyes/workspace/foundationdb/fdbserver/fdbserver.actor.cpp:1793
    #63 0x7fb7c6561504 in __libc_start_main (/lib64/libc.so.6+0x22504)
    #64 0x464914  (/Users/anoyes/build/foundationdb/bin/fdbserver+0x464914)
2019-12-03 12:51:36 -08:00
Andrew Noyes 6bde67f2b3 Fix UBSAN report
/home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:86:8: runtime error: load of value 1231493777, which is not a valid value for type 'limitReason_t'
    #0 0x310e961 in StorageQueueInfo::StorageQueueInfo(StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:86
    #1 0x310eacd in MapPair<UID, StorageQueueInfo>::MapPair<UID, StorageQueueInfo>(UID&&, StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/flow/IndexedSet.h:242
    #2 0x310b35e in MapPair<std::decay<UID>::type, std::decay<StorageQueueInfo>::type> mapPair<UID, StorageQueueInfo>(UID&&, StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/flow/IndexedSet.h:258
    #3 0x30a8b79 in a_body1 /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:195
    #4 0x309b529 in TrackStorageServerQueueInfoActor /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:495
    #5 0x309b9be in trackStorageServerQueueInfo(RatekeeperData* const&, StorageServerInterface const&) /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:194
    #6 0x30cff63 in a_body1loopBody1when1cont1 /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:303
    #7 0x30cd9da in a_body1loopBody1when1when1 /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:1170
    #8 0x30ed4dd in a_callback_fire /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:1185
    #9 0x30e6d81 in fire /home/anoyes/workspace/foundationdb/flow/flow.h:998
    #10 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /home/anoyes/workspace/foundationdb/flow/flow.h:447
    #11 0x959891 in void Promise<Void>::send<Void>(Void&&) const /home/anoyes/workspace/foundationdb/flow/flow.h:778
    #12 0x7b4b018 in Sim2::execTask(Sim2::Task&) (/home/anoyes/build/foundationdb/bin/fdbserver+0x7b4b018)
    #13 0x7bf9168 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:979
    #14 0x7be7b68 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1when1(Void const&, int) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5391
    #15 0x7c329ff in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_callback_fire(ActorCallback<Sim2::RunLoopActor, 0, Void>*, Void) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5406
    #16 0x7c1fc73 in ActorCallback<Sim2::RunLoopActor, 0, Void>::fire(Void const&) /home/anoyes/workspace/foundationdb/flow/flow.h:998
    #17 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /home/anoyes/workspace/foundationdb/flow/flow.h:447
    #18 0x959891 in void Promise<Void>::send<Void>(Void&&) const /home/anoyes/workspace/foundationdb/flow/flow.h:778
    #19 0x7fe74a4 in N2::PromiseTask::operator()() /home/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:481
    #20 0x7fb6ff7 in N2::Net2::run() /home/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:657
    #21 0x7b71bd3 in Sim2::_runActorState<Sim2::_runActor>::a_body1(int) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:989
    #22 0x7b2ee51 in Sim2::_runActor::_runActor(Sim2* const&) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5608
    #23 0x7b2f268 in Sim2::_run(Sim2* const&) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:987
    #24 0x7b2f2c8 in Sim2::run() /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:996
    #25 0x21040a6 in main /home/anoyes/workspace/foundationdb/fdbserver/fdbserver.actor.cpp:1793
    #26 0x7f03492ba504 in __libc_start_main (/lib64/libc.so.6+0x22504)
    #27 0x464914  (/home/anoyes/build/foundationdb/bin/fdbserver+0x464914)
2019-12-03 12:49:12 -08:00
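The UBSAN message above is the usual symptom of copying an enum member that was never initialized: the implicitly generated move constructor loads whatever bits happen to be in that field. The sketch below assumes that was the cause here. `LimitReason`, `QueueInfo`, and their members are hypothetical stand-ins, not the actual `limitReason_t` or `StorageQueueInfo` definitions. It shows one common remedy, a default member initializer.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an enum such as limitReason_t.
enum LimitReason { unlimited, storage_queue_size, log_queue_size };

// Hypothetical stand-in for a struct such as StorageQueueInfo.
struct QueueInfo {
    bool valid = false;
    // Without this initializer, a QueueInfo built by the constructor below
    // would hold indeterminate bits here, and the implicit move constructor
    // would load them, which UBSAN reports as "not a valid value for type".
    LimitReason limitReason = unlimited;

    QueueInfo() = default;
    explicit QueueInfo(bool v) : valid(v) {}  // does not set limitReason itself
};

// Constructs a QueueInfo and moves it, like the move into a map pair in the trace above.
QueueInfo makeAndMove() {
    QueueInfo info(true);
    QueueInfo moved(std::move(info));
    return moved;
}
```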
Andrew Noyes b086dbecac Fix another UBSAN error
fdbserver/sqlite/sqlite3.amalgamation.c:14709:15: runtime error: left shift of 205 by 24 places cannot be represented in type 'int'
2019-12-02 12:51:33 -08:00
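The report above comes from shifting an `unsigned char` after it has been promoted to `int`. 205 << 24 is 0xCD000000, which exceeds `INT_MAX`, so the shift is undefined in C. The sketch below assumes the offending expression is a big-endian 4-byte decode in the style of SQLite's `get4byte`. The usual fix converts each byte to an unsigned 32-bit type before shifting. `readBigEndian32` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Decodes a big-endian 32-bit value. Each byte is widened to uint32_t
// before it is shifted. Shifting the promoted int directly would make
// 205 << 24 overflow a 32-bit signed int.
uint32_t readBigEndian32(const unsigned char* p) {
    return (static_cast<uint32_t>(p[0]) << 24) |
           (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) |
            static_cast<uint32_t>(p[3]);
}
```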
Andrew Noyes 36e9f40fc2 Fix negative shift exponent
fdbserver/KeyValueStoreSQLite.actor.cpp:438:11: runtime error: shift exponent -1 is negative
2019-12-02 12:51:33 -08:00
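A shift by a negative count, such as the `-1` reported above, is undefined no matter what value is being shifted. The usual remedy is to handle the boundary case before shifting. The code at that line is not shown here. The sketch below uses a hypothetical helper, `highestPowerOfTwoAtMost`, to illustrate the pattern, in which a computed shift of `width - 1` would become `-1` for a zero input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the largest power of two that is <= n.
// The naive form `1ULL << (width - 1)` shifts by -1 when n == 0, because
// width is 0. The early return handles that case before any shift happens.
uint64_t highestPowerOfTwoAtMost(uint64_t n) {
    if (n == 0) return 0;  // no power of two is <= 0; avoid a negative shift
    int width = 0;
    for (uint64_t v = n; v != 0; v >>= 1) ++width;  // number of significant bits
    return 1ULL << (width - 1);
}
```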
Andrew Noyes e0bf7c4d65 Fix signed integer overflow
Not sure if this is the right fix or not

fdbserver/Ratekeeper.actor.cpp:557:40: runtime error: signed integer overflow: -9223372036854775808 - 9223372036854775807 cannot be represented in type 'long long'
2019-12-02 12:51:33 -08:00
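The reported expression, `-9223372036854775808 - 9223372036854775807`, is `INT64_MIN - INT64_MAX`, and its true result lies far outside the range of `long long`. As the commit message notes, the right fix depends on what the operands mean. One common option is to check the bounds before subtracting and clamp the result. The sketch below shows this as a hypothetical `saturatingSub` helper. It is not the change actually made in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Saturating a - b for int64_t. Each bound check is rearranged so it cannot
// itself overflow, and the result clamps to the representable range instead
// of overflowing. INT64_MIN - INT64_MAX, the case UBSAN reported, yields INT64_MIN.
int64_t saturatingSub(int64_t a, int64_t b) {
    constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
    constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
    if (b < 0 && a > kMax + b) return kMax;  // a - b would exceed kMax
    if (b > 0 && a < kMin + b) return kMin;  // a - b would fall below kMin
    return a - b;
}
```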
Andrew Noyes f320f6c174 Fix occurrence of undefined behavior
UBSAN has this to say:

flow/Arena.h:982:10: runtime error: reference binding to null pointer of type 'KeyValueRef'

After this change UBSAN no longer complains about this occurrence
2019-11-26 21:34:24 -08:00
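UBSAN flags the reference binding itself, not a later read: forming a `KeyValueRef&` from a null pointer is undefined even if the reference is never used. One likely way to reach it is indexing an empty arena-backed vector whose data pointer is null. The sketch below assumes that pattern. `KV`, `View`, and `lastOrNull` are hypothetical stand-ins, not the `Arena.h` types. The sketch checks the size before any reference is formed.

```cpp
#include <cassert>
#include <cstddef>

struct KV { int key; int value; };  // hypothetical stand-in for KeyValueRef

// Hypothetical non-owning array view. An empty view may have data == nullptr.
struct View {
    KV* data = nullptr;
    size_t size = 0;
    // Binding a reference to data[i] when data is null is undefined behavior,
    // so callers must check size before calling this.
    KV& operator[](size_t i) { return data[i]; }
};

// Returns a pointer to the last element, or nullptr for an empty view,
// instead of forming the reference v[v.size - 1] unconditionally.
KV* lastOrNull(View& v) {
    return v.size == 0 ? nullptr : &v[v.size - 1];
}
```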
Evan Tschannen 44765a46ac Merge branch 'release-6.2' of github.com:apple/foundationdb into release-6.2
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-11-22 11:00:22 -08:00
Evan Tschannen 3a3ab5664b fix: team trackers for bad teams that contain a removed server must be cancelled or the cluster will falsely report those teams as failed 2019-11-22 10:20:13 -08:00
A.J. Beamon 7c801513e2 Fix cases where latency band config could be discarded during recovery or process start. 2019-11-20 11:44:18 -08:00
Evan Tschannen c8aef99e87
Merge pull request #2354 from etschannen/fix-dd-recruit
The data distributor could still be killed and re-recruited infinitely
2019-11-13 13:26:57 -08:00
Evan Tschannen ffc89d1182 fix: dd test recruitment should prefer the location of ratekeeper over other used processes 2019-11-13 12:58:55 -08:00
Evan Tschannen 11525f6922 added comments 2019-11-13 12:53:23 -08:00
Evan Tschannen 8f725db92e serialization of logRangeMutation->second caused long slow tasks 2019-11-12 23:06:58 -08:00
Evan Tschannen b1b5f88cb1
Merge pull request #2344 from bnamasivayam/release-6.2
Fix bug where DD or RK could be halted and re-recruited in a loop for…
2019-11-12 21:47:28 -08:00
Evan Tschannen 5e463f7290
Merge pull request #2342 from ajbeamon/packet-size-event-rename
Rename LargePacket warnings to distinguish between sent and received packets.
2019-11-12 20:42:55 -08:00
Evan Tschannen be303cad7a
Merge pull request #2339 from etschannen/feature-increase-reboot-priority
Increase the priority of reboot and recruitment requests
2019-11-12 20:42:22 -08:00
Evan Tschannen 7ebbb4d9cf
Merge pull request #2337 from etschannen/feature-logrouter-peek
Do not limit log router peeking from satellite logs
2019-11-12 20:41:22 -08:00
Balachandar Namasivayam c26bb52979 Enable Consistency Checks for DD and RK. 2019-11-12 20:11:08 -08:00
Balachandar Namasivayam 2e41497580 This commit tries to distribute RK and DD among other empty available processes. 2019-11-12 17:52:42 -08:00
Steve Atherton 17059596e9
Merge pull request #2346 from satherton/feature-redwood
Update Redwood
2019-11-12 16:25:10 -08:00
Balachandar Namasivayam f5282f2c7e Fix bug where DD or RK could be halted and re-recruited in a loop for certain valid process class configurations. Specifically, recruitment of DD or RK takes into account that the master process is preferred over a proxy, resolver, or CC,
but the check for a better DD only looks for a better machine class, ignoring that the new recruit could share a process with a proxy, resolver, or CC. Also try to balance the distribution of the DD and RK roles if there are enough processes to do so.
2019-11-12 14:22:36 -08:00
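The recruitment loop described above arises when the check for a better location and the recruiter rank candidates differently. The sketch below shows one way to keep them consistent. It assumes that machine-class fitness is compared first and that sharing a process with a proxy, resolver, or CC breaks ties. `Fitness`, `Candidate`, and `isBetter` are hypothetical and do not mirror FoundationDB's actual recruitment code.

```cpp
#include <cassert>

// Hypothetical fitness ranking; a lower value is a better fit for the role.
enum class Fitness { Best = 0, Good = 1, Okay = 2, Worst = 3 };

struct Candidate {
    Fitness classFitness;     // how well the machine class suits DD or RK
    bool sharesCriticalRole;  // already hosts a proxy, resolver, or CC
};

// A single ordering intended for BOTH the recruiter and the periodic
// "is there a better process?" check. If the check compared classFitness
// alone, it could keep preferring a process the recruiter would not pick,
// halting and re-recruiting the role repeatedly.
bool isBetter(const Candidate& a, const Candidate& b) {
    if (a.classFitness != b.classFitness) {
        return a.classFitness < b.classFitness;
    }
    return !a.sharesCriticalRole && b.sharesCriticalRole;
}
```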