1678 Commits

Author SHA1 Message Date
Evan Tschannen 72203ba47a Merge commit '56f3f0b1bc60604f965152d856ae29a591227703' 2019-04-01 18:45:38 -07:00
Evan Tschannen f5de52de91 fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time 2019-04-01 16:38:25 -07:00
Evan Tschannen a46620fbee Merge branch 'release-6.1' 2019-03-30 17:59:28 -07:00
Evan Tschannen 8ebf771392 cleanup cluster controller trace events 2019-03-30 14:17:18 -07:00
Alex Miller e7ad39246c Fix typo 2019-03-29 20:16:26 -07:00
Evan Tschannen a44ffd851e fix: the shared tlog could fail to update a stopped tlog’s queueCommitVersion to version if a second tlog registered before it could issue the first commit for the tlog 2019-03-29 20:11:30 -07:00
Evan Tschannen d882c060bf Merge commit '5dd6396eed0de0dfea6cf9eecc307995eff5cedc' 2019-03-28 18:00:55 -07:00
Balachandar Namasivayam 0bbdc15f71 Multi-test processes wait until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect the restarts and fail quickly instead of waiting for a timeout, which is usually large. 2019-03-28 17:05:30 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 836bb95a7a Merge pull request #1372 from etschannen/master
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
Evan Tschannen 34b9d5e722 Merge pull request #1364 from etschannen/feature-fast-serialize
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen e5a80f2c94 optimized IPaddress 2019-03-27 18:21:13 -07:00
A.J. Beamon 91014d4529 Add file changes that I accidentally failed to commit; fix naming issue in worker. 2019-03-27 08:41:19 -07:00
A.J. Beamon 71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
A.J. Beamon d508658569 Make ratekeeper one word to match our existing convention 2019-03-27 08:15:19 -07:00
Jingyu Zhou 38c6681349 Fix some signed and unsigned mismatch warnings. 2019-03-26 14:54:11 -07:00
Jingyu Zhou c0b58080ee Fix type name warning for DDTeamCollection
Seen using 'class' now seen using 'struct' in DataDistribution.actor.cpp
2019-03-26 14:18:25 -07:00
Jingyu Zhou 7c02ee6fdd Fix compiler warning about unreferenced exception variable 2019-03-26 13:43:47 -07:00
Jingyu Zhou 466a59a99d Merge remote-tracking branch 'apple/release-6.1' into ratekeeper 2019-03-25 15:27:38 -07:00
Jingyu Zhou f57a22e2ed Add data distributor and ratekeeper to status output 2019-03-25 15:11:29 -07:00
Evan Tschannen 5e03e178de Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues
Add support for a client or worker having multiple issues.
2019-03-24 17:27:50 -07:00
Evan Tschannen d45159ebf7 Merge pull request #1307 from jzhou77/ratekeeper
Monitor placement of Ratekeeper and DataDistributor
2019-03-24 17:26:07 -07:00
Evan Tschannen d6ad027d37 ratekeeper needs to be recruited for proxies to make progress, so if one has not registered with the cluster controller by the time we are accepting commits, recruit a new one 2019-03-24 16:48:24 -07:00
Evan Tschannen f426d732ea fix: forgot to remove one location where id_used was incremented for distributor and ratekeeper 2019-03-24 16:04:59 -07:00
Evan Tschannen e8948726e8 once we recruit a ratekeeper, do not allow any other ratekeepers to register 2019-03-24 11:04:39 -07:00
Evan Tschannen 24c92a1870 Merge pull request #1352 from etschannen/feature-network-address-list
Changed NetworkAddressList to at most two addresses for performance
2019-03-24 10:22:38 -07:00
Evan Tschannen 50a4403661 fix: missing parenthesis 2019-03-23 21:52:15 -07:00
Jingyu Zhou 40eec20252 Restore master PID in worker registration
This fix was lost during a merge.
2019-03-23 21:02:11 -07:00
Jingyu Zhou 3ef26e6be3 Fix fitness assignment statements
Found by MacOS build.
2019-03-23 19:16:04 -07:00
Evan Tschannen 1fc6937802 changed NetworkAddressList to at most two addresses for performance 2019-03-23 17:54:46 -07:00
Evan Tschannen b51a24453e the data distributor and ratekeeper are not included in id_used, but when comparing equally good options we prefer to avoid sharing with those roles
excluded data distributor and ratekeeper were improperly killed when the best option was also excluded
2019-03-23 13:25:36 -07:00
Jingyu Zhou 10988f89d9 Code refactoring for ConsistencyCheck.actor.cpp 2019-03-23 11:06:43 -07:00
Jingyu Zhou fdc5b5ddbf Fix: spurious ratekeeper registration
A rare race condition:
-r simulation -f ./foundationdb/tests/slow/WriteDuringReadAtomicRestore.txt -s 114256311 -b on

- A is the ratekeeper.
- CC recruits B and B starts
- CC halts ratekeeper A and A is halted
- A registers back with CC, which then halts B. CC sets A to be the ratekeeper.

CC starts recruiting and finds that A is the best machine, but skips recruiting
because CC thinks A is already used. Now the cluster is left with no ratekeeper.

Fix by disallowing ratekeeper registration with previous ID.
2019-03-23 11:03:51 -07:00
Jingyu Zhou 6523cd4931 Fix: recruit ratekeeper is not triggered 2019-03-23 09:20:54 -07:00
Steve Atherton 09f37cf3d2 Merge pull request #533 from ajbeamon/fix-parent-directory
Fixes to parentDirectory() and abspath()
2019-03-22 23:53:46 -07:00
Evan Tschannen 2da46e3172 fix: halt if datacenters are different 2019-03-22 23:53:21 -07:00
Evan Tschannen b68bc46042 Merge pull request #1348 from ajbeamon/fix-missing-metrics-when-ss-down
Fix missing read workload metrics
2019-03-22 19:08:04 -07:00
Evan Tschannen d34c56c9a5 ensure that the processId exists in id_worker before accessing it 2019-03-22 18:54:39 -07:00
Balachandar Namasivayam ac8ad07b45 Address review comments. 2019-03-22 18:48:49 -07:00
Balachandar Namasivayam 4ed323ac52 Fixed bug and addressed review comments. 2019-03-22 18:48:49 -07:00
Balachandar Namasivayam d75020b44a Fix bug where accessing shared memory created by boost 1.52 leads to error when accessed by boost 1.67. 2019-03-22 18:48:49 -07:00
Evan Tschannen 36ab852bb1 Merge branch 'master' into ratekeeper
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen 6254a1a8e4 fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop 2019-03-22 18:37:39 -07:00
Evan Tschannen 7dd1c1b60c fix: processClassFitness could be wrong if the client changed their class while rebooting 2019-03-22 18:37:39 -07:00
Evan Tschannen ddb6058770 simplified ratekeeper monitoring loop 2019-03-22 18:22:45 -07:00
Jingyu Zhou 12917d8c7d Add actors to store halt request futures
Address best fitness in checking better DD or RK.
2019-03-22 18:06:38 -07:00
Jingyu Zhou e8977aeb98 Remove clusterControllerDcId check
This is no longer needed since it'll be set in the ctor.
2019-03-22 18:01:54 -07:00
Evan Tschannen 82bc447e29 startRatekeeper is responsible for updating serverDBInfo 2019-03-22 17:56:16 -07:00
Evan Tschannen 82c80c225d make sure id_worker is updated before setting ratekeeper or data distribution 2019-03-22 17:08:54 -07:00
Evan Tschannen 6a9c9d79cc Update fdbserver/ClusterController.actor.cpp 2019-03-22 17:00:58 -07:00