Commit Graph

7509 Commits

Author SHA1 Message Date
Evan Tschannen 2038a56ff4
Merge pull request #2819 from etschannen/feature-first-proxy
A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes
2020-03-16 13:53:28 -07:00
A.J. Beamon ee3cde0b0d
Merge pull request #2815 from etschannen/feature-timeout-tlog-create
Treat a tlog which takes a long time to create its disk queue as failed
2020-03-16 12:49:33 -07:00
Evan Tschannen 77dde00da7
Merge pull request #2818 from ajbeamon/increase-metrics-priority
Increase priority of the logging of various metrics trace events
2020-03-16 11:57:37 -07:00
Evan Tschannen 7adc916e18
Merge pull request #2806 from ajbeamon/improve-team-request-performance
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
A.J. Beamon fe19f30999
Merge pull request #2813 from etschannen/feature-satellite-usable-regions
do not recruit satellite tlogs when usable regions=1
2020-03-16 11:54:42 -07:00
Evan Tschannen 012344e297 refactor getWorkersForRoleInDatacenter 2020-03-16 11:50:17 -07:00
A.J. Beamon f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
Evan Tschannen ea98c7a40a added additional timeout on initPersistentState 2020-03-16 11:38:14 -07:00
A.J. Beamon 682b9faa1a
Merge pull request #2817 from etschannen/feature-fix-0-left
fix: do not use priority 0 left when calculating priorities for empty teams
2020-03-16 11:15:12 -07:00
A.J. Beamon 5f4373c200
Merge pull request #2811 from alexmiller-apple/tls-failures-status
Add TLS Policy Failure count to ProcessMetrics and status json
2020-03-16 11:11:30 -07:00
Evan Tschannen 56dee89e6e active generations should include the current one 2020-03-16 11:09:42 -07:00
Evan Tschannen 76db8343c0 update status schema 2020-03-16 11:00:51 -07:00
Evan Tschannen e5d53c863b report in status the number of active generations 2020-03-16 10:29:17 -07:00
Evan Tschannen 818537ed2d
Update fdbserver/masterserver.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-14 15:04:46 -07:00
Evan Tschannen 0ca89547a5 make sure the number of logRouterTags is larger than the number of satelliteTLogs to avoid having satellites with no data. 2020-03-14 15:02:19 -07:00
Evan Tschannen 79d5511149 A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes 2020-03-13 17:49:02 -07:00
A.J. Beamon 031b579ede Increase priority of the logging of various metrics trace events. 2020-03-13 16:20:23 -07:00
Evan Tschannen 3a0af091b2
Merge pull request #2804 from ajbeamon/mismatched-file-identifier-logging
Add logging when file identifiers don't match
2020-03-13 16:16:44 -07:00
Evan Tschannen 7f00d674a0
Merge pull request #2807 from ajbeamon/fix-status-truncation-on-null-byte
Escape unprintable characters in status JSON output read from magic key
2020-03-13 16:10:41 -07:00
Evan Tschannen 2f2f56020f
Update fdbserver/masterserver.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-13 15:54:13 -07:00
A.J. Beamon 700b13e5f8 Remember the best team from team requests, which will likely be the best again and can save us some computation. 2020-03-13 15:21:33 -07:00
Evan Tschannen 12f2b32770 added additional logging in data distribution 2020-03-13 15:19:33 -07:00
Alex Miller 5be7fa52bc Remove comma, and add schema change to documentation 2020-03-13 14:51:56 -07:00
Evan Tschannen 9e99a00c8f fix: do not use priority 0 left when calculating priorities for empty teams 2020-03-13 13:56:46 -07:00
Evan Tschannen d6d347f665 treat a tlog which takes a long time to create its disk queue as failed 2020-03-13 10:31:59 -07:00
Evan Tschannen a39effa57d delay recoveries after 70 outstanding generations, and stop recoveries after 100 outstanding generations to prevent a death spiral from filling up the coordinated state 2020-03-13 10:28:32 -07:00
Evan Tschannen 4640edf5d6 do not recruit satellite tlogs when usable regions=1 2020-03-13 10:24:52 -07:00
Alex Miller 04498cbc0e Report policy failures as a per-second rate rather than as a count over 5s. 2020-03-13 02:49:06 -07:00
Alex Miller d86a601b84 Add cluster.processes.id.network.tls_policy.hz to status.
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
Alex Miller 75e2fffe5a Add a ProcessMetrics.TLSPolicyFailures metric
This reports the number of policy failures over the past 5s interval.
It also is step 1 towards getting this information into status json.
2020-03-13 02:24:37 -07:00
A.J. Beamon f7198c4ba3 Use the std::string constructor of StringRef, which will use the length of string correctly. 2020-03-12 12:35:08 -07:00
A.J. Beamon 6940d546f5 Fix bug where status is truncated when a null byte is included. This is implemented by escaping unprintable characters. 2020-03-12 12:27:53 -07:00
A.J. Beamon 555db50cd1 Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist. 2020-03-12 11:22:03 -07:00
A.J. Beamon 8cdf918316 Add logging when file identifiers don't match 2020-03-12 11:06:53 -07:00
Evan Tschannen 53d4798c75
Merge pull request #2789 from etschannen/post-release-cleanup-6.2.18
Post release cleanup 6.2.18
2020-03-06 13:58:56 -08:00
Evan Tschannen f2cb743cfa update installer WIX GUID following release 2020-03-06 13:58:03 -08:00
Evan Tschannen 4020919185 update version to 6.2.19 2020-03-06 13:58:02 -08:00
Evan Tschannen ca5782c2a4
Merge pull request #2788 from etschannen/release-6.2
updated documentation for 6.2.18
2020-03-06 11:20:49 -08:00
Evan Tschannen 15f1a75d4f updated documentation for 6.2.18 2020-03-06 11:16:10 -08:00
Evan Tschannen dbfc0cbcc0
Merge pull request #2781 from alexmiller-apple/certificate-refresh
Refresh certificates used for handshaking when they change on disk
2020-03-06 11:12:04 -08:00
Alex Miller f9969a853c Merge remote-tracking branch 'origin/certificate-refresh' into certificate-refresh 2020-03-06 11:10:05 -08:00
Alex Miller 188d9b8239 Don't swallow actor cancellation in certificate refreshing. 2020-03-06 11:09:17 -08:00
Alex Miller 9b760fae2d Rewrite all Errors into tls_errors if they happen as part of initializing TLS. 2020-03-06 11:06:19 -08:00
Alex Miller 1f56bf8933
Fix the build with success()
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-06 10:15:04 -08:00
Evan Tschannen 98647a61fc
Merge pull request #2784 from ajbeamon/add-resolver-metrics
Add ResolverMetrics trace event
2020-03-06 09:38:30 -08:00
Evan Tschannen a3662c68e8
Merge pull request #2786 from ajbeamon/add-new-transaction-metrics
Add more metrics to the TransactionMetrics event
2020-03-06 09:34:16 -08:00
A.J. Beamon faf9101ad4
Update fdbserver/Resolver.actor.cpp
Co-Authored-By: Evan Tschannen <36455792+etschannen@users.noreply.github.com>
2020-03-06 09:20:38 -08:00
A.J. Beamon d59e25b0dc
Merge pull request #2787 from etschannen/feature-log-router-logging
Added additional logging
2020-03-06 09:20:18 -08:00
Alex Miller ac52b6b474 Rework a bit of error and exception handling.
I went back and dug through all of the "what functions can throw what
types", and made sane decisions about them. Boost errors are
aggressively translated into FDB ones, which might result in multiple
lines of logging about errors, but this is in infrequently run code, so
it should be fine.
2020-03-06 02:33:16 -08:00
Evan Tschannen 1076abdee5 fixed crash when interf was not created 2020-03-05 19:09:08 -08:00