Evan Tschannen
2038a56ff4
Merge pull request #2819 from etschannen/feature-first-proxy
...
A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes
2020-03-16 13:53:28 -07:00
A.J. Beamon
ee3cde0b0d
Merge pull request #2815 from etschannen/feature-timeout-tlog-create
...
Treat a tlog which takes a long time to create its disk queue as failed
2020-03-16 12:49:33 -07:00
Evan Tschannen
77dde00da7
Merge pull request #2818 from ajbeamon/increase-metrics-priority
...
Increase priority of the logging of various metrics trace events
2020-03-16 11:57:37 -07:00
Evan Tschannen
7adc916e18
Merge pull request #2806 from ajbeamon/improve-team-request-performance
...
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
A.J. Beamon
fe19f30999
Merge pull request #2813 from etschannen/feature-satellite-usable-regions
...
do not recruit satellite tlogs when usable regions=1
2020-03-16 11:54:42 -07:00
Evan Tschannen
012344e297
refactor getWorkersForRoleInDatacenter
2020-03-16 11:50:17 -07:00
A.J. Beamon
f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
...
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
Evan Tschannen
ea98c7a40a
added additional timeout on initPersistentState
2020-03-16 11:38:14 -07:00
A.J. Beamon
682b9faa1a
Merge pull request #2817 from etschannen/feature-fix-0-left
...
fix: do not use priority 0 left when calculating priorities for empty teams
2020-03-16 11:15:12 -07:00
A.J. Beamon
5f4373c200
Merge pull request #2811 from alexmiller-apple/tls-failures-status
...
Add TLS Policy Failure count to ProcessMetrics and status json
2020-03-16 11:11:30 -07:00
Evan Tschannen
56dee89e6e
active generations should include the current one
2020-03-16 11:09:42 -07:00
Evan Tschannen
76db8343c0
update status schema
2020-03-16 11:00:51 -07:00
Evan Tschannen
e5d53c863b
report in status the number of active generations
2020-03-16 10:29:17 -07:00
Evan Tschannen
818537ed2d
Update fdbserver/masterserver.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-14 15:04:46 -07:00
Evan Tschannen
0ca89547a5
make sure the number of logRouterTags is larger than the number of satelliteTLogs to avoid having satellites with no data.
2020-03-14 15:02:19 -07:00
Evan Tschannen
79d5511149
A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes
2020-03-13 17:49:02 -07:00
A.J. Beamon
031b579ede
Increase priority of the logging of various metrics trace events.
2020-03-13 16:20:23 -07:00
Evan Tschannen
3a0af091b2
Merge pull request #2804 from ajbeamon/mismatched-file-identifier-logging
...
Add logging when file identifiers don't match
2020-03-13 16:16:44 -07:00
Evan Tschannen
7f00d674a0
Merge pull request #2807 from ajbeamon/fix-status-truncation-on-null-byte
...
Escape unprintable characters in status JSON output read from magic key
2020-03-13 16:10:41 -07:00
Evan Tschannen
2f2f56020f
Update fdbserver/masterserver.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-13 15:54:13 -07:00
A.J. Beamon
700b13e5f8
Remember the best team from team requests, which will likely be the best again and can save us some computation.
2020-03-13 15:21:33 -07:00
Evan Tschannen
12f2b32770
added additional logging in data distribution
2020-03-13 15:19:33 -07:00
Alex Miller
5be7fa52bc
Remove comma, and add schema change to documentation
2020-03-13 14:51:56 -07:00
Evan Tschannen
9e99a00c8f
fix: do not use priority 0 left when calculating priorities for empty teams
2020-03-13 13:56:46 -07:00
Evan Tschannen
d6d347f665
treat a tlog which takes a long time to create its disk queue as failed
2020-03-13 10:31:59 -07:00
Evan Tschannen
a39effa57d
delay recoveries after 70 outstanding generations, and stop recoveries after 100 outstanding generations to prevent a death spiral from filling up the coordinated state
2020-03-13 10:28:32 -07:00
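The thresholds in the commit above can be read as a simple three-way decision on the number of outstanding generations. The sketch below is illustrative only. `RecoveryAction`, `recoveryActionFor`, and the constant names are hypothetical and are not the code in `masterserver.actor.cpp`. Reading "after N" as "once the count reaches N" is also an assumption.

```cpp
#include <cstdint>

// Hypothetical names; only the two numbers come from the commit message.
enum class RecoveryAction { Proceed, Delay, Stop };

constexpr int64_t kDelayGenerations = 70;  // "delay recoveries after 70 outstanding generations"
constexpr int64_t kStopGenerations = 100;  // "stop recoveries after 100 outstanding generations"

// Assumption: "after N" is interpreted as "once the count reaches N".
// Returns what a recovery should do given how many old generations are
// still recorded in the coordinated state.
RecoveryAction recoveryActionFor(int64_t outstandingGenerations) {
    if (outstandingGenerations >= kStopGenerations) return RecoveryAction::Stop;
    if (outstandingGenerations >= kDelayGenerations) return RecoveryAction::Delay;
    return RecoveryAction::Proceed;
}
```

With two tiers, the cluster first slows the growth of old generations and only halts recoveries outright as a last resort before the coordinated state fills up.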
Evan Tschannen
4640edf5d6
do not recruit satellite tlogs when usable regions=1
2020-03-13 10:24:52 -07:00
Alex Miller
04498cbc0e
Report policy failures as a per-second rate rather than as a count over 5s.
2020-03-13 02:49:06 -07:00
Alex Miller
d86a601b84
Add cluster.processes.id.network.tls_policy.hz to status.
...
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
Alex Miller
75e2fffe5a
Add a ProcessMetrics.TLSPolicyFailures metric
...
This reports the number of policy failures over the past 5s interval.
It also is step 1 towards getting this information into status json.
2020-03-13 02:24:37 -07:00
A.J. Beamon
f7198c4ba3
Use the std::string constructor of StringRef, which will use the length of the string correctly.
2020-03-12 12:35:08 -07:00
A.J. Beamon
6940d546f5
Fix bug where status is truncated when a null byte is included. This is implemented by escaping unprintable characters.
2020-03-12 12:27:53 -07:00
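Escaping unprintable characters means that a null byte can no longer end the status output early. The sketch below shows the general technique under stated assumptions. `escapeUnprintable` is a hypothetical helper, not the FoundationDB function used in this change. The `\xNN` output format and the doubling of backslashes are also assumptions.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns a copy of `in` in which every byte outside the
// printable ASCII range (0x20-0x7e) is written as \xNN, and a literal
// backslash is doubled so the escaping stays unambiguous.
std::string escapeUnprintable(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c == '\\') {
            out += "\\\\";
        } else if (c >= 0x20 && c <= 0x7e) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // "\xNN" plus the terminating null
            std::snprintf(buf, sizeof(buf), "\\x%02x", c);
            out += buf;
        }
    }
    return out;
}
```

The input must be built with an explicit length, for example `std::string("a\0b", 3)`. A C-string constructor would stop at the null byte, which is the same pitfall the StringRef change above addresses.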
A.J. Beamon
555db50cd1
Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist.
2020-03-12 11:22:03 -07:00
A.J. Beamon
8cdf918316
Add logging when file identifiers don't match
2020-03-12 11:06:53 -07:00
Evan Tschannen
53d4798c75
Merge pull request #2789 from etschannen/post-release-cleanup-6.2.18
...
Post release cleanup 6.2.18
2020-03-06 13:58:56 -08:00
Evan Tschannen
f2cb743cfa
update installer WIX GUID following release
2020-03-06 13:58:03 -08:00
Evan Tschannen
4020919185
update version to 6.2.19
2020-03-06 13:58:02 -08:00
Evan Tschannen
ca5782c2a4
Merge pull request #2788 from etschannen/release-6.2
...
updated documentation for 6.2.18
2020-03-06 11:20:49 -08:00
Evan Tschannen
15f1a75d4f
updated documentation for 6.2.18
2020-03-06 11:16:10 -08:00
Evan Tschannen
dbfc0cbcc0
Merge pull request #2781 from alexmiller-apple/certificate-refresh
...
Refresh certificates used for handshaking when they change on disk
2020-03-06 11:12:04 -08:00
Alex Miller
f9969a853c
Merge remote-tracking branch 'origin/certificate-refresh' into certificate-refresh
2020-03-06 11:10:05 -08:00
Alex Miller
188d9b8239
Don't swallow actor cancellation in certificate refreshing.
2020-03-06 11:09:17 -08:00
Alex Miller
9b760fae2d
Rewrite all Errors into tls_errors if they happen as part of initializing TLS.
2020-03-06 11:06:19 -08:00
Alex Miller
1f56bf8933
Fix the build with success()
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-03-06 10:15:04 -08:00
Evan Tschannen
98647a61fc
Merge pull request #2784 from ajbeamon/add-resolver-metrics
...
Add ResolverMetrics trace event
2020-03-06 09:38:30 -08:00
Evan Tschannen
a3662c68e8
Merge pull request #2786 from ajbeamon/add-new-transaction-metrics
...
Add more metrics to the TransactionMetrics event
2020-03-06 09:34:16 -08:00
A.J. Beamon
faf9101ad4
Update fdbserver/Resolver.actor.cpp
...
Co-Authored-By: Evan Tschannen <36455792+etschannen@users.noreply.github.com>
2020-03-06 09:20:38 -08:00
A.J. Beamon
d59e25b0dc
Merge pull request #2787 from etschannen/feature-log-router-logging
...
Added additional logging
2020-03-06 09:20:18 -08:00
Alex Miller
ac52b6b474
Rework a bit of error and exception handling.
...
I went back and dug through all of the "what functions can throw what
types", and made sane decisions about them. boost errors are
aggressively translated into FDB ones, which might result in multiple
lines of logging about errors, but this is in infrequently run code, so
it should be fine.
2020-03-06 02:33:16 -08:00
Evan Tschannen
1076abdee5
fixed crash when interf was not created
2020-03-05 19:09:08 -08:00