Commit Graph

1763 Commits

Author SHA1 Message Date
Xin Dong a7e8bfad82 Fix the test failure, which was introduced by a typo 2020-03-30 15:24:08 -07:00
Xin Dong 012d41548e Address review comments 2020-03-30 13:55:59 -07:00
Balachandar Namasivayam 804fe1b22e Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key"
This reverts commit 648dc4a933, reversing
changes made to 487d131b38.
2020-03-19 21:34:28 -07:00
Balachandar Namasivayam 58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Balachandar Namasivayam a476127f5f
Merge pull request #2802 from xumengpanda/mengxu/debug-master-PR
Fix correctness failure on master branch
2020-03-18 16:07:36 -07:00
Evan Tschannen 648dc4a933
Merge pull request #2257 from zjuLcg/report-conflicting-key
Report conflicting keys
2020-03-18 13:39:42 -07:00
Evan Tschannen e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
Meng Xu 7f559bc712 Cleanup code and apply clang-format
Self code review
2020-03-16 15:08:32 -07:00
Evan Tschannen ed4d02a3e4
Merge pull request #2812 from etschannen/feature-proxy-mem-limit
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
Meng Xu 1513df22f3 AutoQuorumChange:Exclude unreliable node from coordinator in simulation 2020-03-16 14:39:25 -07:00
Evan Tschannen a068d4063f renamed ProxyGetConsistentReadVersion 2020-03-16 12:11:32 -07:00
Evan Tschannen 77dde00da7
Merge pull request #2818 from ajbeamon/increase-metrics-priority
Increase priority of the logging of various metrics trace events
2020-03-16 11:57:37 -07:00
A.J. Beamon fe19f30999
Merge pull request #2813 from etschannen/feature-satellite-usable-regions
do not recruit satellite tlogs when usable regions=1
2020-03-16 11:54:42 -07:00
A.J. Beamon f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
A.J. Beamon 5f4373c200
Merge pull request #2811 from alexmiller-apple/tls-failures-status
Add TLS Policy Failure count to ProcessMetrics and status json
2020-03-16 11:11:30 -07:00
Evan Tschannen 76db8343c0 update status schema 2020-03-16 11:00:51 -07:00
Meng Xu 15c48b9e19 Add event for getDesired coordinators 2020-03-16 09:40:35 -07:00
Evan Tschannen 04b752b40a Added additional logging related to memory errors (including in status) 2020-03-13 18:31:22 -07:00
A.J. Beamon 031b579ede Increase priority of the logging of various metrics trace events. 2020-03-13 16:20:23 -07:00
chaoguang 39a37531db Fix issues according to Andrew's comments 2020-03-13 15:42:15 -07:00
Alex Miller 5be7fa52bc Remove comma, and add schema change to documentation 2020-03-13 14:51:56 -07:00
chaoguang c4c38c5eca Delete commented code 2020-03-13 12:58:12 -07:00
chaoguang 6e92716be7 update comments 2020-03-13 12:41:48 -07:00
Evan Tschannen a39effa57d delay recoveries after 70 outstanding generations, and stop recoveries after 100 outstanding generations to prevent a death spiral from filling up the coordinated state 2020-03-13 10:28:32 -07:00
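The policy in the commit above can be sketched as a simple threshold check. This is a minimal illustration, not the FoundationDB implementation: the 70 and 100 thresholds come from the commit message, while the names `RecoveryAction`, `recovery_action`, and the two constants are hypothetical, and treating "after N" as "at N or more" is an assumption.

```python
from enum import Enum


class RecoveryAction(Enum):
    PROCEED = "proceed"
    DELAY = "delay"
    STOP = "stop"


# Thresholds quoted from the commit message; the constant names are hypothetical.
DELAY_AFTER_GENERATIONS = 70
STOP_AFTER_GENERATIONS = 100


def recovery_action(outstanding_generations: int) -> RecoveryAction:
    """Decide how to treat a new recovery given the number of old generations.

    Assumption: "after N" is read as "at N or more".
    """
    if outstanding_generations >= STOP_AFTER_GENERATIONS:
        return RecoveryAction.STOP
    if outstanding_generations >= DELAY_AFTER_GENERATIONS:
        return RecoveryAction.DELAY
    return RecoveryAction.PROCEED
```

Checking the stop threshold first keeps the two ranges from overlapping.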
Evan Tschannen 4640edf5d6 do not recruit satellite tlogs when usable regions=1 2020-03-13 10:24:52 -07:00
Evan Tschannen 243c268d9d Limit the amount of requests the proxy can queue up in memory 2020-03-13 10:17:49 -07:00
Alex Miller 04498cbc0e Make policy failures be reported as per 1s and not over 5s. 2020-03-13 02:49:06 -07:00
Alex Miller d86a601b84 Add cluster.processes.id.network.tls_policy.hz to status.
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
Xin Dong 5967ef5eab Added back the changes that report trace log flush failures and fix the random crash 2020-03-12 14:34:19 -07:00
A.J. Beamon f7198c4ba3 Use the std::string constructor of StringRef, which will use the length of string correctly. 2020-03-12 12:35:08 -07:00
A.J. Beamon 6940d546f5 Fix bug where status is truncated when a null byte is included. This is implemented by escaping unprintable characters. 2020-03-12 12:27:53 -07:00
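The idea of escaping unprintable characters so that a null byte cannot truncate the output can be sketched as follows. The `\xNN` escape format and the function name `escape_unprintable` are assumptions made for illustration; the commit message does not state which format FoundationDB uses.

```python
def escape_unprintable(data: bytes) -> str:
    """Render bytes as text, replacing unprintable bytes with \\xNN escapes.

    The backslash itself is escaped too, so the output stays unambiguous.
    """
    pieces = []
    for byte in data:
        is_printable_ascii = 0x20 <= byte <= 0x7E
        if is_printable_ascii and byte != 0x5C:  # 0x5C is the backslash
            pieces.append(chr(byte))
        else:
            pieces.append(f"\\x{byte:02x}")
    return "".join(pieces)
```

Because the null byte becomes the four visible characters `\x00`, code that treats the result as a C-style string no longer stops at it.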
chaoguang 6f90228a0b change to krmSetRangeCoalescing 2020-03-12 11:31:36 -07:00
Meng Xu 1759d5c8c4 Apply clang-format 2020-03-12 10:18:53 -07:00
chaoguang 4e8cb0cb96 add krmSetRangeCoalescing for RYWTr 2020-03-12 09:53:00 -07:00
chaoguang c2f0c41c52 use krmSetRange 2020-03-11 23:12:38 -07:00
chaoguang 0094293d50 add const vars 2020-03-11 23:11:49 -07:00
chaoguang 6ae60870fc use krmSetRange 2020-03-11 13:20:40 -07:00
chaoguang bdabb8638e Change prefix 2020-03-11 12:40:40 -07:00
chaoguang d1c56d3b57 add constant KeyRefs in SystemData 2020-03-11 12:25:50 -07:00
Meng Xu bd345f85db ConsistencyCheck:Fix failure due to address inconsistency between process and worker
With TLS, a worker (or process) can have a TLS address and non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.

In a connection between two workers, if their primary addresses do not enable
or disable TLS together, one worker will swap its primary address and secondary address
so that the TLS config of the two endpoints can match.

After the swap, the primary address may no longer be the TLS one assigned
when the process was created. The swap is applied only to the worker, not to
the process struct in simulation.

This swap can cause worker->address != process->address.
In the checkForExtraDataStores actor, we use worker->address to check if a process
is killable and use process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable and lead
to simulation failure.
2020-03-10 21:07:16 -07:00
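The mismatch described in the commit above can be reproduced with a small model. This is only a sketch under stated assumptions: the types `Address` and `Endpoint` and the helper `same_process` are hypothetical, and comparing full address sets is one possible way to make the check robust, not necessarily what the commit does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    ip: str
    port: int
    tls: bool


@dataclass
class Endpoint:
    """Hypothetical stand-in for a worker or a simulated process."""
    primary: Address
    secondary: Optional[Address] = None

    def swapped(self) -> "Endpoint":
        # Swap primary and secondary so the TLS setting can match a peer.
        if self.secondary is None:
            return self
        return Endpoint(self.secondary, self.primary)

    def addresses(self) -> set:
        return {a for a in (self.primary, self.secondary) if a is not None}


def same_process(worker: Endpoint, process: Endpoint) -> bool:
    # Compare the full address sets rather than only the primary address,
    # so a swap on the worker side does not hide the match.
    return bool(worker.addresses() & process.addresses())
```

With a TLS address on port 4500 and a non-TLS address on port 4501, a swapped worker has a different primary address than its process, yet `same_process` still identifies them as the same.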
chaoguang 698198a09e Merge remote-tracking branch 'upstream/master' into report-conflicting-key 2020-03-09 10:50:33 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen 15f1a75d4f updated documentation for 6.2.18 2020-03-06 11:16:10 -08:00
Evan Tschannen dbfc0cbcc0
Merge pull request #2781 from alexmiller-apple/certificate-refresh
Refresh certificates used for handshaking when they change on disk
2020-03-06 11:12:04 -08:00
A.J. Beamon fd8d569b91 Fix a few typos. 2020-03-05 14:42:07 -08:00
A.J. Beamon 6479034645 Add more metrics to the TransactionMetrics event 2020-03-05 14:00:44 -08:00
Alex Miller 595dd77ed1 Merge remote-tracking branch 'upstream/release-6.2' into certificate-refresh 2020-03-04 20:25:42 -08:00
Alex Miller 9b5ef3416e Refactor TLSParams into TLSConfig + LoadedTLSConfig
The idea is that we keep around a TLSConfig holding the configuration
that the user has provided, and then when we want to initialize an SSL
context, we ask the TLSConfig to load all certificates and return us a
LoadedTLSConfig that is a concrete set of certificate bytes in memory.

initTLS now just takes the in-memory bytes and applies them to the ssl
context.

This is a large refactor leading up to certificate refreshing, where we
will periodically check for changes to the certificates, and then
re-load them and apply them to a new SSL context.
2020-03-04 20:14:47 -08:00
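The split described in the commit above, a configuration that only names the certificate files and a loaded snapshot that holds their bytes, can be sketched like this. The class names follow the commit message, but every field name, the `load` method, and the `needs_refresh` helper are hypothetical and do not reflect the actual C++ interfaces.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LoadedTLSConfig:
    """A concrete snapshot of certificate bytes in memory."""
    cert_bytes: bytes
    key_bytes: bytes
    ca_bytes: bytes


@dataclass(frozen=True)
class TLSConfig:
    """What the user provided: only the locations of the files."""
    cert_path: Path
    key_path: Path
    ca_path: Path

    def load(self) -> LoadedTLSConfig:
        # Read everything from disk at the moment a context is needed.
        return LoadedTLSConfig(
            cert_bytes=self.cert_path.read_bytes(),
            key_bytes=self.key_path.read_bytes(),
            ca_bytes=self.ca_path.read_bytes(),
        )


def needs_refresh(config: TLSConfig, current: LoadedTLSConfig) -> bool:
    # Re-load from disk and compare against the snapshot in use.
    return config.load() != current
```

Keeping the snapshot immutable means a periodic check can build a fresh `LoadedTLSConfig` and apply it to a new SSL context without touching the one already in use.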
Xin Dong 39610d15f8 Revert this change since it somehow introduced a random crash detected on circus 2020-03-04 16:14:38 -08:00
Evan Tschannen c73cae0feb
Merge pull request #2760 from ajbeamon/client-version-fixes
Improvements to client version reporting
2020-03-04 15:52:49 -08:00