Commit Graph

226 Commits

Author SHA1 Message Date
A.J. Beamon 6a6ea56596 Restore line that stores the data lag seconds of a storage server. This value is used to add a data lag message to status. 2020-10-20 10:12:00 -07:00
Evan Tschannen 8befb0829d
Merge pull request #3481 from ajbeamon/fix-dc-timeout-message
Add missing messages to schema and rename one to match later versions
2020-07-10 10:30:21 -07:00
A.J. Beamon b51beead53 The backport of a change in later versions didn't include some updates to the schema and a change to the name of one of the messages. 2020-07-09 16:58:13 -07:00
A.J. Beamon 04d1217941 Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles. 2020-07-09 16:39:15 -07:00
A.J. Beamon e10704fd76 Cherry-pick region related status changes from 6.3 2020-06-09 14:56:21 -07:00
Evan Tschannen ed4d02a3e4
Merge pull request #2812 from etschannen/feature-proxy-mem-limit
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
A.J. Beamon fe19f30999
Merge pull request #2813 from etschannen/feature-satellite-usable-regions
do not recruit satellite tlogs when usable regions=1
2020-03-16 11:54:42 -07:00
A.J. Beamon f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
Evan Tschannen e5d53c863b report in status the number of active generations 2020-03-16 10:29:17 -07:00
Evan Tschannen 04b752b40a Added additional logging related to memory errors (including in status) 2020-03-13 18:31:22 -07:00
Evan Tschannen 4640edf5d6 do not recruit satellite tlogs when usable regions=1 2020-03-13 10:24:52 -07:00
Alex Miller d86a601b84 Add cluster.processes.id.network.tls_policy.hz to status.
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
Evan Tschannen 6296465e07 Make the DD priority associated with populating a remote region lower than machine failures 2020-03-04 14:07:32 -08:00
A.J. Beamon 78cb1071dc Status should use the full list of proxies 2020-02-07 15:44:02 -08:00
Evan Tschannen 35ac0071a8 fixed a compiler error 2019-10-22 17:06:54 -07:00
Evan Tschannen 2d74288d16 Added a comment to clarify why cleanup work is done in status 2019-10-22 16:33:44 -07:00
Evan Tschannen 3478652d06
Apply suggestions from code review
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:32:09 -07:00
Evan Tschannen d5c2147c0c
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:27:52 -07:00
Evan Tschannen 2caad04d9c Keys in the destUIDLookupPrefix can be cleaned up automatically if they do not have an associated entry in the logRangesRange keyspace 2019-10-22 11:58:40 -07:00
Evan Tschannen 42b7acf7b7
Merge pull request #2202 from etschannen/feature-share-mutations
Backup and DR would not share mutations if started on different versions of FDB
2019-10-16 20:28:39 -07:00
Evan Tschannen 587cbefe7f duplicate mutation stream checker did not have a timeout
duplicate mutation stream did not work properly when multiple ranges exist with the same begin key
2019-10-16 20:17:09 -07:00
Evan Tschannen 5be773f145
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:24 -07:00
Evan Tschannen 2facfc090b
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:12 -07:00
Evan Tschannen 552eb44bf8
Merge pull request #2230 from ajbeamon/fix-fault-tolerance-reporting-with-remote-regions
Fix: status would fail to account for remote regions when...
2019-10-16 14:51:48 -07:00
Evan Tschannen 8b09cd16b2 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations 2019-10-16 14:50:37 -07:00
Evan Tschannen 86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
A.J. Beamon a6da9d3df5 Fix: status would fail to account for remote regions when computing fault tolerance in the presence of a failure on the primary. 2019-10-10 10:36:35 -07:00
Evan Tschannen 628b4e0220 added a warning if multiple log ranges exist for the same range 2019-10-02 17:06:19 -07:00
Evan Tschannen 045175bd0e added tracking for the size of the system keyspace 2019-09-27 22:39:19 -07:00
Evan Tschannen 945cff1e5b the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times 2019-09-10 14:27:22 -07:00
Jingyu Zhou e551523b04 Fix the same iterator bug of passing the end 2019-09-05 11:36:34 -07:00
Jingyu Zhou 73044bdc36 Fix a crash failure due to iterator passing the end 2019-09-05 11:34:11 -07:00
A.J. Beamon 3f9e392668
Merge pull request #2014 from etschannen/feature-fdbcli-sleep
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen f3bc7e0abd do not duplicate data distribution disabled fields in status
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
Evan Tschannen 0b0c9fe0ff data distribution status was combined into regular status 2019-08-21 14:44:15 -07:00
A.J. Beamon 2b80d836f4 Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
A.J. Beamon b8e57f37d7 Add 'coordinator' to the list of roles that a process can have in status. 2019-08-15 14:42:49 -07:00
A.J. Beamon bb72cdd36a Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields. 2019-08-15 13:42:39 -07:00
A.J. Beamon 6581161dd3 Add ratekeeper's durability lag statistics to status 2019-08-15 11:07:04 -07:00
Evan Tschannen 70ce678879 fix: max_protocol_clients were being added to the connected_clients list
fix: the clientCount was included clients with unknown protocol versions. This has been changed back to the pre-6.2 behavior where it is just a count of clients with known versions, and now clients with unknown versions are tracked explicitly as its own supported_version section
2019-08-13 15:54:40 -07:00
A.J. Beamon 476641a087
Merge pull request #1929 from jzhou77/fix-warning
Fix compiler warnings
2019-08-01 11:15:41 -07:00
Jingyu Zhou 37450be706 Fix format usage for currentProtocolVersion
ProtocolVersion now is a class.
2019-08-01 10:19:46 -07:00
Xin Dong 1922c39377 Resolve review comments. 100K run shows one suspecious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong c6e5472d8d Apply suggestions from code review
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-30 22:20:45 -07:00
Xin Dong ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warns when necessary
2019-07-30 22:20:45 -07:00
A.J. Beamon 438bc636d5 Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status. 2019-07-30 14:02:31 -07:00
Evan Tschannen 90e3b50213 Merge branch 'master' into feature-coordinator-connection
# Conflicts:
#	fdbclient/DatabaseContext.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen ee92f0574f fix: lastRequestTime was not updated
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Evan Tschannen be5d144b8b added status information on connected clients 2019-07-25 17:15:31 -07:00
Evan Tschannen 4a866290b7 Clients keep a persistent connection open with coordinators to get updates to the list of proxies
Status still needs to be updated with client information with information from the coordinators
2019-07-23 19:22:44 -07:00