A.J. Beamon
6a6ea56596
Restore line that stores the data lag seconds of a storage server. This value is used to add a data lag message to status.
2020-10-20 10:12:00 -07:00
Evan Tschannen
8befb0829d
Merge pull request #3481 from ajbeamon/fix-dc-timeout-message
...
Add missing messages to schema and rename one to match later versions
2020-07-10 10:30:21 -07:00
A.J. Beamon
b51beead53
The backport of a change in later versions didn't include some updates to the schema and a change to the name of one of the messages.
2020-07-09 16:58:13 -07:00
A.J. Beamon
04d1217941
Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles.
2020-07-09 16:39:15 -07:00
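The commit above names the statistics it tracks but not how they are computed. The C++ sketch below is a rough illustration only. It summarizes one batch of request latencies, assumes all samples fit in memory, and uses the nearest-rank definition of a percentile. `LatencySummary`, `summarize`, and `nearestRankPercentile` are hypothetical names. The chosen percentiles (50th, 90th, 99th) are also assumptions. FoundationDB's own implementation may differ, for example by sampling rather than retaining every request.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical summary type; field names are illustrative, not the status schema.
struct LatencySummary {
    double min = 0, max = 0, mean = 0;
    double p50 = 0, p90 = 0, p99 = 0;
};

// Nearest-rank percentile over an already-sorted, non-empty vector.
// p is in (0, 1]; rank = ceil(p * n), then converted to a 0-based index.
double nearestRankPercentile(const std::vector<double>& sorted, double p) {
    assert(!sorted.empty());
    std::size_t n = sorted.size();
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * static_cast<double>(n)));
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;
    return sorted[rank - 1];
}

// Summarizes a batch of request latencies (e.g. in seconds).
// Takes the vector by value so it can sort its own copy.
LatencySummary summarize(std::vector<double> samples) {
    if (samples.empty()) throw std::invalid_argument("no latency samples");
    std::sort(samples.begin(), samples.end());
    LatencySummary s;
    s.min = samples.front();
    s.max = samples.back();
    double total = 0;
    for (double x : samples) total += x;
    s.mean = total / static_cast<double>(samples.size());
    s.p50 = nearestRankPercentile(samples, 0.50);
    s.p90 = nearestRankPercentile(samples, 0.90);
    s.p99 = nearestRankPercentile(samples, 0.99);
    return s;
}
```

Sorting a copy of each batch keeps the sketch simple at O(n log n) per summary. A long-running server would more plausibly keep a bounded sample or a histogram, so that memory does not grow with the request count.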
A.J. Beamon
e10704fd76
Cherry-pick region related status changes from 6.3
2020-06-09 14:56:21 -07:00
Evan Tschannen
ed4d02a3e4
Merge pull request #2812 from etschannen/feature-proxy-mem-limit
...
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
A.J. Beamon
fe19f30999
Merge pull request #2813 from etschannen/feature-satellite-usable-regions
...
do not recruit satellite tlogs when usable regions=1
2020-03-16 11:54:42 -07:00
A.J. Beamon
f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
...
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
Evan Tschannen
e5d53c863b
report in status the number of active generations
2020-03-16 10:29:17 -07:00
Evan Tschannen
04b752b40a
Added additional logging related to memory errors (including in status)
2020-03-13 18:31:22 -07:00
Evan Tschannen
4640edf5d6
do not recruit satellite tlogs when usable regions=1
2020-03-13 10:24:52 -07:00
Alex Miller
d86a601b84
Add cluster.processes.id.network.tls_policy.hz to status.
...
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
Evan Tschannen
6296465e07
Make the DD priority associated with populating a remote region lower than machine failures
2020-03-04 14:07:32 -08:00
A.J. Beamon
78cb1071dc
Status should use the full list of proxies
2020-02-07 15:44:02 -08:00
Evan Tschannen
35ac0071a8
fixed a compiler error
2019-10-22 17:06:54 -07:00
Evan Tschannen
2d74288d16
Added a comment to clarify why cleanup work is done in status
2019-10-22 16:33:44 -07:00
Evan Tschannen
3478652d06
Apply suggestions from code review
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:32:09 -07:00
Evan Tschannen
d5c2147c0c
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:27:52 -07:00
Evan Tschannen
2caad04d9c
Keys in the destUIDLookupPrefix can be cleaned up automatically if they do not have an associated entry in the logRangesRange keyspace
2019-10-22 11:58:40 -07:00
Evan Tschannen
42b7acf7b7
Merge pull request #2202 from etschannen/feature-share-mutations
...
Backup and DR would not share mutations if started on different versions of FDB
2019-10-16 20:28:39 -07:00
Evan Tschannen
587cbefe7f
duplicate mutation stream checker did not have a timeout
...
duplicate mutation stream did not work properly when multiple ranges exist with the same begin key
2019-10-16 20:17:09 -07:00
Evan Tschannen
5be773f145
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:24 -07:00
Evan Tschannen
2facfc090b
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:12 -07:00
Evan Tschannen
552eb44bf8
Merge pull request #2230 from ajbeamon/fix-fault-tolerance-reporting-with-remote-regions
...
Fix: status would fail to account for remote regions when...
2019-10-16 14:51:48 -07:00
Evan Tschannen
8b09cd16b2
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations
2019-10-16 14:50:37 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
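This commit ranks splitting shards above restoring fault tolerance. The 2020-03-04 commit listed earlier ranks populating a remote region below machine failures. Both commits fix only a relative order. The minimal C++ sketch below illustrates that order. It assumes the three kinds of relocation work share one max-priority queue, and it treats "restoring fault tolerance" and "machine failures" as a single tier. The numeric values, `RelocationRequest`, and `drainInPriorityOrder` are hypothetical. Only the ordering among the tiers comes from the commit messages.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical values; only their relative order reflects the commit messages.
constexpr int kPriorityPopulateRegion = 100;        // populating a remote region: below machine failures
constexpr int kPriorityRestoreFaultTolerance = 200; // restoring fault tolerance after a machine failure
constexpr int kPrioritySplitShard = 300;            // splitting a hot shard: above restoring fault tolerance

struct RelocationRequest {
    std::string shard;
    int priority;
};

// std::priority_queue is a max-heap under this comparator,
// so the request with the highest priority is popped first.
struct LowerPriority {
    bool operator()(const RelocationRequest& a, const RelocationRequest& b) const {
        return a.priority < b.priority;
    }
};

// Returns shard names in the order a single worker would service them.
std::vector<std::string> drainInPriorityOrder(const std::vector<RelocationRequest>& reqs) {
    std::priority_queue<RelocationRequest, std::vector<RelocationRequest>, LowerPriority> q(
        reqs.begin(), reqs.end());
    std::vector<std::string> order;
    while (!q.empty()) {
        order.push_back(q.top().shard);
        q.pop();
    }
    return order;
}
```

With one request of each kind queued, the hot-shard split is serviced first, the failure repair second, and the remote-region population last.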
A.J. Beamon
a6da9d3df5
Fix: status would fail to account for remote regions when computing fault tolerance in the presence of a failure on the primary.
2019-10-10 10:36:35 -07:00
Evan Tschannen
628b4e0220
added a warning if multiple log ranges exist for the same range
2019-10-02 17:06:19 -07:00
Evan Tschannen
045175bd0e
added tracking for the size of the system keyspace
2019-09-27 22:39:19 -07:00
Evan Tschannen
945cff1e5b
the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times
2019-09-10 14:27:22 -07:00
Jingyu Zhou
e551523b04
Fix the same bug of an iterator passing the end
2019-09-05 11:36:34 -07:00
Jingyu Zhou
73044bdc36
Fix a crash failure due to iterator passing the end
2019-09-05 11:34:11 -07:00
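Neither commit message says which loop walked off the end. The sketch below illustrates this class of bug generically. It is not the code these commits changed. It collects every `stride`-th key of a `std::map`. Instead of jumping ahead by `stride` unconditionally, it advances one step at a time and checks `end()` before each increment. `everyNthKey` is a hypothetical helper.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Collects every `stride`-th key of the map, starting with the first.
// The pattern this avoids is advancing `it` by `stride` without checking:
// when fewer than `stride` elements remain, that moves the iterator past
// end(), which is undefined behavior and can crash the process.
std::vector<std::string> everyNthKey(const std::map<std::string, int>& m, std::size_t stride) {
    std::vector<std::string> keys;
    if (stride == 0) return keys;
    auto it = m.begin();
    while (it != m.end()) {
        keys.push_back(it->first);
        // Step one element at a time and stop as soon as end() is reached.
        for (std::size_t i = 0; i < stride && it != m.end(); ++i) ++it;
    }
    return keys;
}
```

Called on keys `a` through `e` with a stride of 2, the helper returns `a`, `c`, `e`. The final advance stops at `end()` instead of stepping past it.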
A.J. Beamon
3f9e392668
Merge pull request #2014 from etschannen/feature-fdbcli-sleep
...
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen
f3bc7e0abd
do not duplicate data distribution disabled fields in status
...
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
Evan Tschannen
0b0c9fe0ff
data distribution status was combined into regular status
2019-08-21 14:44:15 -07:00
A.J. Beamon
2b80d836f4
Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
A.J. Beamon
b8e57f37d7
Add 'coordinator' to the list of roles that a process can have in status.
2019-08-15 14:42:49 -07:00
A.J. Beamon
bb72cdd36a
Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields.
2019-08-15 13:42:39 -07:00
A.J. Beamon
6581161dd3
Add ratekeeper's durability lag statistics to status
2019-08-15 11:07:04 -07:00
Evan Tschannen
70ce678879
fix: max_protocol_clients were being added to the connected_clients list
...
fix: the clientCount included clients with unknown protocol versions. This has been changed back to the pre-6.2 behavior, where it is just a count of clients with known versions, and clients with unknown versions are now tracked explicitly in their own supported_version section
2019-08-13 15:54:40 -07:00
A.J. Beamon
476641a087
Merge pull request #1929 from jzhou77/fix-warning
...
Fix compiler warnings
2019-08-01 11:15:41 -07:00
Jingyu Zhou
37450be706
Fix format usage for currentProtocolVersion
...
ProtocolVersion is now a class.
2019-08-01 10:19:46 -07:00
Xin Dong
1922c39377
Resolve review comments. A 100K run shows one suspicious ASSERT_WE_THINK failure, which I think could be a race.
2019-07-30 22:24:30 -07:00
Xin Dong
c6e5472d8d
Apply suggestions from code review
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-30 22:20:45 -07:00
Xin Dong
ae11efcb0a
Made the following changes:
...
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI handles the new DD states correctly, i.e. prints warnings when necessary
2019-07-30 22:20:45 -07:00
A.J. Beamon
438bc636d5
Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status.
2019-07-30 14:02:31 -07:00
Evan Tschannen
90e3b50213
Merge branch 'master' into feature-coordinator-connection
...
# Conflicts:
# fdbclient/DatabaseContext.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen
ee92f0574f
fix: lastRequestTime was not updated
...
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Evan Tschannen
be5d144b8b
added status information on connected clients
2019-07-25 17:15:31 -07:00
Evan Tschannen
4a866290b7
Clients keep a persistent connection open with coordinators to get updates to the list of proxies
...
Status still needs to be updated with client information from the coordinators
2019-07-23 19:22:44 -07:00