Xin Dong
4363dd0f25
This resolves issue #3739 by exposing time since last full recovery.
2020-09-08 14:26:01 -07:00
Young Liu
23e1ff694c
Report missing old tlogs in recovery between accepting commits and storage recovered
2020-09-08 13:35:42 -07:00
XiaoxiWang
ecf2c0109c
more concise status json
2020-09-04 18:40:45 +00:00
XiaoxiWang
fb758bf937
update Schemas.cpp
2020-09-04 16:34:05 +00:00
Young Liu
9564171463
Merge branch 'master' into grv-proxy
2020-08-24 22:45:01 -07:00
XiaoxiWang
4e627691a9
add throttle objects into Schemas.cpp
2020-08-24 23:37:58 +00:00
Young Liu
79ce16650d
merge master branch
2020-08-11 19:22:10 -07:00
Young Liu
d6a23a4d6b
Resolve comments to make GRV proxy a separate process class
2020-08-06 00:01:57 -07:00
Chaoguang Lin
3ba940a63d
Change json string to snake_case
2020-07-28 11:42:03 -07:00
Chaoguang Lin
52178f9eae
Add json schema for the management api error message in special key space
2020-07-27 17:37:19 -07:00
A.J. Beamon
b09dddc07e
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
...
# Conflicts:
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/fdbrpc.vcxproj
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
8befb0829d
Merge pull request #3481 from ajbeamon/fix-dc-timeout-message
...
Add missing messages to schema and rename one to match later versions
2020-07-10 10:30:21 -07:00
A.J. Beamon
b51beead53
The backport of a change in later versions didn't include some updates to the schema and a change to the name of one of the messages.
2020-07-09 16:58:13 -07:00
A.J. Beamon
693595f4e5
Fix make build, fix GRV schema
2020-07-09 16:50:08 -07:00
A.J. Beamon
04d1217941
Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles.
2020-07-09 16:39:15 -07:00
Andrew Noyes
409ccf3be2
Use snake_case to match status json convention
2020-06-30 17:05:29 +00:00
Andrew Noyes
e1dfa410c1
Remove "Storages" field from data_distribution_stats
...
It seems like a cool idea, but it feels rushed to me. Reverting for now.
2020-06-30 15:24:16 +00:00
Andrew Noyes
403274bba8
Add schemas, and check dataDistributionStatsSchema
2020-06-30 15:24:16 +00:00
Daniel Smith
acbfe2e4c9
Revert "Revert "Initial RocksDB""
2020-06-15 12:45:36 -04:00
Jingyu Zhou
9cd1614c82
Revert "Initial RocksDB"
2020-06-11 15:29:46 -07:00
A.J. Beamon
e10704fd76
Cherry-pick region related status changes from 6.3
2020-06-09 14:56:21 -07:00
Daniel Smith
5d361fe532
Copy/paste rebase onto 6.3
2020-05-22 15:02:51 +00:00
A.J. Beamon
d636194d0d
Remove deprecated fields in status: worst_version_lag_storage_server and limiting_version_lag_storage_server
2020-05-19 13:12:10 -07:00
A.J. Beamon
d0c66d7282
Fix typo
2020-05-12 18:38:20 -07:00
A.J. Beamon
e0526e0095
Add busiest read tags to storage server status
2020-05-12 15:49:40 -07:00
A.J. Beamon
36da61dd9c
Merge branch 'master' into transaction-tagging
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbclient/vexillographer/fdb.options
2020-04-07 21:12:14 -07:00
A.J. Beamon
2309e9f156
Consistently use timeout instead of timedout in status messages.
2020-04-07 08:43:23 -07:00
Xin Dong
587419f984
Fix the missing schema field which caused a lot of noise in nightly
2020-04-06 10:18:58 -07:00
A.J. Beamon
2336f073ad
Checkpointing a bunch of work on throttles. Rudimentary implementation of auto-throttling. Support for manual throttling via fdbcli. Throttles are stored in the system keyspace.
2020-04-03 15:24:14 -07:00
Xin Dong
e755583c07
Address review comments.
2020-04-01 15:13:04 -07:00
Xin Dong
a7e8bfad82
Fix the test failure, which was introduced by a typo
2020-03-30 15:24:08 -07:00
Xin Dong
012d41548e
Address review comments
2020-03-30 13:55:59 -07:00
Balachandar Namasivayam
58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
...
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
ed4d02a3e4
Merge pull request #2812 from etschannen/feature-proxy-mem-limit
...
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
A.J. Beamon
f2defc3a3a
Merge pull request #2814 from etschannen/feature-delay-recovery
...
Prevent coordinated state from filling up with too many old generations
2020-03-16 11:45:17 -07:00
Evan Tschannen
76db8343c0
update status schema
2020-03-16 11:00:51 -07:00
Evan Tschannen
04b752b40a
Added additional logging related to memory errors (including in status)
2020-03-13 18:31:22 -07:00
Alex Miller
5be7fa52bc
Remove comma, and add schema change to documentation
2020-03-13 14:51:56 -07:00
Alex Miller
04498cbc0e
Make policy failures be reported as per 1s and not over 5s.
2020-03-13 02:49:06 -07:00
Alex Miller
d86a601b84
Add cluster.processes.id.network.tls_policy.hz to status.
...
This allows monitoring of TLS policy failures, but one has to go scrape
for TLSPolicyFailure trace events to figure out why they're happening.
2020-03-13 02:46:10 -07:00
Xin Dong
5967ef5eab
Added back the changes that report trace log flush failures and fix the random crash
2020-03-12 14:34:19 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
15f1a75d4f
updated documentation for 6.2.18
2020-03-06 11:16:10 -08:00
Xin Dong
39610d15f8
Revert this change since it somehow introduced a random crash detected on circus
2020-03-04 16:14:38 -08:00
Xin Dong
090c89e90a
Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by a subsequent GetDBinfo request.
2020-02-25 15:39:38 -08:00
Xin Dong
1c346fcfb0
Added the new issues into Status Schema. Remove the issue reporting in lastError since:
...
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.
Thus now specific issues are being reported before calling lastError
2020-02-25 15:38:14 -08:00
Jingyu Zhou
52c6737411
Rename backupLoggingEnabled as backupWorkerEnabled
...
To highlight the backup changes for 7.0. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Jingyu Zhou
0db03f1d3c
Use backup_logging_enabled flag
...
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Jingyu Zhou
297f22726c
Add backup_type database configuration option
...
Update simulation tests to randomly set backup types to be one of: old backup
(default), new backup (tagged), or both (default+tagged).
2020-01-31 19:29:09 -08:00
mengranwo
227edd4248
change memory storage engine name from memory-radixtree to memory-radixtree-beta
2020-01-15 13:49:45 -08:00
mengranwo
f597aa7e18
WIP : deployable/stable version since Nov 3. Start rebase to master branch
2020-01-15 13:49:45 -08:00
negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
ef0890c23a
updated status schema
2019-10-16 22:37:57 -07:00
Evan Tschannen
35e816e9ad
added the ability to configure satellite_logs by satellite location, this will overwrite the region configuration if both are present
2019-10-14 18:30:15 -07:00
Meng Xu
d0147e5e5d
Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
...
Resolved Conflicts:
documentation/sphinx/source/release-notes.rst
fdbserver/DataDistribution.actor.cpp
versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen
045175bd0e
added tracking for the size of the system keyspace
2019-09-27 22:39:19 -07:00
A.J. Beamon
6100d3274d
Merge pull request #2058 from tclinken/expose-lock-status
...
Added lockUID to status output if database is locked
2019-09-11 08:47:35 -07:00
Trevor Clinkenbeard
8c31a839be
s/lockUID/lock_uid in status
2019-09-06 22:20:55 -07:00
Trevor Clinkenbeard
2d216f7ae5
Added database_lock_state to statusSchema
2019-09-06 22:20:50 -07:00
A.J. Beamon
3f9e392668
Merge pull request #2014 from etschannen/feature-fdbcli-sleep
...
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen
f3bc7e0abd
do not duplicate data distribution disabled fields in status
...
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
A.J. Beamon
2b80d836f4
Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
A.J. Beamon
b8e57f37d7
Add 'coordinator' to the list of roles that a process can have in status.
2019-08-15 14:42:49 -07:00
A.J. Beamon
bb72cdd36a
Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields.
2019-08-15 13:42:39 -07:00
A.J. Beamon
6581161dd3
Add ratekeeper's durability lag statistics to status
2019-08-15 11:07:04 -07:00
A.J. Beamon
438bc636d5
Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status.
2019-07-30 14:02:31 -07:00
Evan Tschannen
90e3b50213
Merge branch 'master' into feature-coordinator-connection
...
# Conflicts:
# fdbclient/DatabaseContext.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen
be5d144b8b
added status information on connected clients
2019-07-25 17:15:31 -07:00
Evan Tschannen
846038b0e6
Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
...
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00
Evan Tschannen
94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
...
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Balachandar Namasivayam
406bcebdc4
Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time.
2019-07-17 18:08:17 -07:00
A.J. Beamon
2cd05e9ac9
Merge pull request #1712 from tclinken/add-local-rk-to-status
...
Track the local ratekeeper rate in status
2019-07-15 15:17:11 -07:00
Balachandar Namasivayam
9169232fa9
Add the new messages to Schema.
2019-07-15 13:47:27 -07:00
Evan Tschannen
1a18c859c7
knobified the durability lag rate controls
2019-07-12 18:50:56 -07:00
A.J. Beamon
f31884c749
Merge branch 'master' into add-priority-starts-to-status
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-07-11 15:26:52 -07:00
A.J. Beamon
97609ad991
Add information about transaction starts at different priorities to status.
2019-07-11 13:54:44 -07:00
A.J. Beamon
b4dbc6d7fa
Change the way cache hits and misses are tracked to avoid counting blind page writes as misses and count the results of partial page writes. Report cache hit rate in status.
2019-07-10 14:43:20 -07:00
A.J. Beamon
69d7c4f79c
Merge branch 'master' into track-run-loop-busyness
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Net2.actor.cpp
# flow/network.h
2019-07-09 18:39:23 -07:00
Trevor Clinkenbeard
1bac04509e
Track the local ratekeeper rate as a percentage
...
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
A.J. Beamon
4be08d9b2d
Rename datacenter_version_difference to datacenter_lag and include both seconds and versions.
2019-07-05 14:36:18 -07:00
Evan Tschannen
b9a6271375
local ratekeeper no longer globally limits
2019-06-28 16:54:22 -07:00
Evan Tschannen
92b32855ca
ratekeeper’s control algorithm would oscillate when limited by local ratekeeper
2019-06-28 16:54:22 -07:00
A.J. Beamon
7f23814841
Track run loop busyness and report it in status.
2019-06-26 14:03:02 -07:00
Evan Tschannen
dccb9bc26d
fixed a number of correctness problems
2019-06-12 19:40:50 -07:00
Evan Tschannen
8590b710bf
added additional logging on the logs and log routers
2019-05-02 17:24:39 -07:00
Evan Tschannen
3356ac27bf
added three_data_hall_fallback configuration
2019-04-07 22:58:18 -07:00
Evan Tschannen
628fec8c8b
updated status with information about ongoing maintenance
...
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
Jingyu Zhou
b81de9831f
Fix SchemaMismatch error
...
Add data_distributor and ratekeeper roles to schema.
2019-03-27 09:54:01 -07:00
Evan Tschannen
eb54a700ba
changed the old memory configuration to memory-1
2019-03-18 15:10:04 -07:00
Evan Tschannen
a372c7cf18
configure memory now selects the ssd engine for transaction log spilling. Transaction log spilling is only used when the transaction logs cannot keep all of their unpopped mutations in memory. If we are only using this data structure because we do not have enough memory, it is much less safe to use the memory storage engine for this purpose.
2019-03-16 22:48:24 -07:00
Meng Xu
5a10bf5dfc
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-14 10:35:12 -07:00
Evan Tschannen
e068c478b5
merge master
2019-03-12 18:31:25 -07:00
Meng Xu
435e515985
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-11 11:17:40 -07:00
Evan Tschannen
80c3f2f8e2
added status fields detailing which processes are degraded, and also the total number of degraded processes
2019-03-10 22:58:15 -07:00
Jingyu Zhou
7340998261
Fix status message for ratekeeper
2019-03-07 13:16:20 -08:00
Meng Xu
845f8fdcbc
Status:healthy: Add optimizing_team_collections
...
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu
04880e3d4d
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-06 13:41:16 -08:00
Meng Xu
b7a52e81e2
Status: Count connected coordinators per client
...
A client will always try to connect to all coordinators.
This commit lets Status track the number of connected coordinators
for each client.
This allows us to canary changes on coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinator
one by one. This avoids the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00