173 Commits

Author SHA1 Message Date
Jingyu Zhou 0db03f1d3c Use backup_logging_enabled flag
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Jingyu Zhou 297f22726c Add backup_type database configuration option
Update simulation tests to randomly set backup types to be one of: old backup
(default), new backup (tagged), or both (default+tagged).
2020-01-31 19:29:09 -08:00
mengranwo 227edd4248 change memory storage engine name from memory-radixtree to memory-radixtree-beta 2020-01-15 13:49:45 -08:00
mengranwo f597aa7e18 WIP : deployable/stable version since Nov 3. Start rebase to master branch 2020-01-15 13:49:45 -08:00
negoyal a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen ef0890c23a updated status schema 2019-10-16 22:37:57 -07:00
Evan Tschannen 35e816e9ad added the ability to configure satellite_logs by satellite location; this overrides the region configuration if both are present 2019-10-14 18:30:15 -07:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen 045175bd0e added tracking for the size of the system keyspace 2019-09-27 22:39:19 -07:00
A.J. Beamon 6100d3274d Merge pull request #2058 from tclinken/expose-lock-status
Added lockUID to status output if database is locked
2019-09-11 08:47:35 -07:00
Trevor Clinkenbeard 8c31a839be s/lockUID/lock_uid in status 2019-09-06 22:20:55 -07:00
Trevor Clinkenbeard 2d216f7ae5 Added database_lock_state to statusSchema 2019-09-06 22:20:50 -07:00
A.J. Beamon 3f9e392668 Merge pull request #2014 from etschannen/feature-fdbcli-sleep
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen f3bc7e0abd do not duplicate data distribution disabled fields in status
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
A.J. Beamon 2b80d836f4 Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
A.J. Beamon b8e57f37d7 Add 'coordinator' to the list of roles that a process can have in status. 2019-08-15 14:42:49 -07:00
A.J. Beamon bb72cdd36a Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields. 2019-08-15 13:42:39 -07:00
A.J. Beamon 6581161dd3 Add ratekeeper's durability lag statistics to status 2019-08-15 11:07:04 -07:00
A.J. Beamon 438bc636d5 Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status. 2019-07-30 14:02:31 -07:00
Evan Tschannen 90e3b50213 Merge branch 'master' into feature-coordinator-connection
# Conflicts:
#	fdbclient/DatabaseContext.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen be5d144b8b added status information on connected clients 2019-07-25 17:15:31 -07:00
Evan Tschannen 846038b0e6 Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00
Evan Tschannen 94c66f8d58 Merge pull request #1738 from bnamasivayam/consistency-check-disable
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Balachandar Namasivayam 406bcebdc4 Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time. 2019-07-17 18:08:17 -07:00
A.J. Beamon 2cd05e9ac9 Merge pull request #1712 from tclinken/add-local-rk-to-status
Track the local ratekeeper rate in status
2019-07-15 15:17:11 -07:00
Balachandar Namasivayam 9169232fa9 Add the new messages to Schema. 2019-07-15 13:47:27 -07:00
Evan Tschannen 1a18c859c7 knobified the durability lag rate controls 2019-07-12 18:50:56 -07:00
A.J. Beamon f31884c749 Merge branch 'master' into add-priority-starts-to-status
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-07-11 15:26:52 -07:00
A.J. Beamon 97609ad991 Add information about transaction starts at different priorities to status. 2019-07-11 13:54:44 -07:00
A.J. Beamon b4dbc6d7fa Change the way cache hits and misses are tracked to avoid counting blind page writes as misses and count the results of partial page writes. Report cache hit rate in status. 2019-07-10 14:43:20 -07:00
A.J. Beamon 69d7c4f79c Merge branch 'master' into track-run-loop-busyness
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Net2.actor.cpp
#	flow/network.h
2019-07-09 18:39:23 -07:00
Trevor Clinkenbeard 1bac04509e Track the local ratekeeper rate as a percentage
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
A.J. Beamon 4be08d9b2d Rename datacenter_version_difference to datacenter_lag and include both seconds and versions. 2019-07-05 14:36:18 -07:00
Evan Tschannen b9a6271375 local ratekeeper no longer globally limits 2019-06-28 16:54:22 -07:00
Evan Tschannen 92b32855ca ratekeeper’s control algorithm would oscillate when limited by local ratekeeper 2019-06-28 16:54:22 -07:00
A.J. Beamon 7f23814841 Track run loop busyness and report it in status. 2019-06-26 14:03:02 -07:00
Evan Tschannen dccb9bc26d fixed a number of correctness problems 2019-06-12 19:40:50 -07:00
Evan Tschannen 8590b710bf added additional logging on the logs and log routers 2019-05-02 17:24:39 -07:00
Evan Tschannen 3356ac27bf added three_data_hall_fallback configuration 2019-04-07 22:58:18 -07:00
Evan Tschannen 628fec8c8b updated status with information about ongoing maintenance
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
Jingyu Zhou b81de9831f Fix SchemaMismatch error
Add data_distributor and ratekeeper roles to schema.
2019-03-27 09:54:01 -07:00
Evan Tschannen eb54a700ba changed the old memory configuration to memory-1 2019-03-18 15:10:04 -07:00
Evan Tschannen a372c7cf18 configure memory now selects the ssd engine for transaction log spilling. Transaction log spilling is only used when the transaction logs cannot keep all of their unpopped mutations in memory. If we are only using this data structure because we do not have enough memory, it is much less safe to use the memory storage engine for this purpose. 2019-03-16 22:48:24 -07:00
Meng Xu 5a10bf5dfc Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-14 10:35:12 -07:00
Evan Tschannen e068c478b5 merge master 2019-03-12 18:31:25 -07:00
Meng Xu 435e515985 Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-11 11:17:40 -07:00
Evan Tschannen 80c3f2f8e2 added status fields detailing which processes are degraded, and also the total number of degraded processes 2019-03-10 22:58:15 -07:00
Jingyu Zhou 7340998261 Fix status message for ratekeeper 2019-03-07 13:16:20 -08:00
Meng Xu 845f8fdcbc Status:healthy: Add optimizing_team_collections
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu 04880e3d4d Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-06 13:41:16 -08:00
Meng Xu b7a52e81e2 Status: Count connected coordinators per client
A client will always try to connect to all coordinators.
This commit lets Status track the number of connected coordinators
for each client.

This allows us to canary changes across coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinator
one by one. This avoids the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00
Meng Xu 94385447bc Status: Get if client configured TLS
To understand if all clients have configured TLS,
we check the tlsoption when a client tries to open a database.
This is similar to how we track the versions of multi-version clients.
2019-03-01 15:17:01 -08:00
A.J. Beamon 3e6a6a6569 Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics. 2019-02-28 12:00:58 -08:00
Evan Tschannen 8afb7fbb9d Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller 6d23eb2d1a Implement log_version.
This mega-commit introduces a new configuration setting, `log_version`,
that controls the TLog implementations and features that are available
within FDB, so that users can opt in to new features if they're willing
to sacrifice backwards compatibility.
2019-02-22 12:15:23 -08:00
Meng Xu 64db109f20 Status: Add schema for the new data distributor role 2019-02-22 10:05:12 -08:00
Meng Xu 9445ac0b0c Status: Use new data distributor worker to publish status
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).

So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu db19b08762 TeamRemover: Add new status to fdbcli
Add the healthy_removing_redundant_teams status to fdbcli
2019-02-21 15:03:32 -08:00
Alex Miller bf8bfb8137 Set log_spill in SimulationConfig.
Which also revealed that it needed to be added to the schema.
2019-02-19 22:30:15 -08:00
A.J. Beamon d4349293b9 Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing. 2019-02-07 13:39:22 -08:00
A.J. Beamon 2198d24ce1 Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/ServerDBInfo.h
#	fdbserver/Status.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon 8e05e95045 Added the ability to configure the latency band settings by setting a special key in \xff keyspace. 2019-01-18 16:18:34 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Stephen Atherton 3ea9193fa7 Renamed redwood to redwood-experimental. UnitTest names can now be hidden using # as the first character so that random correctness tests will not run them. Excluded redwood tests from correctness testing. Reverted default storage engine to ssd. 2018-10-05 14:43:54 -07:00
Stephen Atherton 7c1dc305cb Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood 2018-10-05 10:15:10 -07:00
A.J. Beamon a98fcf5972 Rename durable_lag to durability_lag 2018-10-01 09:58:49 -07:00
A.J. Beamon f196e2d4dc Log metrics about read requests as well as completed reads. 2018-09-27 15:32:39 -07:00
A.J. Beamon 118e21c446 Add new metrics for bytes queried, keys queried, mutation bytes, mutations, and durable lag to the storage role in status. 2018-09-27 14:33:21 -07:00
Stephen Atherton da70ba7e68 Update status schema to know about redwood storage engine. Remove status schema document which is no longer used. 2018-09-20 11:01:21 -07:00
Evan Tschannen 282e9e41c2 fix: windows build was broken because statusSchema string was too long 2018-09-05 11:40:04 -07:00
Evan Tschannen 40f5dbe423 fixed issues from review, added a safeguard to prevent configuring a cluster to an invalid configuration 2018-09-04 22:16:35 -07:00
Evan Tschannen d8ea3dbf9a Added the ability to configure a cluster from a JSON file 2018-08-16 17:34:59 -07:00