Commit Graph

370 Commits

Author SHA1 Message Date
Evan Tschannen 2d74288d16 Added a comment to clarify why cleanup work is done in status 2019-10-22 16:33:44 -07:00
Evan Tschannen 3478652d06
Apply suggestions from code review
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:32:09 -07:00
Evan Tschannen d5c2147c0c
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:27:52 -07:00
Evan Tschannen 2caad04d9c Keys in the destUIDLookupPrefix can be cleaned up automatically if they do not have an associated entry in the logRangesRange keyspace 2019-10-22 11:58:40 -07:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen 42b7acf7b7
Merge pull request #2202 from etschannen/feature-share-mutations
Backup and DR would not share mutations if started on different versions of FDB
2019-10-16 20:28:39 -07:00
Evan Tschannen 587cbefe7f duplicate mutation stream checker did not have a timeout
duplicate mutation stream did not work properly when multiple ranges exist with the same begin key
2019-10-16 20:17:09 -07:00
Evan Tschannen 5be773f145
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:24 -07:00
Evan Tschannen 2facfc090b
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:12 -07:00
Evan Tschannen 552eb44bf8
Merge pull request #2230 from ajbeamon/fix-fault-tolerance-reporting-with-remote-regions
Fix: status would fail to account for remote regions when...
2019-10-16 14:51:48 -07:00
Evan Tschannen 8b09cd16b2 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations 2019-10-16 14:50:37 -07:00
Evan Tschannen 86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
A.J. Beamon a6da9d3df5 Fix: status would fail to account for remote regions when computing fault tolerance in the presence of a failure on the primary. 2019-10-10 10:36:35 -07:00
Evan Tschannen 628b4e0220 added a warning if multiple log ranges exist for the same range 2019-10-02 17:06:19 -07:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen 045175bd0e added tracking for the size of the system keyspace 2019-09-27 22:39:19 -07:00
Evan Tschannen b495cc697b Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-09-13 09:25:08 -07:00
A.J. Beamon 6100d3274d
Merge pull request #2058 from tclinken/expose-lock-status
Added lockUID to status output if database is locked
2019-09-11 08:47:35 -07:00
Evan Tschannen 945cff1e5b the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times 2019-09-10 14:27:22 -07:00
Trevor Clinkenbeard 95ccf23e88 Use READ_LOCK_AWARE instead of LOCK_AWARE in lockedStatusFetcher
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-09-06 22:21:07 -07:00
Trevor Clinkenbeard 29fe5f16d3 Use READ_SYSTEM_KEYS instead of ACCESS_SYSTEM_KEYS in lockedStatusFetcher
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-09-06 22:21:07 -07:00
Trevor Clinkenbeard 8c31a839be s/lockUID/lock_uid in status 2019-09-06 22:20:55 -07:00
Trevor Clinkenbeard 2b6961826e Added lockUID to status output if database is locked 2019-09-06 22:20:34 -07:00
Jingyu Zhou e551523b04 Fix the same iterator bug of passing the end 2019-09-05 11:36:34 -07:00
Jingyu Zhou 73044bdc36 Fix a crash failure due to iterator passing the end 2019-09-05 11:34:11 -07:00
A.J. Beamon 3f9e392668
Merge pull request #2014 from etschannen/feature-fdbcli-sleep
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen f3bc7e0abd do not duplicate data distribution disabled fields in status
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
Evan Tschannen 0b0c9fe0ff data distribution status was combined into regular status 2019-08-21 14:44:15 -07:00
A.J. Beamon 2b80d836f4 Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
A.J. Beamon b8e57f37d7 Add 'coordinator' to the list of roles that a process can have in status. 2019-08-15 14:42:49 -07:00
A.J. Beamon bb72cdd36a Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields. 2019-08-15 13:42:39 -07:00
A.J. Beamon 6581161dd3 Add ratekeeper's durability lag statistics to status 2019-08-15 11:07:04 -07:00
Evan Tschannen 70ce678879 fix: max_protocol_clients were being added to the connected_clients list
fix: the clientCount included clients with unknown protocol versions. This has been changed back to the pre-6.2 behavior where it is just a count of clients with known versions, and clients with unknown versions are now tracked explicitly in their own supported_version section
2019-08-13 15:54:40 -07:00
A.J. Beamon 476641a087
Merge pull request #1929 from jzhou77/fix-warning
Fix compiler warnings
2019-08-01 11:15:41 -07:00
Jingyu Zhou 37450be706 Fix format usage for currentProtocolVersion
ProtocolVersion now is a class.
2019-08-01 10:19:46 -07:00
Xin Dong 1922c39377 Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong c6e5472d8d Apply suggestions from code review
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-30 22:20:45 -07:00
Xin Dong ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warnings when necessary
2019-07-30 22:20:45 -07:00
A.J. Beamon 438bc636d5 Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status. 2019-07-30 14:02:31 -07:00
Evan Tschannen 90e3b50213 Merge branch 'master' into feature-coordinator-connection
# Conflicts:
#	fdbclient/DatabaseContext.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen ee92f0574f fix: lastRequestTime was not updated
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Evan Tschannen be5d144b8b added status information on connected clients 2019-07-25 17:15:31 -07:00
Evan Tschannen 4a866290b7 Clients keep a persistent connection open with coordinators to get updates to the list of proxies
Status still needs to be updated with client information with information from the coordinators
2019-07-23 19:22:44 -07:00
Meng Xu 378db79441 Resolve conflict when merge with master 2019-07-22 10:56:20 -07:00
Meng Xu 612a51fe00 Apply Clang format to PRIORITY_TEAM_REDUNDANT 2019-07-19 18:32:22 -07:00
Meng Xu ea76451f15 Count PRIORITY_TEAM_REDUNDANT as count PRIORITY_TEAM_UNHEALTHY 2019-07-19 18:30:01 -07:00
Evan Tschannen 94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Balachandar Namasivayam e08c25ffd8 Style fix. 2019-07-17 17:31:50 -07:00
A.J. Beamon 2cd05e9ac9
Merge pull request #1712 from tclinken/add-local-rk-to-status
Track the local ratekeeper rate in status
2019-07-15 15:17:11 -07:00
Balachandar Namasivayam 9169232fa9 Add the new messages to Schema. 2019-07-15 13:47:27 -07:00
Balachandar Namasivayam 4a99bd2961 Addressed review comments. 2019-07-15 12:33:18 -07:00
A.J. Beamon f31884c749 Merge branch 'master' into add-priority-starts-to-status
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-07-11 15:26:52 -07:00
A.J. Beamon 97609ad991 Add information about transaction starts at different priorities to status. 2019-07-11 13:54:44 -07:00
A.J. Beamon b4dbc6d7fa Change the way cache hits and misses are tracked to avoid counting blind page writes as misses and count the results of partial page writes. Report cache hit rate in status. 2019-07-10 14:43:20 -07:00
A.J. Beamon 69d7c4f79c Merge branch 'master' into track-run-loop-busyness
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Net2.actor.cpp
#	flow/network.h
2019-07-09 18:39:23 -07:00
Evan Tschannen c8d86516f0
Merge pull request #1800 from ajbeamon/rename-datacenter-version-difference
Rename datacenter_version_difference to datacenter_lag and include bo…
2019-07-09 17:29:27 -07:00
Trevor Clinkenbeard 1bac04509e Track the local ratekeeper rate as a percentage
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
A.J. Beamon 4be08d9b2d Rename datacenter_version_difference to datacenter_lag and include both seconds and versions. 2019-07-05 14:36:18 -07:00
A.J. Beamon 7f23814841 Track run loop busyness and report it in status. 2019-06-26 14:03:02 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Balachandar Namasivayam 7489f83a7f Disable/Re-enable consistency check through a database key.
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
mpilman 844dd60202 FDB compiling with intel compiler 2019-06-20 09:29:01 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Evan Tschannen 22499666d0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/LogRouter.actor.cpp
#	flow/Trace.cpp
#	versions.target
2019-05-08 18:19:35 -07:00
Evan Tschannen d9a4553270 fix: The team tracker does not provide data movement priority information for non-failure related data movement 2019-05-07 17:06:54 -07:00
Austin Seipp bf378952cb fdbserver: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Evan Tschannen f0fe0d7858 added additional logging on the logs and log routers 2019-05-02 16:16:25 -07:00
Andrew Noyes ef04471a66 Fix more unused-variable warnings 2019-04-17 16:04:10 -07:00
Evan Tschannen 6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
Remove unused functions
2019-04-09 11:49:45 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Evan Tschannen 39c595223b Merge branch 'release-6.1' 2019-04-02 22:30:02 -07:00
Evan Tschannen 1d4a6ab551 cleaned up status to keep the healthyZone read separated from relicaFutures 2019-04-02 14:46:56 -07:00
Evan Tschannen 628fec8c8b updated status with information about ongoing maintenance
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
Jingyu Zhou 3f76be8f45 Merge remote-tracking branch 'apple/master' into fix-unreferenced 2019-04-01 14:00:43 -07:00
Jingyu Zhou f7f8ddd894 Fix warnings on unused variables
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
Evan Tschannen 836bb95a7a
Merge pull request #1372 from etschannen/master
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
Evan Tschannen 34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen e5a80f2c94 optimized IPaddress 2019-03-27 18:21:13 -07:00
A.J. Beamon 71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
A.J. Beamon d508658569 Make ratekeeper one word to match our existing convention 2019-03-27 08:15:19 -07:00
Jingyu Zhou 7c02ee6fdd Fix compiler warning about unreferenced exception variable 2019-03-26 13:43:47 -07:00
Jingyu Zhou 466a59a99d Merge remote-tracking branch 'apple/release-6.1' into ratekeeper 2019-03-25 15:27:38 -07:00
Jingyu Zhou f57a22e2ed Add data distributor and ratekeeper to status output 2019-03-25 15:11:29 -07:00
Evan Tschannen 5e03e178de
Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues
Add support for a client or worker having multiple issues.
2019-03-24 17:27:50 -07:00
A.J. Beamon fc48b6050e When tabulating read workload metrics, ignore the absence of any particular storage server. 2019-03-22 14:22:22 -07:00
A.J. Beamon 4eb5715689 Add support for a client or worker having multiple issues. 2019-03-22 08:29:41 -07:00
Evan Tschannen f9aad46573 made use_provisional_proxies a transaction option 2019-03-19 18:44:37 -07:00
Meng Xu 5a10bf5dfc Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-14 10:35:12 -07:00
Evan Tschannen e068c478b5 merge master 2019-03-12 18:31:25 -07:00
Meng Xu 435e515985 Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-11 11:17:40 -07:00
Evan Tschannen 80c3f2f8e2 added status fields detailing which processes are degraded, and also the total number of degraded processes 2019-03-10 22:58:15 -07:00
Evan Tschannen 044b6b4f8a Merge branch 'master' into feature-degraded-tlog
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen 710a64dc4e replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails 2019-03-08 11:25:07 -05:00
Jingyu Zhou 7340998261 Fix status message for ratekeeper 2019-03-07 13:16:20 -08:00
Meng Xu 845f8fdcbc Status:healthy: Add optimizing_team_collections
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu 04880e3d4d Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-06 13:41:16 -08:00
Meng Xu 820548223a Status: connected_coordinators misc minor changes
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initialize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
Meng Xu b7a52e81e2 Status: Count connected coordinators per client
A client will always try to connect to all coordinators.
This commit lets Status track the number of connected coordinators
for each client.

This allows us to do canary in coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinator
one by one. This avoids the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00
anoyes 981426bac9 More ide fixes 2019-03-05 18:03:57 -08:00
Meng Xu c0535c49bb Status: TLS client status
Use ClientStatusInfo structure for each network address (client),
instead of passing each status info as a parameter.
2019-03-04 16:35:10 -08:00
Vishesh Yadav 592e224155 net: add/use formatIpPort to format IP:PORT pairs #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav 57832e625d net: Support IPv6 #963
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.

- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.

- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.

- IPv6 address/port pairs should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
Meng Xu 94385447bc Status: Get if client configured TLS
To understand if all clients have configured TLS,
we check the tlsoption when a client tries to open database.
This is similar to how we track the versions of multi-version clients.
2019-03-01 15:17:01 -08:00
A.J. Beamon 3e6a6a6569 Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics. 2019-02-28 12:00:58 -08:00
A.J. Beamon eb629d87a5 Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately. 2019-02-28 09:53:16 -08:00
Evan Tschannen d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Meng Xu 9445ac0b0c Status: Use new data distributor worker to publish status
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).

So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu 7cca439e00 TeamRemover: Add status to show redundant team removing
Distinguish the removal of unhealthy team and redundant team.
Change status report to include redundant team removal report.
2019-02-21 14:16:46 -08:00
mpilman 3f0fd2a20c Use fwd decls in WorkerInterface
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman 0bb60e5a3b Use proper fwd decl in NativeAPI
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Jingyu Zhou 62c67a50e5 Fix segfault error
The usedIds is updated by master registration request, which populates the
usedIds map. However, this request may contain processes that cluster controller
is not aware of, i.e., not in id_worker map.

This was ok until I added tracing of the usedIds, which silently inserts an empty
entry into id_worker map for the unknown process. This new entry can cause
a crash when trying to access its LocalityData.

Remove AsyncTrigger for usedIds, and change to serverInfo->onChange.

Use const & to avoid unnecessary copies in WorkerInterface's LocalityData
and getExtraTLogEligibleMachines().
2019-02-14 16:37:16 -08:00
A.J. Beamon b435d51061 Merge branch 'master' into track-server-request-latencies 2019-02-14 08:07:32 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
A.J. Beamon d4349293b9 Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing. 2019-02-07 13:39:22 -08:00
A.J. Beamon 2198d24ce1 Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/ServerDBInfo.h
#	fdbserver/Status.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon 8e05e95045 Added the ability to configure the latency band settings by setting a special key in \xff keyspace. 2019-01-18 16:18:34 -08:00
A.J. Beamon eb2f27b8e5 Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests. 2018-11-30 10:46:04 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen db71b60d72
Merge pull request #819 from satherton/feature-redwood
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen 0217aed74c Merge branch 'release-6.0'
# Conflicts:
#	bindings/go/README.md
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2018-10-15 18:38:51 -07:00
A.J. Beamon 419231d798 Fix: status was trying to read a metric under the wrong name, leading to an error that caused the cluster to report itself unhealthy and some metrics to be missing. 2018-10-10 13:33:28 -07:00
Stephen Atherton 22f8a4efa9 Normalized all unit test names to begin with "/" if they should be included in random unit testing. 2018-10-05 22:09:58 -07:00
Evan Tschannen 3922e477a5 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/LogSystemDiskQueueAdapter.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
A.J. Beamon 84c2e3567f Fix keys queried to use the RowsQueried metric instead of BytesQueried. 2018-10-01 11:19:28 -07:00
A.J. Beamon a98fcf5972 Rename durable_lag to durability_lag 2018-10-01 09:58:49 -07:00
A.J. Beamon f196e2d4dc Log metrics about read requests as well as completed reads. 2018-09-27 15:32:39 -07:00
A.J. Beamon 118e21c446 Add new metrics for bytes queried, keys queried, mutation bytes, mutations, and durable lag to the storage role in status. 2018-09-27 14:33:21 -07:00
Alec Grieser 10a8e67266
Merge remote-tracking branch 'upstream/release-6.0' into merge-release-6.0 2018-09-11 21:49:59 -07:00
Stephen Atherton b47902584b Added comparison of serialization speed of json_spirit and JsonBuilder to JsonBuilder perf test. 2018-09-10 19:54:27 -07:00
Stephen Atherton 5b8971ef18 Rename JsonString to JsonBuilder 2018-09-10 19:01:24 -07:00
Stephen Atherton 8ed8f290c4 Merge branch 'release-6.0' of github.com:etschannen/foundationdb into evan-6 2018-09-10 13:28:49 -07:00
Stephen Atherton 4753172991 Added support for NaN and infinity in JsonBuilder in double and ascii number interfaces. 2018-09-10 13:28:31 -07:00
Evan Tschannen 63271072e8 fixed linker problem 2018-09-10 12:59:22 -07:00
Stephen Atherton 9febc01106 In simulation, strict JSON parser will print bad input to stdout (not stderr since whether it's an error depends on context). 2018-09-10 12:07:03 -07:00
Evan Tschannen ce38ddbd4b added tests with the + sign 2018-09-10 10:52:15 -07:00
Stephen Atherton 4510881ba1 Added JSON reparsing and error output to all JsonBuilder output tests. 2018-09-10 03:21:55 -07:00
Stephen Atherton 9828db4399 Rewrote ASCII -> JSON number support as sort of a filter which cleans up ASCII numbers into valid JSON. Added tests for strange numbers which tests the generated output by parsing it as JSON. 2018-09-10 03:07:11 -07:00
Evan Tschannen 5bff967763 avoid parsing numbers that are never used as numbers 2018-09-09 23:19:00 -07:00
Stephen Atherton 2afd4cffb9 Simplified raw string writers and added StringRef version so that locality field names in status do not have to be repeatedly copied. 2018-09-09 22:47:12 -07:00
Stephen Atherton 41b03f6f68 Some refactoring in JsonBuilder to remove code duplication and make some expressions simpler. Some performance improvements, switched vector to VectorRef and used its bulk append method, and redirected basic mValue types to faster serialization. Added StringRef writeValue() method so that locality field writing in status does not have to use toString(). 2018-09-09 22:39:51 -07:00
Stephen Atherton 7ca304d209 Bug fix, adding empty JsonBuilder object or array to same would cause an errant comma at the end of the target. 2018-09-09 22:11:11 -07:00
Stephen Atherton 2d6632dfc6 Added more JsonBuilder unit tests, some of which fail. 2018-09-09 22:05:04 -07:00
Evan Tschannen 84737b1fbe speed up status 2018-09-09 18:12:41 -07:00
Stephen Atherton ce6a9da423 Added JsonBuilder performance unit test for tracking optimizations. 2018-09-08 18:05:16 -07:00
Stephen Atherton 21c7ed2b50 JsonString refactored so that type compatibility is enforced at compile time. Specific JSON types are handled with subclasses that have type-specific access methods and a caller's intentions can no longer be ambiguous. Nothing that compiles should be able to produce malformed JSON. 2018-09-08 15:44:48 -07:00
Stephen Atherton ce3f01a0cf Added concept of type to JsonString. Appending single items or key/value pairs is now type-safe and only allowed in certain cases. JsonString will refuse to produce invalid JSON. All duplicative functions have been replaced with templates. Encoding of values uses json_spirit's value writer which should be no worse performance than format() and it will escape everything properly. Final string form is now built directly using knowledge of type, such as when an instance becomes an Array or Object the appropriate opening character is written. This avoids a full copy just to prepend the opening character later. Index interface for key/value pairs no longer makes a temporary copy of the key string. JsonString is now only needed by Status.actor.cpp. Still more work to be done here. 2018-09-08 07:15:28 -07:00
Evan Tschannen d3c8d7ab4e fix: status would generate invalid json 2018-09-07 18:26:05 -07:00
Bhaskar Muppana 70ba750bcd dbName is not used anymore. 2018-09-06 14:31:26 -07:00
Bhaskar Muppana 920fd3fe97 Merge branch 'release-6.0' 2018-09-06 14:24:02 -07:00
Evan Tschannen 98651bafb1 removed _keyNames from JsonString 2018-09-05 22:51:15 -07:00
Evan Tschannen 90301f497f Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/TLSConnection.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	versions.target
2018-09-05 16:06:33 -07:00
Alvin Moore 221f73e69e Merge branch 'release-6.0' of github.com:apple/foundationdb into status-json
# Conflicts:
#	fdbclient/fdbclient.vcxproj
#	fdbserver/Status.actor.cpp
2018-09-05 14:44:09 -07:00
Alvin Moore 6aa22af83b Added explicit keyword for appropriate constructor
Templatized some methods
Removed unused hash function
2018-09-05 12:22:04 -07:00
Evan Tschannen 4eaff42e4f
Merge pull request #712 from ajbeamon/remove-database-name-internal
Eliminate use of database names (phase 1)
2018-09-05 10:35:00 -07:00
Alvin Moore 43a2afc3b6 Added TODO comment
Removed debug comments
2018-09-05 08:06:19 -07:00
Alvin Moore 04a768042a Added TraceEvent to measure time to create Status Json
Simplified JsonString class to use implementation method for reuse of methods
Removed quotes from non-string values within json
Added Tests for jsonstring
Removed hashing of names for JsonString
Switched name tracker to unordered set
2018-09-05 03:50:53 -07:00
Evan Tschannen 1e2ce75ce4 fix: if usable_regions=1 extraTlogEligibleMachines was calculated incorrectly 2018-08-31 13:04:00 -07:00
Alvin Moore affd7423b4 Added class to write json as objects are added
Integrated JsonString class into status
2018-08-31 01:21:24 -07:00
Evan Tschannen 717c43a69f merge 6.0 into master 2018-08-22 00:28:04 -07:00
A.J. Beamon 2a97139d5d This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated. 2018-08-16 10:24:12 -07:00
Evan Tschannen e770629229 fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task. 2018-08-15 19:39:06 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen 9c918a28f6 fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2 2018-08-09 13:16:09 -07:00
A.J. Beamon 7d831ef9c3 Revert change that prints lag with 2 decimal points of precision. 2018-08-07 15:41:51 -07:00
A.J. Beamon e0cf525951 Fix: use new data lag fields when making storage server message indicating high lag. 2018-08-07 11:02:09 -07:00
Evan Tschannen 6d76ff67a3 added the connection string to status 2018-07-09 22:11:58 -07:00
Evan Tschannen 507b3bacb0 fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen a288d5b9a9 added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down 2018-06-28 23:15:32 -07:00
A.J. Beamon 1f0561a9c0 Missed a couple requested changes 2018-06-26 15:22:39 -07:00
A.J. Beamon fec225075f Merge branch 'master' into trace-log-refactor 2018-06-26 14:54:42 -07:00
A.J. Beamon fe956bc35a Address review comments 2018-06-26 14:37:21 -07:00
A.J. Beamon 9f545ce002 Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor 2018-06-26 11:37:23 -07:00
A.J. Beamon e8f66df001 Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag. 2018-06-21 15:59:43 -07:00
A.J. Beamon 5e81f4ac7e Track unused allocated memory in ProcessMetrics and report it in status. 2018-06-20 10:10:51 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen f694f7c9ca removed hasBestPolicy 2018-06-15 12:36:19 -07:00
Evan Tschannen 09c92c887b fix: extraTlogEligibleMachines was not calculated correctly in all cases 2018-06-15 10:23:33 -07:00
Evan Tschannen 246abd1207 added full_replication to status 2018-06-14 21:14:18 -07:00
Evan Tschannen 0103b6f5ed added datacenter_version_difference to status 2018-06-14 19:09:25 -07:00
Evan Tschannen 99e21c869c fixed a number of status calculations, and re-enabled the status workload 2018-06-14 17:58:57 -07:00
Richard Low 39894ea798 Merge remote-tracking branch 'apple/release-5.2' 2018-06-12 18:31:20 -07:00
Balachandar Namasivayam 819929e1be Address review comments. 2018-06-12 11:59:47 -07:00
Balachandar Namasivayam 7db928ccec Cluster file and its parent directory need to be writable for operation of fdb cluster.
Document this requirement and also add relevant details to status output.
2018-06-11 16:47:24 -07:00
A.J. Beamon f965954122 Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
# Conflicts:
#	fdbserver/Status.actor.cpp
#	fdbserver/workloads/DDMetrics.actor.cpp
#	flow/Trace.cpp
2018-06-08 16:22:22 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon 0ca51989bb Merge branch 'master' into trace-log-refactor
# Conflicts:
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/Status.actor.cpp
#	flow/Trace.cpp
2018-06-08 13:24:30 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
A.J. Beamon 78839b20fd Merge branch 'master' into trace-log-refactor
# Conflicts:
#	flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon 54b4c9e061 Merge branch 'release-5.2' into trace-log-refactor
# Conflicts:
#	fdbserver/Status.actor.cpp
2018-05-08 15:51:54 -07:00
A.J. Beamon ca720e1540
Merge pull request #297 from apple/release-5.2
Merge 5.2 to Master
2018-05-08 12:04:20 -07:00
A.J. Beamon 432a295bc2 Add read bytes and read keys info to status. Collect this information directly from StorageMetrics rather than through ratekeeper. 2018-05-04 12:01:40 -07:00
A.J. Beamon ce0c991e78 Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs. 2018-05-02 10:44:38 -07:00
Evan Tschannen 3abf4d7fdf Merge branch 'master' into feature-remote-logs 2018-03-09 14:50:04 -08:00
Evan Tschannen 91bb8faa45 Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
A.J. Beamon bb9f51bb5c Don't try to extract attributes from the program start trace events if they couldn't be collected. 2018-03-09 11:55:57 -08:00
Evan Tschannen 1194e3a361 added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed. 2018-03-05 19:27:46 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen cb25564d38 simulated cluster supports fearless configurations
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00