Evan Tschannen
2d74288d16
Added a comment to clarify why cleanup work is done in status
2019-10-22 16:33:44 -07:00
Evan Tschannen
3478652d06
Apply suggestions from code review
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:32:09 -07:00
Evan Tschannen
d5c2147c0c
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:27:52 -07:00
Evan Tschannen
2caad04d9c
Keys in the destUIDLookupPrefix can be cleaned up automatically if they do not have an associated entry in the logRangesRange keyspace
2019-10-22 11:58:40 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
42b7acf7b7
Merge pull request #2202 from etschannen/feature-share-mutations
...
Backup and DR would not share mutations if started on different versions of FDB
2019-10-16 20:28:39 -07:00
Evan Tschannen
587cbefe7f
duplicate mutation stream checker did not have a timeout
...
duplicate mutation stream did not work properly when multiple ranges exist with the same begin key
2019-10-16 20:17:09 -07:00
Evan Tschannen
5be773f145
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:24 -07:00
Evan Tschannen
2facfc090b
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:12 -07:00
Evan Tschannen
552eb44bf8
Merge pull request #2230 from ajbeamon/fix-fault-tolerance-reporting-with-remote-regions
...
Fix: status would fail to account for remote regions when...
2019-10-16 14:51:48 -07:00
Evan Tschannen
8b09cd16b2
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations
2019-10-16 14:50:37 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
A.J. Beamon
a6da9d3df5
Fix: status would fail to account for remote regions when computing fault tolerance in the presence of a failure on the primary.
2019-10-10 10:36:35 -07:00
Evan Tschannen
628b4e0220
added a warning if multiple log ranges exist for the same range
2019-10-02 17:06:19 -07:00
Meng Xu
d0147e5e5d
Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
...
Resolved Conflicts:
documentation/sphinx/source/release-notes.rst
fdbserver/DataDistribution.actor.cpp
versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen
045175bd0e
added tracking for the size of the system keyspace
2019-09-27 22:39:19 -07:00
Evan Tschannen
b495cc697b
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-09-13 09:25:08 -07:00
A.J. Beamon
6100d3274d
Merge pull request #2058 from tclinken/expose-lock-status
...
Added lockUID to status output if database is locked
2019-09-11 08:47:35 -07:00
Evan Tschannen
945cff1e5b
the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times
2019-09-10 14:27:22 -07:00
Trevor Clinkenbeard
95ccf23e88
Use READ_LOCK_AWARE instead of LOCK_AWARE in lockedStatusFetcher
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-09-06 22:21:07 -07:00
Trevor Clinkenbeard
29fe5f16d3
Use READ_SYSTEM_KEYS instead of ACCESS_SYSTEM_KEYS in lockedStatusFetcher
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-09-06 22:21:07 -07:00
Trevor Clinkenbeard
8c31a839be
s/lockUID/lock_uid in status
2019-09-06 22:20:55 -07:00
Trevor Clinkenbeard
2b6961826e
Added lockUID to status output if database is locked
2019-09-06 22:20:34 -07:00
Jingyu Zhou
e551523b04
Fix the same iterator bug of passing the end
2019-09-05 11:36:34 -07:00
Jingyu Zhou
73044bdc36
Fix a crash failure due to iterator passing the end
2019-09-05 11:34:11 -07:00
A.J. Beamon
3f9e392668
Merge pull request #2014 from etschannen/feature-fdbcli-sleep
...
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen
f3bc7e0abd
do not duplicate data distribution disabled fields in status
...
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
Evan Tschannen
0b0c9fe0ff
data distribution status was combined into regular status
2019-08-21 14:44:15 -07:00
A.J. Beamon
2b80d836f4
Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
A.J. Beamon
b8e57f37d7
Add 'coordinator' to the list of roles that a process can have in status.
2019-08-15 14:42:49 -07:00
A.J. Beamon
bb72cdd36a
Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields.
2019-08-15 13:42:39 -07:00
A.J. Beamon
6581161dd3
Add ratekeeper's durability lag statistics to status
2019-08-15 11:07:04 -07:00
Evan Tschannen
70ce678879
fix: max_protocol_clients were being added to the connected_clients list
...
fix: the clientCount was included clients with unknown protocol versions. This has been changed back to the pre-6.2 behavior where it is just a count of clients with known versions, and now clients with unknown versions are tracked explicitly as its own supported_version section
2019-08-13 15:54:40 -07:00
A.J. Beamon
476641a087
Merge pull request #1929 from jzhou77/fix-warning
...
Fix compiler warnings
2019-08-01 11:15:41 -07:00
Jingyu Zhou
37450be706
Fix format usage for currentProtocolVersion
...
ProtocolVersion now is a class.
2019-08-01 10:19:46 -07:00
Xin Dong
1922c39377
Resolve review comments. 100K run shows one suspecious ASSERT_WE_THINK failure which I think could be a race.
2019-07-30 22:24:30 -07:00
Xin Dong
c6e5472d8d
Apply suggestions from code review
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-30 22:20:45 -07:00
Xin Dong
ae11efcb0a
Made following changes:
...
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warns when necessary
2019-07-30 22:20:45 -07:00
A.J. Beamon
438bc636d5
Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status.
2019-07-30 14:02:31 -07:00
Evan Tschannen
90e3b50213
Merge branch 'master' into feature-coordinator-connection
...
# Conflicts:
# fdbclient/DatabaseContext.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen
ee92f0574f
fix: lastRequestTime was not updated
...
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Evan Tschannen
be5d144b8b
added status information on connected clients
2019-07-25 17:15:31 -07:00
Evan Tschannen
4a866290b7
Clients keep a persistent connection open with coordinators to get updates to the list of proxies
...
Status still needs to be updated with client information with information from the coordinators
2019-07-23 19:22:44 -07:00
Meng Xu
378db79441
Resolve conflict when merge with master
2019-07-22 10:56:20 -07:00
Meng Xu
612a51fe00
Apply Clang format to PRIORITY_TEAM_REDUNDANT
2019-07-19 18:32:22 -07:00
Meng Xu
ea76451f15
Count PRIORITY_TEAM_REDUNDANT as count PRIORITY_TEAM_UNHEALTHY
2019-07-19 18:30:01 -07:00
Evan Tschannen
94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
...
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Balachandar Namasivayam
e08c25ffd8
Style fix.
2019-07-17 17:31:50 -07:00
A.J. Beamon
2cd05e9ac9
Merge pull request #1712 from tclinken/add-local-rk-to-status
...
Track the local ratekeeper rate in status
2019-07-15 15:17:11 -07:00
Balachandar Namasivayam
9169232fa9
Add the new messages to Schema.
2019-07-15 13:47:27 -07:00
Balachandar Namasivayam
4a99bd2961
Addressed review comments.
2019-07-15 12:33:18 -07:00
A.J. Beamon
f31884c749
Merge branch 'master' into add-priority-starts-to-status
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-07-11 15:26:52 -07:00
A.J. Beamon
97609ad991
Add information about transaction starts at different priorities to status.
2019-07-11 13:54:44 -07:00
A.J. Beamon
b4dbc6d7fa
Change the way cache hits and misses are tracked to avoid counting blind page writes as misses and count the results of partial page writes. Report cache hit rate in status.
2019-07-10 14:43:20 -07:00
A.J. Beamon
69d7c4f79c
Merge branch 'master' into track-run-loop-busyness
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Net2.actor.cpp
# flow/network.h
2019-07-09 18:39:23 -07:00
Evan Tschannen
c8d86516f0
Merge pull request #1800 from ajbeamon/rename-datacenter-version-difference
...
Rename datacenter_version_difference to datacenter_lag and include bo…
2019-07-09 17:29:27 -07:00
Trevor Clinkenbeard
1bac04509e
Track the local ratekeeper rate as a percentage
...
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
A.J. Beamon
4be08d9b2d
Rename datacenter_version_difference to datacenter_lag and include both seconds and versions.
2019-07-05 14:36:18 -07:00
A.J. Beamon
7f23814841
Track run loop busyness and report it in status.
2019-06-26 14:03:02 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Balachandar Namasivayam
7489f83a7f
Disable/Re-enable consistency check through a database key.
...
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
mpilman
844dd60202
FDB compiling with intel compiler
2019-06-20 09:29:01 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Evan Tschannen
22499666d0
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/LogRouter.actor.cpp
# flow/Trace.cpp
# versions.target
2019-05-08 18:19:35 -07:00
Evan Tschannen
d9a4553270
fix: The team tracker does not provide data movement priority information for non-failure related data movement
2019-05-07 17:06:54 -07:00
Austin Seipp
bf378952cb
fdbserver: fix some print/scan format warnings
...
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Evan Tschannen
f0fe0d7858
added additional logging on the logs and log routers
2019-05-02 16:16:25 -07:00
Andrew Noyes
ef04471a66
Fix more unused-variable warnings
2019-04-17 16:04:10 -07:00
Evan Tschannen
6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
...
Remove unused functions
2019-04-09 11:49:45 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Evan Tschannen
39c595223b
Merge branch 'release-6.1'
2019-04-02 22:30:02 -07:00
Evan Tschannen
1d4a6ab551
cleaned up status to keep the healthyZone read separated from relicaFutures
2019-04-02 14:46:56 -07:00
Evan Tschannen
628fec8c8b
updated status with information about ongoing maintenance
...
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
Jingyu Zhou
3f76be8f45
Merge remote-tracking branch 'apple/master' into fix-unreferenced
2019-04-01 14:00:43 -07:00
Jingyu Zhou
f7f8ddd894
Fix warnings on unused variables
...
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
Evan Tschannen
836bb95a7a
Merge pull request #1372 from etschannen/master
...
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
Evan Tschannen
34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
...
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen
e5a80f2c94
optimized IPaddress
2019-03-27 18:21:13 -07:00
A.J. Beamon
71e2fdafb8
Changes to ratekeeper camel case
2019-03-27 08:24:25 -07:00
A.J. Beamon
d508658569
Make ratekeeper one word to match our existing convention
2019-03-27 08:15:19 -07:00
Jingyu Zhou
7c02ee6fdd
Fix compiler warning about unreferenced exception variable
2019-03-26 13:43:47 -07:00
Jingyu Zhou
466a59a99d
Merge remote-tracking branch 'apple/release-6.1' into ratekeeper
2019-03-25 15:27:38 -07:00
Jingyu Zhou
f57a22e2ed
Add data distributor and ratekeeper to status output
2019-03-25 15:11:29 -07:00
Evan Tschannen
5e03e178de
Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues
...
Add support for a client or worker having multiple issues.
2019-03-24 17:27:50 -07:00
A.J. Beamon
fc48b6050e
When tabulating read workload metrics, ignore the absence of any particular storage server.
2019-03-22 14:22:22 -07:00
A.J. Beamon
4eb5715689
Add support for a client or worker having multiple issues.
2019-03-22 08:29:41 -07:00
Evan Tschannen
f9aad46573
made use_provisional_proxies a transaction option
2019-03-19 18:44:37 -07:00
Meng Xu
5a10bf5dfc
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-14 10:35:12 -07:00
Evan Tschannen
e068c478b5
merge master
2019-03-12 18:31:25 -07:00
Meng Xu
435e515985
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-11 11:17:40 -07:00
Evan Tschannen
80c3f2f8e2
added status fields detailing which processes are degraded, and also the total number of degraded processes
2019-03-10 22:58:15 -07:00
Evan Tschannen
044b6b4f8a
Merge branch 'master' into feature-degraded-tlog
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen
710a64dc4e
replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails
2019-03-08 11:25:07 -05:00
Jingyu Zhou
7340998261
Fix status message for ratekeeper
2019-03-07 13:16:20 -08:00
Meng Xu
845f8fdcbc
Status:healthy: Add optimizing_team_collections
...
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu
04880e3d4d
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-06 13:41:16 -08:00
Meng Xu
820548223a
Status: connected_coordinators misc minor changes
...
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initilize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
Meng Xu
b7a52e81e2
Status: Count connected coordinators per client
...
A client will always try to connect all coordinators.
This commit let Status track the number of connected coordinators
for each client.
This allows us to do canary in coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinators
one by one. This avoid the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Meng Xu
c0535c49bb
Status: TLS client status
...
Use ClientStatusInfo structure for each network address (client),
instead of passing each status info as a parameter.
2019-03-04 16:35:10 -08:00
Vishesh Yadav
592e224155
net: add/use formatIpPort to format IP:PORT pairs #963
2019-03-04 14:12:45 -08:00
Vishesh Yadav
57832e625d
net: Support IPv6 #963
...
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.
- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.
- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.
- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
Meng Xu
94385447bc
Status: Get if client configured TLS
...
To understand if all clients have configured TLS,
we check the tlsoption when a client tries to open database.
This is similar to how we track the versions of multi-version clients.
2019-03-01 15:17:01 -08:00
A.J. Beamon
3e6a6a6569
Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics.
2019-02-28 12:00:58 -08:00
A.J. Beamon
eb629d87a5
Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately.
2019-02-28 09:53:16 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
...
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Meng Xu
9445ac0b0c
Status: Use new data distributor worker to publish status
...
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).
So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu
7cca439e00
TeamRemover: Add status to show redundant team removing
...
Distinguish the removal of unhealthy team and redundant team.
Change status report to include redundant team removal report.
2019-02-21 14:16:46 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Jingyu Zhou
62c67a50e5
Fix segfault error
...
The usedIds is updated by master registration request, which populates the
usedIds map. However, this request may contain processes that cluster controller
is not aware, i.e., not in id_worker map.
This is ok until I added tracing the usedIds, which silently insert an empty
entry into id_worker map for the unknown process. This new entry can cause
crashing failure when trying to access its LocalityData.
Remove AsyncTrigger for usedIds, and change to serverInfo->onChange.
Use const & to avoid unnecessary copies in WorkerInterface's LocalityData
and getExtraTLogEligibleMachines().
2019-02-14 16:37:16 -08:00
A.J. Beamon
b435d51061
Merge branch 'master' into track-server-request-latencies
2019-02-14 08:07:32 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
A.J. Beamon
d4349293b9
Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing.
2019-02-07 13:39:22 -08:00
A.J. Beamon
2198d24ce1
Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
...
# Conflicts:
# fdbclient/MasterProxyInterface.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/ServerDBInfo.h
# fdbserver/Status.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon
8e05e95045
Added the ability to configure the latency band settings by setting a special key in \xff keyspace.
2019-01-18 16:18:34 -08:00
A.J. Beamon
eb2f27b8e5
Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests.
2018-11-30 10:46:04 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
db71b60d72
Merge pull request #819 from satherton/feature-redwood
...
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen
0217aed74c
Merge branch 'release-6.0'
...
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2018-10-15 18:38:51 -07:00
A.J. Beamon
419231d798
Fix: status was trying to read a metric under the wrong name, leading to an error that caused the cluster to report itself unhealthy and some metrics to be missing.
2018-10-10 13:33:28 -07:00
Stephen Atherton
22f8a4efa9
Normalized all unit test names to begin with "/" if they should be included in random unit testing.
2018-10-05 22:09:58 -07:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
A.J. Beamon
84c2e3567f
Fix keys queried to use the RowsQueried metric instead of BytesQueried.
2018-10-01 11:19:28 -07:00
A.J. Beamon
a98fcf5972
Rename durable_lag to durability_lag
2018-10-01 09:58:49 -07:00
A.J. Beamon
f196e2d4dc
Lot metrics about read requests as well as completed reads.
2018-09-27 15:32:39 -07:00
A.J. Beamon
118e21c446
Add new metrics for bytes queried, keys queried, mutation bytes, mutations, and durable lag to the storage role in status.
2018-09-27 14:33:21 -07:00
Alec Grieser
10a8e67266
Merge remote-tracking branch 'upstream/release-6.0' into merge-release-6.0
2018-09-11 21:49:59 -07:00
Stephen Atherton
b47902584b
Added comparison of serialization speed of json_spirit and JsonBuilder to JsonBuilder perf test.
2018-09-10 19:54:27 -07:00
Stephen Atherton
5b8971ef18
Rename JsonString to JsonBuilder
2018-09-10 19:01:24 -07:00
Stephen Atherton
8ed8f290c4
Merge branch 'release-6.0' of github.com:etschannen/foundationdb into evan-6
2018-09-10 13:28:49 -07:00
Stephen Atherton
4753172991
Added support for NaN and infinity in JsonBuilder in double and ascii number interfaces.
2018-09-10 13:28:31 -07:00
Evan Tschannen
63271072e8
fixed linker problem
2018-09-10 12:59:22 -07:00
Stephen Atherton
9febc01106
In simulation, strict JSON parser will print bad input to stdout (not std since whether it's an error depends on context).
2018-09-10 12:07:03 -07:00
Evan Tschannen
ce38ddbd4b
added tests with the + sign
2018-09-10 10:52:15 -07:00
Stephen Atherton
4510881ba1
Added JSON reparsing and error output to all JsonBuilder output tests.
2018-09-10 03:21:55 -07:00
Stephen Atherton
9828db4399
Rewrote ASCII -> JSON number support as sort of a filter which cleans up ASCII numbers into valid JSON. Added tests for strange numbers which tests the generated output by parsing it as JSON.
2018-09-10 03:07:11 -07:00
Evan Tschannen
5bff967763
avoid parsing numbers that are never used as numbers
2018-09-09 23:19:00 -07:00
Stephen Atherton
2afd4cffb9
Simplified raw string writers and added StringRef version so that locality field names in status do not have to be repeatedly copied.
2018-09-09 22:47:12 -07:00
Stephen Atherton
41b03f6f68
Some refactoring in JsonBuilder to remove code duplication and make some expressions simpler. Some performance improvements, switched vector to VectorRef and used its bulk append method, and redirected basic mValue types to faster serialization. Added StringRef writeValue() method so that locality field writing in status does not have to use toString().
2018-09-09 22:39:51 -07:00
Stephen Atherton
7ca304d209
Bug fix, adding empty JsonBuilder object or array to same would cause an errant comma at the end of the target.
2018-09-09 22:11:11 -07:00
Stephen Atherton
2d6632dfc6
Added more JsonBuilder unit tests, some of which fail.
2018-09-09 22:05:04 -07:00
Evan Tschannen
84737b1fbe
speed up status
2018-09-09 18:12:41 -07:00
Stephen Atherton
ce6a9da423
Added JsonBuilder performance unit test for tracking optimizations.
2018-09-08 18:05:16 -07:00
Stephen Atherton
21c7ed2b50
JsonString refactored so that type compatibility is enforced at compile time. Specific JSON types are handled with subclasses that have type-specific access methods and a caller's intentions can no longer be ambiguous. Nothing that compiles should be able to produce malformed JSON.
2018-09-08 15:44:48 -07:00
Stephen Atherton
ce3f01a0cf
Added concept of type to JsonString. Appending single items or key/value pairs is now type-safe and only allowed in certain cases. JsonString will refuse to produce invalid JSON. All duplicative functions have been replaced with templates. Encoding of values uses json_spirit's value writer which should be no worse performance than format() and it will escape everything properly. Final string form is now built directly using knowledge of type, such as when an instance becomes an Array or Object the appropriate opening character is written. This avoids a full copy just to prepend the opening character later. Index interface for key/value pairs no longer makes a temporary copy of the key string. JsonString is now only needed by Status.actor.cpp. Still more work to be done here.
2018-09-08 07:15:28 -07:00
Evan Tschannen
d3c8d7ab4e
fix: status would generate invalid json
2018-09-07 18:26:05 -07:00
Bhaskar Muppana
70ba750bcd
dbName is not used anymore.
2018-09-06 14:31:26 -07:00
Bhaskar Muppana
920fd3fe97
Merge branch 'release-6.0'
2018-09-06 14:24:02 -07:00
Evan Tschannen
98651bafb1
removed _keyNames from JsonString
2018-09-05 22:51:15 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
Alvin Moore
221f73e69e
Merge branch 'release-6.0' of github.com:apple/foundationdb into status-json
...
# Conflicts:
# fdbclient/fdbclient.vcxproj
# fdbserver/Status.actor.cpp
2018-09-05 14:44:09 -07:00
Alvin Moore
6aa22af83b
Added explicit keyword for appropriate constructor
...
Templatized some methods
Removed unused hash function
2018-09-05 12:22:04 -07:00
Evan Tschannen
4eaff42e4f
Merge pull request #712 from ajbeamon/remove-database-name-internal
...
Eliminate use of database names (phase 1)
2018-09-05 10:35:00 -07:00
Alvin Moore
43a2afc3b6
Added TODO comment
...
Removed debug comments
2018-09-05 08:06:19 -07:00
Alvin Moore
04a768042a
Added TraceEvent to measure time to create Status Json
...
Simplified JsonString class to use implementation method for reuse of methods
Removed quotes from non-string values within json
Added Tests for jsonstring
Removed hashing of names for JsonString
Switched name tracker to unordered set
2018-09-05 03:50:53 -07:00
Evan Tschannen
1e2ce75ce4
fix: if usable_regions=1 extraTlogEligibleMachines was calculated incorrectly
2018-08-31 13:04:00 -07:00
Alvin Moore
affd7423b4
Added class to write json as objects are added
...
Integrated JsonString class into status
2018-08-31 01:21:24 -07:00
Evan Tschannen
717c43a69f
merge 6.0 into master
2018-08-22 00:28:04 -07:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Evan Tschannen
e770629229
fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task.
2018-08-15 19:39:06 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
9c918a28f6
fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2
2018-08-09 13:16:09 -07:00
A.J. Beamon
7d831ef9c3
Revert change that prints lag with 2 decimal points of precision.
2018-08-07 15:41:51 -07:00
A.J. Beamon
e0cf525951
Fix: use new data lag fields when making storage server message indicating high lag.
2018-08-07 11:02:09 -07:00
Evan Tschannen
6d76ff67a3
added the connection string to status
2018-07-09 22:11:58 -07:00
Evan Tschannen
507b3bacb0
fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
...
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen
a288d5b9a9
added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down
2018-06-28 23:15:32 -07:00
A.J. Beamon
1f0561a9c0
Missed a couple requested changes
2018-06-26 15:22:39 -07:00
A.J. Beamon
fec225075f
Merge branch 'master' into trace-log-refactor
2018-06-26 14:54:42 -07:00
A.J. Beamon
fe956bc35a
Address review comments
2018-06-26 14:37:21 -07:00
A.J. Beamon
9f545ce002
Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor
2018-06-26 11:37:23 -07:00
A.J. Beamon
e8f66df001
Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag.
2018-06-21 15:59:43 -07:00
A.J. Beamon
5e81f4ac7e
Track unused allocated memory in ProcessMetrics and report it in status.
2018-06-20 10:10:51 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen
f694f7c9ca
removed hasBestPolicy
2018-06-15 12:36:19 -07:00
Evan Tschannen
09c92c887b
fix: extraTlogEligibleMachines was not calculated correctly in all cases
2018-06-15 10:23:33 -07:00
Evan Tschannen
246abd1207
added full_replication to status
2018-06-14 21:14:18 -07:00
Evan Tschannen
0103b6f5ed
added datacenter_version_difference to status
2018-06-14 19:09:25 -07:00
Evan Tschannen
99e21c869c
fixed a number of status calculations, and re-enabled the status workload
2018-06-14 17:58:57 -07:00
Richard Low
39894ea798
Merge remote-tracking branch 'apple/release-5.2'
2018-06-12 18:31:20 -07:00
Balachandar Namasivayam
819929e1be
Address review comments.
2018-06-12 11:59:47 -07:00
Balachandar Namasivayam
7db928ccec
Cluster file and its parent directory needs to be writable for operation of fdb cluster.
...
Document this requirement and also add relevant details to status output.
2018-06-11 16:47:24 -07:00
A.J. Beamon
f965954122
Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
# fdbserver/workloads/DDMetrics.actor.cpp
# flow/Trace.cpp
2018-06-08 16:22:22 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
A.J. Beamon
0ca51989bb
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/Status.actor.cpp
# flow/Trace.cpp
2018-06-08 13:24:30 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
A.J. Beamon
78839b20fd
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon
54b4c9e061
Merge branch 'release-5.2' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
2018-05-08 15:51:54 -07:00
A.J. Beamon
ca720e1540
Merge pull request #297 from apple/release-5.2
...
Merge 5.2 to Master
2018-05-08 12:04:20 -07:00
A.J. Beamon
432a295bc2
Add read bytes and read keys info to status. Collect this information directly from StorageMetrics rather than through ratekeeper.
2018-05-04 12:01:40 -07:00
A.J. Beamon
ce0c991e78
Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs.
2018-05-02 10:44:38 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
A.J. Beamon
bb9f51bb5c
Don't try to extract attributes from the program start trace events if they couldn't be collected.
2018-03-09 11:55:57 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00