Commit Graph

61 Commits

Author SHA1 Message Date
Alvin Moore 43a2afc3b6 Added TODO comment
Removed debug comments
2018-09-05 08:06:19 -07:00
Alvin Moore 04a768042a Added TraceEvent to measure time to create Status Json
Simplified JsonString class to use implementation method for reuse of methods
Removed quotes from non-string values within json
Added Tests for jsonstring
Removed hashing of names for JsonString
Switched name tracker to unordered set
2018-09-05 03:50:53 -07:00
Alvin Moore affd7423b4 Added class to write json as objects are added
Integrated JsonString class into status
2018-08-31 01:21:24 -07:00
Evan Tschannen e770629229 fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task. 2018-08-15 19:39:06 -07:00
Evan Tschannen 9c918a28f6 fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2 2018-08-09 13:16:09 -07:00
A.J. Beamon 7d831ef9c3 Revert change that prints lag with 2 decimal points of precision. 2018-08-07 15:41:51 -07:00
A.J. Beamon e0cf525951 Fix: use new data lag fields when making storage server message indicating high lag. 2018-08-07 11:02:09 -07:00
Evan Tschannen 6d76ff67a3 added the connection string to status 2018-07-09 22:11:58 -07:00
Evan Tschannen 507b3bacb0 fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen a288d5b9a9 added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down 2018-06-28 23:15:32 -07:00
A.J. Beamon 1f0561a9c0 Missed a couple requested changes 2018-06-26 15:22:39 -07:00
A.J. Beamon fec225075f Merge branch 'master' into trace-log-refactor 2018-06-26 14:54:42 -07:00
A.J. Beamon fe956bc35a Address review comments 2018-06-26 14:37:21 -07:00
A.J. Beamon 9f545ce002 Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor 2018-06-26 11:37:23 -07:00
A.J. Beamon e8f66df001 Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag. 2018-06-21 15:59:43 -07:00
A.J. Beamon 5e81f4ac7e Track unused allocated memory in ProcessMetrics and report it in status. 2018-06-20 10:10:51 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen f694f7c9ca removed hasBestPolicy 2018-06-15 12:36:19 -07:00
Evan Tschannen 09c92c887b fix: extraTlogEligibleMachines was not calculated correctly in all cases 2018-06-15 10:23:33 -07:00
Evan Tschannen 246abd1207 added full_replication to status 2018-06-14 21:14:18 -07:00
Evan Tschannen 0103b6f5ed added datacenter_version_difference to status 2018-06-14 19:09:25 -07:00
Evan Tschannen 99e21c869c fixed a number of status calculations, and re-enabled the status workload 2018-06-14 17:58:57 -07:00
Richard Low 39894ea798 Merge remote-tracking branch 'apple/release-5.2' 2018-06-12 18:31:20 -07:00
Balachandar Namasivayam 819929e1be Address review comments. 2018-06-12 11:59:47 -07:00
Balachandar Namasivayam 7db928ccec Cluster file and its parent directory needs to be writable for operation of fdb cluster.
Document this requirement and also add relevant details to status output.
2018-06-11 16:47:24 -07:00
A.J. Beamon f965954122 Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
# Conflicts:
#	fdbserver/Status.actor.cpp
#	fdbserver/workloads/DDMetrics.actor.cpp
#	flow/Trace.cpp
2018-06-08 16:22:22 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon 0ca51989bb Merge branch 'master' into trace-log-refactor
# Conflicts:
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/Status.actor.cpp
#	flow/Trace.cpp
2018-06-08 13:24:30 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
A.J. Beamon 78839b20fd Merge branch 'master' into trace-log-refactor
# Conflicts:
#	flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon 54b4c9e061 Merge branch 'release-5.2' into trace-log-refactor
# Conflicts:
#	fdbserver/Status.actor.cpp
2018-05-08 15:51:54 -07:00
A.J. Beamon ca720e1540
Merge pull request #297 from apple/release-5.2
Merge 5.2 to Master
2018-05-08 12:04:20 -07:00
A.J. Beamon 432a295bc2 Add read bytes and read keys info to status. Collect this information directly from StorageMetrics rather than through ratekeeper. 2018-05-04 12:01:40 -07:00
A.J. Beamon ce0c991e78 Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs. 2018-05-02 10:44:38 -07:00
Evan Tschannen 3abf4d7fdf Merge branch 'master' into feature-remote-logs 2018-03-09 14:50:04 -08:00
Evan Tschannen 91bb8faa45 Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
A.J. Beamon bb9f51bb5c Don't try to extract attributes from the program start trace events if they couldn't be collected. 2018-03-09 11:55:57 -08:00
Evan Tschannen 1194e3a361 added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed. 2018-03-05 19:27:46 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen cb25564d38 simulated cluster supports fearless configurations
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
Evan Tschannen 42405c78a5 Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Stephen Atherton 0792d5e3dd Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators.  Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
Evan Tschannen 79d94214a4 Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs 2018-01-25 10:12:06 -08:00
A.J. Beamon 2744646090 Merge branch 'release-5.0' into release-5.1 2018-01-22 11:57:58 -08:00
A.J. Beamon 188562ccbc fix: Status should create its DatabaseConfiguration using fromKeyValues(). This makes sure that various state is correctly set if not specified in the configuration. 2018-01-22 11:40:08 -08:00
Evan Tschannen 3ec45d38a0 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
A.J. Beamon 5015119115 Generalize the message that gets displayed in status if a cluster file's contents are incorrect. 2018-01-05 10:29:47 -08:00
Yichi Chiang c033d8efd8 Fix typo message and remove extra TraceEvent which overwrites the expected one 2017-11-02 10:47:51 -07:00