Alvin Moore
43a2afc3b6
Added TODO comment
...
Removed debug comments
2018-09-05 08:06:19 -07:00
Alvin Moore
04a768042a
Added TraceEvent to measure time to create Status Json
...
Simplified JsonString class to use implementation method for reuse of methods
Removed quotes from non-string values within json
Added Tests for jsonstring
Removed hashing of names for JsonString
Switched name tracker to unordered set
2018-09-05 03:50:53 -07:00
Alvin Moore
affd7423b4
Added class to write json as objects are added
...
Integrated JsonString class into status
2018-08-31 01:21:24 -07:00
Evan Tschannen
e770629229
fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task.
2018-08-15 19:39:06 -07:00
Evan Tschannen
9c918a28f6
fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2
2018-08-09 13:16:09 -07:00
A.J. Beamon
7d831ef9c3
Revert change that prints lag with 2 decimal points of precision.
2018-08-07 15:41:51 -07:00
A.J. Beamon
e0cf525951
Fix: use new data lag fields when making storage server message indicating high lag.
2018-08-07 11:02:09 -07:00
Evan Tschannen
6d76ff67a3
added the connection string to status
2018-07-09 22:11:58 -07:00
Evan Tschannen
507b3bacb0
fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
...
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen
a288d5b9a9
added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down
2018-06-28 23:15:32 -07:00
A.J. Beamon
1f0561a9c0
Missed a couple requested changes
2018-06-26 15:22:39 -07:00
A.J. Beamon
fec225075f
Merge branch 'master' into trace-log-refactor
2018-06-26 14:54:42 -07:00
A.J. Beamon
fe956bc35a
Address review comments
2018-06-26 14:37:21 -07:00
A.J. Beamon
9f545ce002
Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor
2018-06-26 11:37:23 -07:00
A.J. Beamon
e8f66df001
Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag.
2018-06-21 15:59:43 -07:00
A.J. Beamon
5e81f4ac7e
Track unused allocated memory in ProcessMetrics and report it in status.
2018-06-20 10:10:51 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen
f694f7c9ca
removed hasBestPolicy
2018-06-15 12:36:19 -07:00
Evan Tschannen
09c92c887b
fix: extraTlogEligibleMachines was not calculated correctly in all cases
2018-06-15 10:23:33 -07:00
Evan Tschannen
246abd1207
added full_replication to status
2018-06-14 21:14:18 -07:00
Evan Tschannen
0103b6f5ed
added datacenter_version_difference to status
2018-06-14 19:09:25 -07:00
Evan Tschannen
99e21c869c
fixed a number of status calculations, and re-enabled the status workload
2018-06-14 17:58:57 -07:00
Richard Low
39894ea798
Merge remote-tracking branch 'apple/release-5.2'
2018-06-12 18:31:20 -07:00
Balachandar Namasivayam
819929e1be
Address review comments.
2018-06-12 11:59:47 -07:00
Balachandar Namasivayam
7db928ccec
Cluster file and its parent directory needs to be writable for operation of fdb cluster.
...
Document this requirement and also add relevant details to status output.
2018-06-11 16:47:24 -07:00
A.J. Beamon
f965954122
Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
# fdbserver/workloads/DDMetrics.actor.cpp
# flow/Trace.cpp
2018-06-08 16:22:22 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
A.J. Beamon
0ca51989bb
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/Status.actor.cpp
# flow/Trace.cpp
2018-06-08 13:24:30 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
A.J. Beamon
78839b20fd
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon
54b4c9e061
Merge branch 'release-5.2' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
2018-05-08 15:51:54 -07:00
A.J. Beamon
ca720e1540
Merge pull request #297 from apple/release-5.2
...
Merge 5.2 to Master
2018-05-08 12:04:20 -07:00
A.J. Beamon
432a295bc2
Add read bytes and read keys info to status. Collect this information directly from StorageMetrics rather than through ratekeeper.
2018-05-04 12:01:40 -07:00
A.J. Beamon
ce0c991e78
Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs.
2018-05-02 10:44:38 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
A.J. Beamon
bb9f51bb5c
Don't try to extract attributes from the program start trace events if they couldn't be collected.
2018-03-09 11:55:57 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
Evan Tschannen
42405c78a5
Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Stephen Atherton
0792d5e3dd
Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
...
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
Evan Tschannen
79d94214a4
Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs
2018-01-25 10:12:06 -08:00
A.J. Beamon
2744646090
Merge branch 'release-5.0' into release-5.1
2018-01-22 11:57:58 -08:00
A.J. Beamon
188562ccbc
fix: Status should create its DatabaseConfiguration using fromKeyValues(). This makes sure that various state is correctly set if not specified in the configuration.
2018-01-22 11:40:08 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
A.J. Beamon
5015119115
Generalize the message that gets displayed in status if a cluster file's contents are incorrect.
2018-01-05 10:29:47 -08:00
Yichi Chiang
c033d8efd8
Fix typo message and remove extra TraceEvent which overwrites the expected one
2017-11-02 10:47:51 -07:00