A.J. Beamon
69d7c4f79c
Merge branch 'master' into track-run-loop-busyness
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Net2.actor.cpp
# flow/network.h
2019-07-09 18:39:23 -07:00
Evan Tschannen
c8d86516f0
Merge pull request #1800 from ajbeamon/rename-datacenter-version-difference
...
Rename datacenter_version_difference to datacenter_lag and include bo…
2019-07-09 17:29:27 -07:00
Trevor Clinkenbeard
1bac04509e
Track the local ratekeeper rate as a percentage
...
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
A.J. Beamon
4be08d9b2d
Rename datacenter_version_difference to datacenter_lag and include both seconds and versions.
2019-07-05 14:36:18 -07:00
A.J. Beamon
7f23814841
Track run loop busyness and report it in status.
2019-06-26 14:03:02 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Balachandar Namasivayam
7489f83a7f
Disable/Re-enable consistency check through a database key.
...
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
mpilman
844dd60202
FDB compiling with intel compiler
2019-06-20 09:29:01 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Evan Tschannen
22499666d0
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/LogRouter.actor.cpp
# flow/Trace.cpp
# versions.target
2019-05-08 18:19:35 -07:00
Evan Tschannen
d9a4553270
fix: The team tracker does not provide data movement priority information for non-failure related data movement
2019-05-07 17:06:54 -07:00
Austin Seipp
bf378952cb
fdbserver: fix some print/scan format warnings
...
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Evan Tschannen
f0fe0d7858
added additional logging on the logs and log routers
2019-05-02 16:16:25 -07:00
Andrew Noyes
ef04471a66
Fix more unused-variable warnings
2019-04-17 16:04:10 -07:00
Evan Tschannen
6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
...
Remove unused functions
2019-04-09 11:49:45 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Evan Tschannen
39c595223b
Merge branch 'release-6.1'
2019-04-02 22:30:02 -07:00
Evan Tschannen
1d4a6ab551
cleaned up status to keep the healthyZone read separated from relicaFutures
2019-04-02 14:46:56 -07:00
Evan Tschannen
628fec8c8b
updated status with information about ongoing maintenance
...
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
Jingyu Zhou
3f76be8f45
Merge remote-tracking branch 'apple/master' into fix-unreferenced
2019-04-01 14:00:43 -07:00
Jingyu Zhou
f7f8ddd894
Fix warnings on unused variables
...
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
Evan Tschannen
836bb95a7a
Merge pull request #1372 from etschannen/master
...
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
Evan Tschannen
34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
...
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen
e5a80f2c94
optimized IPaddress
2019-03-27 18:21:13 -07:00
A.J. Beamon
71e2fdafb8
Changes to ratekeeper camel case
2019-03-27 08:24:25 -07:00
A.J. Beamon
d508658569
Make ratekeeper one word to match our existing convention
2019-03-27 08:15:19 -07:00
Jingyu Zhou
7c02ee6fdd
Fix compiler warning about unreferenced exception variable
2019-03-26 13:43:47 -07:00
Jingyu Zhou
466a59a99d
Merge remote-tracking branch 'apple/release-6.1' into ratekeeper
2019-03-25 15:27:38 -07:00
Jingyu Zhou
f57a22e2ed
Add data distributor and ratekeeper to status output
2019-03-25 15:11:29 -07:00
Evan Tschannen
5e03e178de
Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues
...
Add support for a client or worker having multiple issues.
2019-03-24 17:27:50 -07:00
A.J. Beamon
fc48b6050e
When tabulating read workload metrics, ignore the absence of any particular storage server.
2019-03-22 14:22:22 -07:00
A.J. Beamon
4eb5715689
Add support for a client or worker having multiple issues.
2019-03-22 08:29:41 -07:00
Evan Tschannen
f9aad46573
made use_provisional_proxies a transaction option
2019-03-19 18:44:37 -07:00
Meng Xu
5a10bf5dfc
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-14 10:35:12 -07:00
Evan Tschannen
e068c478b5
merge master
2019-03-12 18:31:25 -07:00
Meng Xu
435e515985
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-11 11:17:40 -07:00
Evan Tschannen
80c3f2f8e2
added status fields detailing which processes are degraded, and also the total number of degraded processes
2019-03-10 22:58:15 -07:00
Evan Tschannen
044b6b4f8a
Merge branch 'master' into feature-degraded-tlog
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen
710a64dc4e
replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails
2019-03-08 11:25:07 -05:00
Jingyu Zhou
7340998261
Fix status message for ratekeeper
2019-03-07 13:16:20 -08:00
Meng Xu
845f8fdcbc
Status:healthy: Add optimizing_team_collections
...
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu
04880e3d4d
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-06 13:41:16 -08:00
Meng Xu
820548223a
Status: connected_coordinators misc minor changes
...
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initilize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
Meng Xu
b7a52e81e2
Status: Count connected coordinators per client
...
A client will always try to connect all coordinators.
This commit let Status track the number of connected coordinators
for each client.
This allows us to do canary in coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinators
one by one. This avoid the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Meng Xu
c0535c49bb
Status: TLS client status
...
Use ClientStatusInfo structure for each network address (client),
instead of passing each status info as a parameter.
2019-03-04 16:35:10 -08:00
Vishesh Yadav
592e224155
net: add/use formatIpPort to format IP:PORT pairs #963
2019-03-04 14:12:45 -08:00
Vishesh Yadav
57832e625d
net: Support IPv6 #963
...
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.
- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.
- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.
- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
Meng Xu
94385447bc
Status: Get if client configured TLS
...
To understand if all clients have configured TLS,
we check the tlsoption when a client tries to open database.
This is similar to how we track the versions of multi-version clients.
2019-03-01 15:17:01 -08:00
A.J. Beamon
3e6a6a6569
Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics.
2019-02-28 12:00:58 -08:00
A.J. Beamon
eb629d87a5
Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately.
2019-02-28 09:53:16 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
...
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Meng Xu
9445ac0b0c
Status: Use new data distributor worker to publish status
...
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).
So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu
7cca439e00
TeamRemover: Add status to show redundant team removing
...
Distinguish the removal of unhealthy team and redundant team.
Change status report to include redundant team removal report.
2019-02-21 14:16:46 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Jingyu Zhou
62c67a50e5
Fix segfault error
...
The usedIds is updated by master registration request, which populates the
usedIds map. However, this request may contain processes that cluster controller
is not aware, i.e., not in id_worker map.
This is ok until I added tracing the usedIds, which silently insert an empty
entry into id_worker map for the unknown process. This new entry can cause
crashing failure when trying to access its LocalityData.
Remove AsyncTrigger for usedIds, and change to serverInfo->onChange.
Use const & to avoid unnecessary copies in WorkerInterface's LocalityData
and getExtraTLogEligibleMachines().
2019-02-14 16:37:16 -08:00
A.J. Beamon
b435d51061
Merge branch 'master' into track-server-request-latencies
2019-02-14 08:07:32 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
A.J. Beamon
d4349293b9
Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing.
2019-02-07 13:39:22 -08:00
A.J. Beamon
2198d24ce1
Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
...
# Conflicts:
# fdbclient/MasterProxyInterface.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/ServerDBInfo.h
# fdbserver/Status.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon
8e05e95045
Added the ability to configure the latency band settings by setting a special key in \xff keyspace.
2019-01-18 16:18:34 -08:00
A.J. Beamon
eb2f27b8e5
Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests.
2018-11-30 10:46:04 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
db71b60d72
Merge pull request #819 from satherton/feature-redwood
...
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen
0217aed74c
Merge branch 'release-6.0'
...
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2018-10-15 18:38:51 -07:00
A.J. Beamon
419231d798
Fix: status was trying to read a metric under the wrong name, leading to an error that caused the cluster to report itself unhealthy and some metrics to be missing.
2018-10-10 13:33:28 -07:00
Stephen Atherton
22f8a4efa9
Normalized all unit test names to begin with "/" if they should be included in random unit testing.
2018-10-05 22:09:58 -07:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
A.J. Beamon
84c2e3567f
Fix keys queried to use the RowsQueried metric instead of BytesQueried.
2018-10-01 11:19:28 -07:00
A.J. Beamon
a98fcf5972
Rename durable_lag to durability_lag
2018-10-01 09:58:49 -07:00
A.J. Beamon
f196e2d4dc
Lot metrics about read requests as well as completed reads.
2018-09-27 15:32:39 -07:00
A.J. Beamon
118e21c446
Add new metrics for bytes queried, keys queried, mutation bytes, mutations, and durable lag to the storage role in status.
2018-09-27 14:33:21 -07:00
Alec Grieser
10a8e67266
Merge remote-tracking branch 'upstream/release-6.0' into merge-release-6.0
2018-09-11 21:49:59 -07:00
Stephen Atherton
b47902584b
Added comparison of serialization speed of json_spirit and JsonBuilder to JsonBuilder perf test.
2018-09-10 19:54:27 -07:00
Stephen Atherton
5b8971ef18
Rename JsonString to JsonBuilder
2018-09-10 19:01:24 -07:00
Stephen Atherton
8ed8f290c4
Merge branch 'release-6.0' of github.com:etschannen/foundationdb into evan-6
2018-09-10 13:28:49 -07:00
Stephen Atherton
4753172991
Added support for NaN and infinity in JsonBuilder in double and ascii number interfaces.
2018-09-10 13:28:31 -07:00
Evan Tschannen
63271072e8
fixed linker problem
2018-09-10 12:59:22 -07:00
Stephen Atherton
9febc01106
In simulation, strict JSON parser will print bad input to stdout (not std since whether it's an error depends on context).
2018-09-10 12:07:03 -07:00
Evan Tschannen
ce38ddbd4b
added tests with the + sign
2018-09-10 10:52:15 -07:00
Stephen Atherton
4510881ba1
Added JSON reparsing and error output to all JsonBuilder output tests.
2018-09-10 03:21:55 -07:00
Stephen Atherton
9828db4399
Rewrote ASCII -> JSON number support as sort of a filter which cleans up ASCII numbers into valid JSON. Added tests for strange numbers which tests the generated output by parsing it as JSON.
2018-09-10 03:07:11 -07:00
Evan Tschannen
5bff967763
avoid parsing numbers that are never used as numbers
2018-09-09 23:19:00 -07:00
Stephen Atherton
2afd4cffb9
Simplified raw string writers and added StringRef version so that locality field names in status do not have to be repeatedly copied.
2018-09-09 22:47:12 -07:00
Stephen Atherton
41b03f6f68
Some refactoring in JsonBuilder to remove code duplication and make some expressions simpler. Some performance improvements, switched vector to VectorRef and used its bulk append method, and redirected basic mValue types to faster serialization. Added StringRef writeValue() method so that locality field writing in status does not have to use toString().
2018-09-09 22:39:51 -07:00
Stephen Atherton
7ca304d209
Bug fix, adding empty JsonBuilder object or array to same would cause an errant comma at the end of the target.
2018-09-09 22:11:11 -07:00
Stephen Atherton
2d6632dfc6
Added more JsonBuilder unit tests, some of which fail.
2018-09-09 22:05:04 -07:00
Evan Tschannen
84737b1fbe
speed up status
2018-09-09 18:12:41 -07:00
Stephen Atherton
ce6a9da423
Added JsonBuilder performance unit test for tracking optimizations.
2018-09-08 18:05:16 -07:00
Stephen Atherton
21c7ed2b50
JsonString refactored so that type compatibility is enforced at compile time. Specific JSON types are handled with subclasses that have type-specific access methods and a caller's intentions can no longer be ambiguous. Nothing that compiles should be able to produce malformed JSON.
2018-09-08 15:44:48 -07:00
Stephen Atherton
ce3f01a0cf
Added concept of type to JsonString. Appending single items or key/value pairs is now type-safe and only allowed in certain cases. JsonString will refuse to produce invalid JSON. All duplicative functions have been replaced with templates. Encoding of values uses json_spirit's value writer which should be no worse performance than format() and it will escape everything properly. Final string form is now built directly using knowledge of type, such as when an instance becomes an Array or Object the appropriate opening character is written. This avoids a full copy just to prepend the opening character later. Index interface for key/value pairs no longer makes a temporary copy of the key string. JsonString is now only needed by Status.actor.cpp. Still more work to be done here.
2018-09-08 07:15:28 -07:00
Evan Tschannen
d3c8d7ab4e
fix: status would generate invalid json
2018-09-07 18:26:05 -07:00
Bhaskar Muppana
70ba750bcd
dbName is not used anymore.
2018-09-06 14:31:26 -07:00
Bhaskar Muppana
920fd3fe97
Merge branch 'release-6.0'
2018-09-06 14:24:02 -07:00
Evan Tschannen
98651bafb1
removed _keyNames from JsonString
2018-09-05 22:51:15 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
Alvin Moore
221f73e69e
Merge branch 'release-6.0' of github.com:apple/foundationdb into status-json
...
# Conflicts:
# fdbclient/fdbclient.vcxproj
# fdbserver/Status.actor.cpp
2018-09-05 14:44:09 -07:00
Alvin Moore
6aa22af83b
Added explicit keyword for appropriate constructor
...
Templatized some methods
Removed unused hash function
2018-09-05 12:22:04 -07:00
Evan Tschannen
4eaff42e4f
Merge pull request #712 from ajbeamon/remove-database-name-internal
...
Eliminate use of database names (phase 1)
2018-09-05 10:35:00 -07:00
Alvin Moore
43a2afc3b6
Added TODO comment
...
Removed debug comments
2018-09-05 08:06:19 -07:00
Alvin Moore
04a768042a
Added TraceEvent to measure time to create Status Json
...
Simplified JsonString class to use implementation method for reuse of methods
Removed quotes from non-string values within json
Added Tests for jsonstring
Removed hashing of names for JsonString
Switched name tracker to unordered set
2018-09-05 03:50:53 -07:00
Evan Tschannen
1e2ce75ce4
fix: if usable_regions=1 extraTlogEligibleMachines was calculated incorrectly
2018-08-31 13:04:00 -07:00
Alvin Moore
affd7423b4
Added class to write json as objects are added
...
Integrated JsonString class into status
2018-08-31 01:21:24 -07:00
Evan Tschannen
717c43a69f
merge 6.0 into master
2018-08-22 00:28:04 -07:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Evan Tschannen
e770629229
fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task.
2018-08-15 19:39:06 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
9c918a28f6
fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2
2018-08-09 13:16:09 -07:00
A.J. Beamon
7d831ef9c3
Revert change that prints lag with 2 decimal points of precision.
2018-08-07 15:41:51 -07:00
A.J. Beamon
e0cf525951
Fix: use new data lag fields when making storage server message indicating high lag.
2018-08-07 11:02:09 -07:00
Evan Tschannen
6d76ff67a3
added the connection string to status
2018-07-09 22:11:58 -07:00
Evan Tschannen
507b3bacb0
fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
...
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen
a288d5b9a9
added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down
2018-06-28 23:15:32 -07:00
A.J. Beamon
1f0561a9c0
Missed a couple requested changes
2018-06-26 15:22:39 -07:00
A.J. Beamon
fec225075f
Merge branch 'master' into trace-log-refactor
2018-06-26 14:54:42 -07:00
A.J. Beamon
fe956bc35a
Address review comments
2018-06-26 14:37:21 -07:00
A.J. Beamon
9f545ce002
Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor
2018-06-26 11:37:23 -07:00
A.J. Beamon
e8f66df001
Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag.
2018-06-21 15:59:43 -07:00
A.J. Beamon
5e81f4ac7e
Track unused allocated memory in ProcessMetrics and report it in status.
2018-06-20 10:10:51 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen
f694f7c9ca
removed hasBestPolicy
2018-06-15 12:36:19 -07:00
Evan Tschannen
09c92c887b
fix: extraTlogEligibleMachines was not calculated correctly in all cases
2018-06-15 10:23:33 -07:00
Evan Tschannen
246abd1207
added full_replication to status
2018-06-14 21:14:18 -07:00
Evan Tschannen
0103b6f5ed
added datacenter_version_difference to status
2018-06-14 19:09:25 -07:00
Evan Tschannen
99e21c869c
fixed a number of status calculations, and re-enabled the status workload
2018-06-14 17:58:57 -07:00
Richard Low
39894ea798
Merge remote-tracking branch 'apple/release-5.2'
2018-06-12 18:31:20 -07:00
Balachandar Namasivayam
819929e1be
Address review comments.
2018-06-12 11:59:47 -07:00
Balachandar Namasivayam
7db928ccec
Cluster file and its parent directory needs to be writable for operation of fdb cluster.
...
Document this requirement and also add relevant details to status output.
2018-06-11 16:47:24 -07:00
A.J. Beamon
f965954122
Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
# fdbserver/workloads/DDMetrics.actor.cpp
# flow/Trace.cpp
2018-06-08 16:22:22 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
A.J. Beamon
0ca51989bb
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/Status.actor.cpp
# flow/Trace.cpp
2018-06-08 13:24:30 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
A.J. Beamon
78839b20fd
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon
54b4c9e061
Merge branch 'release-5.2' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
2018-05-08 15:51:54 -07:00
A.J. Beamon
ca720e1540
Merge pull request #297 from apple/release-5.2
...
Merge 5.2 to Master
2018-05-08 12:04:20 -07:00
A.J. Beamon
432a295bc2
Add read bytes and read keys info to status. Collect this information directly from StorageMetrics rather than through ratekeeper.
2018-05-04 12:01:40 -07:00
A.J. Beamon
ce0c991e78
Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs.
2018-05-02 10:44:38 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
A.J. Beamon
bb9f51bb5c
Don't try to extract attributes from the program start trace events if they couldn't be collected.
2018-03-09 11:55:57 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
Evan Tschannen
42405c78a5
Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Stephen Atherton
0792d5e3dd
Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
...
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
Evan Tschannen
79d94214a4
Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs
2018-01-25 10:12:06 -08:00
A.J. Beamon
2744646090
Merge branch 'release-5.0' into release-5.1
2018-01-22 11:57:58 -08:00
A.J. Beamon
188562ccbc
fix: Status should create its DatabaseConfiguration using fromKeyValues(). This makes sure that various state is correctly set if not specified in the configuration.
2018-01-22 11:40:08 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
A.J. Beamon
5015119115
Generalize the message that gets displayed in status if a cluster file's contents are incorrect.
2018-01-05 10:29:47 -08:00
Yichi Chiang
c033d8efd8
Fix typo message and remove extra TraceEvent which overwrites the expected one
2017-11-02 10:47:51 -07:00
A.J. Beamon
7cf17df821
Merge branch 'master' into log-group-for-unsupported-clients
...
# Conflicts:
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
A.J. Beamon
31caac67dc
Rename supported_versions[x].clients to supported_versions[x].connected_clients
2017-11-01 10:41:30 -07:00
Evan Tschannen
48901a9223
added a list of tlog IDs that are missing to status
2017-10-24 16:28:50 -07:00
Evan Tschannen
df74e2a373
re-added support for non-copying tlog recovery
2017-10-24 15:09:31 -07:00
Yichi Chiang
d4f75630de
Support log group field in status json
2017-09-28 16:31:29 -07:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen
ea26bc1c43
passed first tests which kill entire datacenters
...
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
A.J. Beamon
03478561b9
fix: Set lock aware at the transaction level for latency probe to avoid having to fill the shard cache every time.
2017-08-28 17:16:46 -07:00
Evan Tschannen
81ae263ad9
implemented setPeekCursor
...
removed oldTLogServer
first compiling version
2017-07-10 17:41:32 -07:00
Evan Tschannen
0906250e78
merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
...
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00