Evan Tschannen
c348b3da51
After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies
2019-07-08 12:53:40 -07:00
Jingyu Zhou
50e7593c5b
Merge pull request #1796 from ajbeamon/remove-trace-event-underscores
...
Remove trace event underscores
2019-07-05 21:45:55 -07:00
Evan Tschannen
15e894c724
Merge in master
2019-07-05 15:49:24 -07:00
A.J. Beamon
9f4b6fd770
Remove additional underscores
2019-07-05 08:12:25 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
e0be631414
shard the txs tag so that more transaction logs are involved in its recovery
2019-06-19 18:15:09 -07:00
sramamoorthy
2a68b28590
rebase related changes
2019-05-28 22:07:46 -07:00
sramamoorthy
b17ad85497
exec op not supported when log_anti_quorum > 0
2019-05-28 22:07:46 -07:00
sramamoorthy
d3a179b6f9
Multiple bug fixes
...
- wait for snapTLogFailKeys in a loop, otherwise in some race
condition it can cause a false assert
- in single region, there does not seem to be a guarantee of
tagLocalityListKey for a given DC ID, avoiding that assert for now
- to find the workers that are coordinators, looking up by primary
address is not sufficient in some cases, hence looking by both
primary and secondary address
- test make files to reflect the location of the new test cases
2019-05-28 22:07:46 -07:00
sramamoorthy
bb474dc323
if recovery < fully_recovered then fail the exec
...
Will do more cleanup, pushing it for a test run in CI
2019-05-28 22:07:46 -07:00
sramamoorthy
925499954b
New status cluster_not_fully_recovered
2019-05-28 22:07:46 -07:00
sramamoorthy
6f42337c09
TransactionNotPermitted instead of conflict error
...
When the cluster has not recovered completely, return op not
permitted instead of conflict error
2019-05-28 22:07:46 -07:00
sramamoorthy
4083af0b01
Avoid using trackLatest for TLog pop test cases
2019-05-28 22:07:46 -07:00
sramamoorthy
936ffc2dde
rebase related changes
2019-05-28 22:07:46 -07:00
sramamoorthy
ec7834e2f7
code re-orgnaization and address comments
2019-05-28 22:07:46 -07:00
sramamoorthy
c76cc84ded
execute coordinators code reorganized
2019-05-28 22:07:46 -07:00
sramamoorthy
61e93a9304
Address review comments and minor fixes
2019-05-28 22:07:46 -07:00
sramamoorthy
00ccee8a6c
workaround for log giving remote log and others
...
logSystemConfig.allLocalLogs() sometimes returns remote TLog interface
and a workaround is implemented here. Other minor cleanup.
2019-05-28 22:07:46 -07:00
sramamoorthy
89b7a052f5
Bug fixes for snapping coordinators
2019-05-28 22:07:46 -07:00
sramamoorthy
17ecba8313
trace cleanup and other indentation changes
2019-05-28 22:07:46 -07:00
sramamoorthy
898bed66c1
Allow only whitelisted binary path for exec op
2019-05-28 22:07:46 -07:00
sramamoorthy
d282016f93
Exec op to tag only local storage nodes
2019-05-28 22:07:46 -07:00
sramamoorthy
6431513ad0
Fail exec req until the cluster is fully_recovered
2019-05-28 22:07:46 -07:00
sramamoorthy
4016f16c76
Fix few compilation and bugs in rebase
2019-05-28 22:07:46 -07:00
sramamoorthy
4bc4c615da
exec op to all tlog, restore change in test &other
...
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy
72dd067173
Trace message changes and fix few FIXMEs
2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b
Snapshot based backup and resotre implementation
2019-05-28 22:07:46 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Jingyu Zhou
e193cac5ef
Merge remote-tracking branch 'apple/master' into tlog
...
Resolve Conflicts: fdbserver/MasterProxyServer.actor.cpp
2019-05-01 17:18:00 -07:00
Evan Tschannen
2d5043c665
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen
6c77864731
separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests
2019-04-25 17:07:35 -07:00
Jingyu Zhou
439d5a3843
Use emplace_back instead of push_back in Proxy
2019-04-22 14:03:48 -07:00
Jingyu Zhou
d2b215b926
Refactor tag population of ServerCacheInfo
2019-04-22 11:55:04 -07:00
Jingyu Zhou
966ec30fcc
Add pseudoLocalities for special tag consumers
2019-04-21 10:41:07 -07:00
Jingyu Zhou
b4e7e7a85b
Refactor StorageCache updates
2019-04-21 10:41:07 -07:00
Evan Tschannen
6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
...
Remove unused functions
2019-04-09 11:49:45 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Jingyu Zhou
47b4b82628
Merge branch 'master' into fix-unreferenced
2019-04-01 14:07:19 -07:00
Jingyu Zhou
f7f8ddd894
Fix warnings on unused variables
...
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
Evan Tschannen
b6008558d3
renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
...
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen
87e2a1a029
The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update
2019-03-18 16:09:57 -07:00
Evan Tschannen
ec6c843124
increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch
2019-03-16 16:18:58 -07:00
Evan Tschannen
a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
...
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Jingyu Zhou
dc129207a9
Minor fix after rebase.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
3c86643822
Separate Ratekeeper from data distribution.
...
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Evan Tschannen
988add9fb5
cache the metadataVersion for commits, so that doing setVersion with the commit version of a different transaction will
2019-03-04 16:48:34 -08:00
Evan Tschannen
075fdef31a
Merge branch 'master' into feature-metadata-version
...
# Conflicts:
# fdbclient/DatabaseContext.h
2019-03-03 22:58:45 -08:00
Trevor Clinkenbeard
39f612d132
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-03-02 17:07:00 -08:00
Evan Tschannen
841eb10af0
Replace g_network->now() with now()
...
Co-Authored-By: tclinken <trevorclinkenbeard@gmail.com>
2019-03-02 08:15:56 -08:00
Evan Tschannen
a37a65a7ff
Replace g_network->now() with now()
...
Co-Authored-By: tclinken <trevorclinkenbeard@gmail.com>
2019-03-02 08:15:05 -08:00
Evan Tschannen
3da85f3acd
implemented the \xff/metadataVersion key, which can be used by layers to help them cheaply cache metadata and know when their cache is invalid
2019-02-28 17:45:00 -08:00
A.J. Beamon
3e6a6a6569
Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics.
2019-02-28 12:00:58 -08:00
A.J. Beamon
d48a2d7b54
Fix problem caused by merge
2019-02-27 12:21:07 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard
abfe057805
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard
07f800eeee
Got rid of detailed field in GetRateInfoReply message
2019-02-23 17:52:11 -08:00
Trevor Clinkenbeard
f3a73963b4
Got rid of detailedLeaseDuration in GetRateInfoReply message
2019-02-23 16:42:11 -08:00
Trevor Clinkenbeard
ff9a7cb2f1
Combined proxy health metrics replies into single message type
2019-02-23 10:13:43 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
0e19b5a935
fix: allow the txnStateStore to be recovered from a process in a down datacenter, so that the cluster controller can know to switch to the other region
2019-02-21 16:52:27 -08:00
Trevor Clinkenbeard
fa96b8dd33
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-20 16:56:16 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Jingyu Zhou
5e6577cc82
Final cleanup per review comments
...
Make distributor interface optional in ServerDBInfo and many other small
changes.
2019-02-14 16:37:17 -08:00
Jingyu Zhou
b3d1633114
Fix bugs of missing request
...
The quite database can fail to send out requests and report timeout. This seems
to be caused by reusing a request that uses the same ReplyPromise. Another bug
is Proxy can wait for unneeded time for a dabase change, while the distributor
is already known to itself.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
ef868f599c
Add DataDistributorInterface to ServerDBInfo
...
Also change the Proxy and QuietDatabase to use the DataDistributorInterface.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
0490160714
Fix according to Evan's comments
...
Use getRateInfo's endpoint as the ID for the DataDistributorInterface.
For now, added a "rejoined" flag for ClusterControllerData and Proxy.
TODO: move DataDistributorInterface into ServerDBInfo.
2019-02-14 16:30:13 -08:00
Jingyu Zhou
886e7ab2ba
Add a new DataDistributor role.
...
Let cluster controller to start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId
Add DataDistributor rejoin handling.
This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.
If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries
to recruit one as DD. CC also monitors DD and restarts one if it failed.
The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for
the new DD.
Add GetRecoveryInfo RPC to master server, which is called by data distributor
to obtain the recovery Transaction version from the master server.
2019-02-14 16:30:13 -08:00
A.J. Beamon
b435d51061
Merge branch 'master' into track-server-request-latencies
2019-02-14 08:07:32 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
A.J. Beamon
d4349293b9
Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing.
2019-02-07 13:39:22 -08:00
Trevor Clinkenbeard
b7eaaaf1e5
Proxy must update health metrics after receiving GetRateInfoReply
2019-02-02 17:08:54 -08:00
Trevor Clinkenbeard
4daf49ff4d
Proxy runs healthMetricsRequestServer to handle incoming health metrics requests
2019-02-01 10:58:42 -08:00
Trevor Clinkenbeard
03e5e3ccbc
Proxies periodically request health metrics from Ratekeeper in the getRate function. Occassionally (determined by DETAILED_METRIC_UPDATE_RATE), requests are for detailed per-process metrics.
2019-01-31 13:25:57 -08:00
Trevor Clinkenbeard
5822bd65bf
Track health metrics in Ratekeeper and send these metrics to proxies in GetRateInfoReply messages
2019-01-31 12:56:58 -08:00
A.J. Beamon
2198d24ce1
Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
...
# Conflicts:
# fdbclient/MasterProxyInterface.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/ServerDBInfo.h
# fdbserver/Status.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon
8e05e95045
Added the ability to configure the latency band settings by setting a special key in \xff keyspace.
2019-01-18 16:18:34 -08:00
A.J. Beamon
eb2f27b8e5
Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests.
2018-11-30 10:46:04 -08:00
A.J. Beamon
975711c389
Merge branch 'release-6.0' of github.com:apple/foundationdb
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2018-11-27 09:50:39 -08:00
Evan Tschannen
530b5e3763
fix: do not track txsPopVersions unless there are remote logs to pop from
2018-11-26 15:17:17 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
...
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
A.J. Beamon
c3a06aa6f1
Fix indentation
2018-11-09 14:25:40 -08:00
A.J. Beamon
67a152ae9f
Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes.
2018-11-09 14:19:18 -08:00
Evan Tschannen
6bb283aebc
fix: dcId to Locality changes could be lost if an emergency transaction happened that did not change the configuration
...
fix: master proxy was starting dcId’s at 1 number too large
2018-11-05 11:12:43 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
ed7036139a
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-10-18 17:00:52 -07:00
Evan Tschannen
9b6c7f253c
changed a knob
2018-10-18 15:26:19 -07:00
Evan Tschannen
0b304495ad
added a yield to the proxy when committing a large batch of mutations
2018-10-18 15:26:00 -07:00
Evan Tschannen
0217aed74c
Merge branch 'release-6.0'
...
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2018-10-15 18:38:51 -07:00
Evan Tschannen
ecddeab2ae
fixed review comments; demote killRegionCycle test for now
2018-10-08 10:39:39 -07:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen
e7e1c634e0
fix: we need to restart the peek cursor when the known committed version becomes available
2018-10-02 17:44:14 -07:00
Evan Tschannen
05e7f08b26
added a peek method which will attempt to read the txsTag from the local region as much as possible
2018-09-28 12:21:08 -07:00
Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
Evan Tschannen
90bf277206
require key value store memory to recover cleanly when recovering the txnStateStore, since all of the data it is recovering has been fsync’ed
2018-08-31 13:07:48 -07:00
Evan Tschannen
717c43a69f
merge 6.0 into master
2018-08-22 00:28:04 -07:00
Evan Tschannen
ffde1a0e28
renamed onlySystem to mustContainSystemMutations, to accurately represent what setting the key does
2018-08-21 22:15:45 -07:00
Evan Tschannen
cb60002944
Added the ability to disable all commits which do not modify the system keys by setting \xff/onlySystem = 1 in the database
2018-08-21 21:09:50 -07:00