negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
Jon Fu
d96a7b2c69
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-03 09:47:45 -07:00
Evan Tschannen
3cc5d484a5
the include and exclude commands do not need to set the moveKeysLockOwnerKey, which will kill the data distribution algorithm
2019-09-27 18:33:56 -07:00
A.J. Beamon
1f8a157b35
Extend the length allowed for configuration fields. Log the config if recovery fails due to invalid config.
2019-09-05 15:36:37 -07:00
Andrew Noyes
6aa0ada7b1
Replace scalar root types with proper messages
2019-08-28 14:40:50 -07:00
Evan Tschannen
4c9a392f05
the master checks the popped version of the txsTag before recovering the txnStateStore, to avoid restoring data that is later found to be popped
2019-08-05 17:01:48 -07:00
Evan Tschannen
5c98dcce6d
revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators
2019-07-27 16:46:22 -07:00
Evan Tschannen
b509a441e7
Merge branch 'master' into feature-skip-confirm
...
# Conflicts:
# bindings/flow/tester/Tester.actor.cpp
# bindings/go/src/_stacktester/stacktester.go
# bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java
# bindings/java/src/test/com/apple/foundationdb/test/StackTester.java
# bindings/python/tests/tester.py
# bindings/ruby/tests/tester.rb
# documentation/sphinx/source/api-c.rst
# documentation/sphinx/source/api-python.rst
# documentation/sphinx/source/api-ruby.rst
# documentation/sphinx/source/data-modeling.rst
# documentation/sphinx/source/developer-guide.rst
# fdbclient/vexillographer/fdb.options
# fdbserver/MasterProxyServer.actor.cpp
2019-07-27 15:08:13 -07:00
Evan Tschannen
02de53160d
only skip confirm epoch live if CAUSAL_READ_RISKY is enabled
...
time checked on the proxy should be less than the time waited by the master to account for clock speed differences
setting REQUIRED_MIN_RECOVERY_DURATION and ENFORCED_MIN_RECOVERY_DURATION to 0 will go back to the old behavior
2019-07-12 17:58:16 -07:00
Evan Tschannen
a63969afb3
enforce a minimum recovery duration, which allows proxies to avoid checking if the epoch is alive as long as its last commit has been less than MINIMUM_RECOVERY_DURATION ago
2019-07-12 13:10:21 -07:00
Evan Tschannen
d8948c8be1
Merge branch 'master' into feature-fast-txs-recovery
...
# Conflicts:
# fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen
c348b3da51
After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies
2019-07-08 12:53:40 -07:00
Evan Tschannen
15e894c724
Merge in master
2019-07-05 15:49:24 -07:00
Alex Miller
ea6898144d
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-07-03 20:44:15 -07:00
Jingyu Zhou
b69d7adabc
Remove unused remoteRecovered from master server
2019-07-01 15:41:35 -07:00
Evan Tschannen
52efcfd136
fix: properly create the right number for txsTags when changing between different numbers of logs
2019-06-27 15:15:05 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
e0be631414
shard the txs tag so that more transaction logs are involved in its recovery
2019-06-19 18:15:09 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Jingyu Zhou
8b5449e608
Fix review comments for PR #1473
2019-04-29 16:45:42 -07:00
Jingyu Zhou
966ec30fcc
Add pseudoLocalities for special tag consumers
2019-04-21 10:41:07 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Evan Tschannen
f5de52de91
fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time
2019-04-01 16:38:25 -07:00
Evan Tschannen
b6008558d3
renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
...
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen
6254a1a8e4
fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop
2019-03-22 18:37:39 -07:00
Evan Tschannen
2605257737
Merge branch 'master' of github.com:apple/foundationdb
2019-03-19 18:47:29 -07:00
Evan Tschannen
5b9c45ea0b
clients do not attempt to connect to provisional proxies
2019-03-19 13:37:50 -07:00
Balachandar Namasivayam
5471725db5
Support config where the primary and remote DC's can be used as satellites.
2019-03-18 12:17:59 -07:00
Evan Tschannen
a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
...
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Evan Tschannen
2627bcd35e
Merge branch 'master' into feature-metadata-version
2019-03-10 21:13:28 -07:00
Jingyu Zhou
3c86643822
Separate Ratekeeper from data distribution.
...
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Alex Miller
c6a65389ae
Remove noexcept macro and replace with BOOST_NOEXCEPT.
...
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Evan Tschannen
3da85f3acd
implemented the \xff/metadataVersion key, which can be used by layers to help them cheaply cache metadata and know when their cache is invalid
2019-02-28 17:45:00 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
0e19b5a935
fix: allow the txnStateStore to be recovered from a process in a down datacenter, so that the cluster controller can know to switch to the other region
2019-02-21 16:52:27 -08:00
Evan Tschannen
3a572b010f
fix: a forced recovery needed to force the data distributor to restart
2019-02-19 16:04:52 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
8ed89fd711
fixed review comments
2019-02-19 11:26:53 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
ccaa860ffc
fix: all storage servers must reboot during a forced recovery, because their rejoin commit might have been lost
2019-02-18 15:27:18 -08:00
Evan Tschannen
9cfadad41b
fix: if the tagPartitionedLogSystem cannot do a forced recovery, the master should not execute it forced recovery based modifications either
2019-02-18 15:13:18 -08:00
Evan Tschannen
8f2af8bed1
fix: forced recoveries now require a target dcid which will become the new primary location. During the forced recovery, the configuration will be changed to make that location primary, and usable_regions will be set to 1. If the target dcid is already the primary location, the forced recovery will do nothing. This makes forced recoveries idempotent, so it is safe to the client to re-send forced recovery commands to the cluster controller.
...
fix: the cluster controller attempts to do a commit to determine if the cluster is alive, since its own internal recoveryState might not be up-to-date.
fix: forceMasterFailure on the cluster controller did not always cause the current master to be re-recruited
2019-02-18 14:54:28 -08:00
Evan Tschannen
4c35ebdcc6
fix: because of forced recoveries, storage servers in remote regions cannot update their durable version to (lastLogVersion - 5e6), because the lastLogVersion might have jumped due to an epoch end and the recovery version after the forced recovery could be before the epoch end, causing the storage server to want to rollback to a version it does not have on disk
2019-02-18 14:40:30 -08:00
Evan Tschannen
05ca0a10d8
fix: kill all storage servers which are not in the safe locality after a forced recovery
2019-02-18 14:30:51 -08:00
Jingyu Zhou
6a655143e8
A follow-on fix for config key usage
...
And some trace event cleanups.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
aea602d9c7
Remove getRecoveryInfo from master interface.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
886e7ab2ba
Add a new DataDistributor role.
...
Let cluster controller to start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId
Add DataDistributor rejoin handling.
This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.
If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries
to recruit one as DD. CC also monitors DD and restarts one if it failed.
The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for
the new DD.
Add GetRecoveryInfo RPC to master server, which is called by data distributor
to obtain the recovery Transaction version from the master server.
2019-02-14 16:30:13 -08:00
Evan Tschannen
e45952bc53
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/BackupContainer.actor.cpp
# fdbclient/BlobStore.actor.cpp
# fdbclient/HTTP.actor.cpp
# tests/BlobStore.txt
# versions.target
2018-11-13 16:06:39 -08:00