Commit Graph

197 Commits

Author SHA1 Message Date
mpilman 3cb2391b58 use proper fwd declarations in ManagementAPI
Also ManagementAPI.h -> ManagementAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen ed9e20ce17 forgot to fix merge conflicts 2019-02-18 17:09:55 -08:00
Evan Tschannen 065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen 8f2af8bed1 fix: forced recoveries now require a target dcid which will become the new primary location. During the forced recovery, the configuration will be changed to make that location primary, and usable_regions will be set to 1. If the target dcid is already the primary location, the forced recovery will do nothing. This makes forced recoveries idempotent, so it is safe to the client to re-send forced recovery commands to the cluster controller.
fix: the cluster controller attempts to do a commit to determine if the cluster is alive, since its own internal recoveryState might not be up-to-date.

fix: forceMasterFailure on the cluster controller did not always cause the current master to be re-recruited
2019-02-18 14:54:28 -08:00
Evan Tschannen 83060c6e56
Merge pull request #1062 from jzhou77/PR
Add a new DataDistributor role.
2019-02-15 13:51:27 -08:00
Jingyu Zhou 6a655143e8 A follow-on fix for config key usage
And some trace event cleanups.
2019-02-14 16:37:16 -08:00
Jingyu Zhou be5c962bb7 Add a new configuration version key \xff/conf/version
This fixed a bug found by upgrade test, where the configuration monitor of the
data distributor was monitoring excludedServersVersionKey, which doesn't
change in ChangeConfig workload. As a result, data distributor was not aware of
configuration changes.

Adding this new key and make sure this key is updated in configuration changes
so that the monitor can detect configuration changes.
2019-02-14 16:37:16 -08:00
Alex Miller 21ce26a681 Merge release-6.0 into master 2019-02-14 14:24:35 -08:00
A.J. Beamon b435d51061 Merge branch 'master' into track-server-request-latencies 2019-02-14 08:07:32 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Markus Pilman c4fa69c998 Remove duplicate comment 2019-02-06 09:38:08 -08:00
mpilman b11d2ce1ec Fix bug that in `include` command
This bug results in `include` to include all
machines that have a prefix of a given machine.

This is a weird bug that has to do with ASCII
ordering. Instead this command executes two
clears: one for the IP address and one for
all IP:PORT pairs
2019-02-06 09:38:08 -08:00
Meng Xu 550f2e2682 Merge with master to use the latest backup container
In fdb 6.0.15, backup container is changed on how to organize the backup data.
The backup made by fdb >6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15
2019-01-30 12:05:15 -08:00
Evan Tschannen 6d7d40bcb3
Update fdbclient/ManagementAPI.actor.cpp 2019-01-29 11:37:08 -08:00
A.J. Beamon a49aea29d1
Update fdbclient/ManagementAPI.actor.cpp
Co-Authored-By: etschannen <36455792+etschannen@users.noreply.github.com>
2019-01-29 11:34:50 -08:00
Trevor Clinkenbeard da1a8a9919 Prioritize processes with class coordinator when "coordinators auto" is run 2019-01-25 11:33:10 -08:00
A.J. Beamon 2198d24ce1 Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/ServerDBInfo.h
#	fdbserver/Status.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon 8e05e95045 Added the ability to configure the latency band settings by setting a special key in \xff keyspace. 2019-01-18 16:18:34 -08:00
Evan Tschannen 699f8dd617 fix: coordinators auto could put two coordinators in the same zone
simulation now tests two machines in the same zone
2019-01-18 15:42:48 -08:00
Meng Xu 80b2f75187 debug why restore did not restore the complete data 2018-12-03 19:29:17 -08:00
Evan Tschannen d2d68aa171 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	versions.target
2018-12-03 18:26:52 -08:00
Meng Xu 1b085a9817 sequantial restore: pass 1 test case
-r simulation --logsize 1024MiB -f foundationdb/tests/fast/ParallelRestoreCorrectness.txt -b off -s 95208406
2018-12-03 10:57:30 -08:00
Evan Tschannen 6e94d7d71a changing the database configuration now checks that enough processes exist to support the new configuration 2018-11-30 18:52:24 -08:00
Evan Tschannen 4e54690005 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen ccdc83036d simplified region configuration protections 2018-11-12 17:45:20 -08:00
Evan Tschannen 6353a6724b strengthened the protections related to changing regions 2018-11-12 17:40:40 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen a654183f63
Merge pull request #791 from ajbeamon/remove-cluster-from-iclientapi
Remove cluster from IClientApi (phase 2 of removing DB names)
2018-11-10 10:16:18 -08:00
Evan Tschannen c02690471d added protection against configuration changes which cannot be immediately reverted
the configure database workload tests region configurations
2018-11-04 19:53:55 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Stephen Atherton 22f8a4efa9 Normalized all unit test names to begin with "/" if they should be included in random unit testing. 2018-10-05 22:09:58 -07:00
Stephen Atherton 3ea9193fa7 Renamed redwood to redwood-experimental. UnitTest names can now be hidden using # as the first character so that random correctness tests will not run them. Excluded redwood tests from correctness testing. Reverted default storage engine to ssd. 2018-10-05 14:43:54 -07:00
Stephen Atherton 7c1dc305cb Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood 2018-10-05 10:15:10 -07:00
Evan Tschannen 3922e477a5 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/LogSystemDiskQueueAdapter.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen dd9c6856e1 fix: forced recovery is not safe to send multiple times 2018-10-02 17:29:39 -07:00
A.J. Beamon c831051474 This removes the idea of clusters from IClientApi. 2018-09-21 15:58:14 -07:00
Stephen Atherton 2fc86c5ff3 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbrpc/AsyncFileCached.actor.h
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen 200e65fe61 added a workload which tests killing an entire region, and recovering from the failure with data loss.
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Alec Grieser 10a8e67266
Merge remote-tracking branch 'upstream/release-6.0' into merge-release-6.0 2018-09-11 21:49:59 -07:00
Evan Tschannen d3c8d7ab4e fix: status would generate invalid json 2018-09-07 18:26:05 -07:00
Evan Tschannen 90301f497f Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/TLSConnection.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	versions.target
2018-09-05 16:06:33 -07:00
Evan Tschannen 40f5dbe423 fixed issues from review, added a safeguard to prevent configuring a cluster to an invalid configuration 2018-09-04 22:16:35 -07:00
Evan Tschannen d8ea3dbf9a Added the ability to configure a cluster from a JSON file 2018-08-16 17:34:59 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Stephen Atherton 9d85a05372 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-05 12:52:06 -07:00
Stephen Atherton 2925b9b984 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-03 23:03:56 -07:00
Evan Tschannen 866ccfe344 added the ability to allow the master to finish recovery before all storage servers in both regions have their mutations. This allows you to recover from scenarios where you lose all your tlogs in one dc. 2018-07-04 01:59:04 -04:00
Evan Tschannen 7a12d3e130 added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive. 2018-07-01 09:39:04 -04:00
Stephen Atherton b95a2bd6c1 Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
# Conflicts:
#	flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen a66eda8baa added the three_datacenter_fallback redundancy mode, which allows you to drop a down datacenter when configured in three_datacenter mode 2018-06-27 23:24:33 -07:00
Evan Tschannen 8a8914f046 re-added the ability to configure the number of log routers. Many log routers are needed to get a sufficient number of sockets involved in copying data across the WAN 2018-06-22 00:04:00 -07:00
Stephen Atherton e5c48d453a Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-18 22:45:27 -07:00
Evan Tschannen 1ccfb3a0f4 fix: log_anti_quorum was always 0 in simulation
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen e8c462882b re-added remote_logs as a parameter, because it could be useful to have a different number of logs between when recruited as primary and remote 2018-06-18 10:22:34 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Stephen Atherton 2878f30f29 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 6e48d93d39 backed out the healthy team check because it was unnecessary 2018-06-10 12:43:32 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen be06938d9d fix: dropping the remote replication will cause all remote storage servers to die. Make sure we are not restoring redundancy before doing this to prevent data loss in simulation. 2018-06-04 18:46:09 -07:00
Evan Tschannen 91338fc984 Merge branch 'master' into feature-remote-logs 2018-05-10 15:33:45 -07:00
Evan Tschannen 8f984cb2c9 Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.h
2018-05-10 09:13:22 -07:00
Alec Grieser 464e2cdbf0
change SetVersionstampedKey and SetVersionstampedValue behavior based on API version to make them consistent 2018-05-08 08:57:09 -07:00
Evan Tschannen 10d25927cd Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Stephen Atherton af61d3596d Merge branch 'public-master' into feature-redwood
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Evan Tschannen 19762b847d Merge branch 'release-5.2'
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/SimulatedCluster.actor.cpp
2018-04-10 17:02:43 -07:00
Evan Tschannen 528caeb196 continue supporting multi_dc as a hidden option 2018-04-10 15:59:15 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery working with fearless configurations 2018-04-08 21:24:05 -07:00
Stephen Atherton 2752a28611 Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood 2018-04-06 16:29:37 -07:00
Evan Tschannen b36e08f08f first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet 2018-03-29 15:12:38 -07:00
Evan Tschannen 82ed956c65 renamed the multi_dc configuration to three_datacenter. The old three_datacenter configuration was not a useful configuration. 2018-03-26 18:31:26 -07:00
A.J. Beamon f2c804e14f Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit. 2018-03-06 10:15:04 -08:00
Evan Tschannen 1194e3a361 added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed. 2018-03-05 19:27:46 -08:00
Evan Tschannen 470f5c01f3 changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases 2018-02-26 17:09:09 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen 31b89a638f added satellite_none and remote_none options to unconfigure from a fearless setup
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen cb25564d38 simulated cluster supports fearless configurations
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
Stephen Atherton 0a35f167e4 Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/IDiskQueue.h
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen ebd94bb654 removed a separately configurable storage team size for the remote data center, because it did not make sense
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen 15962cf079 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbrpc/Locality.cpp
#	fdbrpc/Locality.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/ClusterRecruitmentInterface.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/masterserver.actor.cpp
#	fdbserver/worker.actor.cpp
#	flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Evan Tschannen ef41b07bb3 renamed past_version to transaction_too_old
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Evan Tschannen 73fca75239 added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation 2017-09-28 13:13:24 -07:00
Stephen Atherton 248dab79b6 Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions. 2017-09-21 23:51:55 -07:00
Stephen Atherton d880569d52 Checkpointing progress on KeyValueStoreMVBTree. All methods are implemented to a usable point, and everything compiles, but Worker does not yet try to use it. 2017-09-21 04:43:49 -07:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen ea26bc1c43 passed first tests which kill entire datacenters
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen 6e26ae2bb3 added a new multi_dc configuration 2017-09-01 15:45:27 -07:00
Yichi Chiang 6a8a5c41b0 Add a switch to turn off data distribution in CLI 2017-07-28 18:14:55 -07:00
Evan Tschannen 57ba9d36af fixed a large number of bugs 2017-07-13 12:29:21 -07:00
Alvin Moore 31d562ff7b Merge branch 'release-5.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/DatabaseConfiguration.cpp
#	versions.target
2017-06-27 11:16:08 -07:00
Evan Tschannen 9fd5955e92 Merge branch 'master' into removing-old-dc-code 2017-06-26 16:27:10 -07:00
Evan Tschannen 15cb498aa7 removed fast_recovery_double and fast_recovery_triple from the fdbcli 2017-06-23 16:18:23 -07:00
Alvin Moore 9553458b78 Updated simulation to support managing exclusion and inclusion address
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Evan Tschannen 766dc23e26 fix: do not use TLS in protectedAddresses 2017-06-02 13:52:21 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00