511 Commits

Author SHA1 Message Date
Evan Tschannen 7b64e711c7 fix: deferredCleanup did not take into account deleting multiple roots simultaneously 2018-07-12 15:29:02 -07:00
Evan Tschannen 507b3bacb0 fix: killing all tlogs in one region prevents the remote logs from recovering in that region; do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen 66a6fbb219 Merge branch 'master' into feature-remote-logs 2018-07-04 01:59:30 -04:00
Evan Tschannen 866ccfe344 added the ability to allow the master to finish recovery before all storage servers in both regions have their mutations. This allows you to recover from scenarios where you lose all your tlogs in one dc. 2018-07-04 01:59:04 -04:00
Alvin Moore 9ea0f0a5ae Fixed problem with stack initialization of TLS Options class 2018-07-03 15:02:53 -07:00
Evan Tschannen e67f951c06 Merge branch 'master' into feature-remote-logs 2018-07-02 02:18:20 -04:00
Alvin Moore c3f88dbfe1 Merge branch 'master' of github.com:apple/foundationdb into tls-static 2018-07-01 23:13:57 -07:00
Evan Tschannen 7a12d3e130 added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive. 2018-07-01 09:39:04 -04:00
Evan Tschannen a288d5b9a9 added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down 2018-06-28 23:15:32 -07:00
Evan Tschannen a66eda8baa added the three_datacenter_fallback redundancy mode, which allows you to drop a down datacenter when configured in three_datacenter mode 2018-06-27 23:24:33 -07:00
Evan Tschannen 58c2f67ff6 checking outstanding requests can be CPU intensive, so rate limit checking requests 2018-06-27 23:02:08 -07:00
Evan Tschannen dd72379363 reduced the failure detection times 2018-06-27 20:41:18 -07:00
Alvin Moore ef8de426d3 Changed the TLS_DISABLED macro
Disable TLS within Windows until working
2018-06-26 12:08:32 -07:00
Evan Tschannen 0123627d67 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen 8a8914f046 re-added the ability to configure the number of log routers. Many log routers are needed to get a sufficient number of sockets involved in copying data across the WAN 2018-06-22 00:04:00 -07:00
Evan Tschannen 1dce97f28c Merge branch 'release-5.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/SimulatedCluster.actor.cpp
#	packaging/msi/FDBInstaller.wxs
#	versions.target
2018-06-21 17:05:11 -07:00
Stephen Atherton d9f3eb05a2 Change default delete operations per second. Updated release notes. 2018-06-21 11:13:31 -07:00
Stephen Atherton e9e1e194f0 Added operation-specific rate controls to blob store interface. 2018-06-20 20:34:34 -07:00
Evan Tschannen c6a2207577 fix: stop client sampling during the consistency check 2018-06-20 16:24:37 -07:00
Alvin Moore f8ce1de601 Added support for compiling TLS into binaries 2018-06-20 09:21:23 -07:00
Evan Tschannen 1ccfb3a0f4 fix: log_anti_quorum was always 0 in simulation
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen e8c462882b re-added remote_logs as a parameter, because it could be useful to have a different number of logs between when recruited as primary and remote 2018-06-18 10:22:34 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen 0d87186821 use a specific locality for satellites 2018-06-15 11:06:38 -07:00
Evan Tschannen 284233baa1 added a key in the database with the locality of the current master 2018-06-14 19:36:02 -07:00
Evan Tschannen 0c6825eb43 allow multiple regions with the same priority
configurations must have at least one region with non-negative priority
2018-06-14 12:59:55 -07:00
Evan Tschannen 26b7dd32da fix: cluster controller did not respect usable dcs 2018-06-14 12:56:48 -07:00
Evan Tschannen 0059690502 Merge pull request #483 from alexmiller-apple/multidc_dcvector
MultiDC: Regions JSON should represent datacenters as an array.
2018-06-13 18:33:04 -07:00
Alex Miller 0042294566 MultiDC: Serialized JSON represents DC and Satellites in same array.
With a `"satellite": 1` property that differentiates them.  "satellite" will
likely also be deprecated in the future, but this is closer to what the final
serialized form will look like.
2018-06-13 17:55:55 -07:00
Richard Low 39894ea798 Merge remote-tracking branch 'apple/release-5.2' 2018-06-12 18:31:20 -07:00
Alex Miller ac1fc3660d MultiDC: Regions JSON should represent datacenters as an array.
This is so that the serialized format released in 6.0 will already include a
required change, which is to support >1 DC per region.
2018-06-12 16:18:54 -07:00
Steve Atherton 75de22bb08 Merge pull request #482 from satherton/release-5.2
Reduce default backup parallel tasks to decrease memory usage.
2018-06-12 13:20:30 -07:00
Steve Atherton 731c3b38e8 Merge pull request #481 from satherton/release-5.2
Reduce backup parallel tasks to decrease memory usage.
2018-06-12 13:16:24 -07:00
Steve Atherton 0481928bd8 Reduce backup parallel tasks to decrease memory usage. 2018-06-12 13:15:24 -07:00
Balachandar Namasivayam 819929e1be Address review comments. 2018-06-12 11:59:47 -07:00
Balachandar Namasivayam 7db928ccec The cluster file and its parent directory need to be writable for an fdb cluster to operate.
Document this requirement and also add relevant details to status output.
2018-06-11 16:47:24 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 6e48d93d39 backed out the healthy team check because it was unnecessary 2018-06-10 12:43:32 -07:00
A.J. Beamon 1fdfe20908 Relax the rules on trace event Types a bit by allowing multiple underscores, as well as starting with an underscore and consecutive underscores. 2018-06-08 15:40:29 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon c12b235080 Fix case in a few commented out trace events 2018-06-08 11:20:06 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen d7d38c3544 Merge pull request #430 from ajbeamon/rename-logGroup-attribute
Rename trace file logGroup attribute to LogGroup
2018-06-08 10:30:45 -07:00
Evan Tschannen b423d73b42 fix: do not finish a shard relocation until all of the storage servers have made the current recovery version durable. This is to prevent dropping a needed storage server as a source for a shard after dropping a remote configuration 2018-06-07 12:29:25 -07:00
A.J. Beamon 216404de45 Merge branch 'release-5.2' of github.com:apple/foundationdb
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2018-06-06 15:25:37 -07:00
Evan Tschannen e82985aea2 fix: continue setting beginVersion so that versions between 5.2.0 and 5.2.2 do not crash when decoding tasks created by 5.2.3 2018-06-06 13:34:22 -07:00
Evan Tschannen 4120062bb9 fix: backup initialized its begin version at 1 instead of the read version of the starting transaction
fix: erasing log ranges did not properly divide up work between transactions to prevent making transactions which were too large
2018-06-06 13:05:53 -07:00
A.J. Beamon e4e06321c7 fix: Read-only transactions that get committed would fail if the readOnly option is set. They would also be counted in the transactionsCommitStarted metric. 2018-06-05 12:10:28 -07:00
Evan Tschannen be06938d9d fix: dropping the remote replication will cause all remote storage servers to die. Make sure we are not restoring redundancy before doing this to prevent data loss in simulation. 2018-06-04 18:46:09 -07:00
Evan Tschannen ce6a2f0563 Merge pull request #425 from bnamasivayam/leader-election-optimize
Optimize client and server connection times to cluster controller, es…
2018-06-01 18:35:27 -07:00
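
The trace event naming rules described in commit e5488419cc can be sketched as a small validator. This is an illustrative Python sketch, not the check that was added to the simulator: the function names and regular expressions are hypothetical, "uppercase character" is assumed to mean ASCII A-Z, and the later relaxation of the Type rules in commit 1fdfe20908 (multiple, leading, and consecutive underscores) is deliberately not modeled.

```python
import re

# Assumption: "uppercase character" means ASCII A-Z.
# Detail names: start with an uppercase letter and contain no underscores.
DETAIL_RE = re.compile(r"[A-Z][^_]*")

# Type names: same rule, but one underscore is allowed (Context_Type),
# and the character right after the underscore must also be uppercase.
TYPE_RE = re.compile(r"[A-Z][^_]*(_[A-Z][^_]*)?")


def is_valid_detail_name(name: str) -> bool:
    """Return True if name follows the detail-name rule (hypothetical helper)."""
    return DETAIL_RE.fullmatch(name) is not None


def is_valid_type_name(name: str) -> bool:
    """Return True if name follows the type-name rule (hypothetical helper)."""
    return TYPE_RE.fullmatch(name) is not None
```

Under these assumptions, `is_valid_type_name("Context_Type")` is accepted while `is_valid_type_name("Context_type")` and `is_valid_detail_name("Elapsed_Seconds")` are rejected.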