Commit Graph

2241 Commits

Author SHA1 Message Date
Evan Tschannen 7a12d3e130 added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive. 2018-07-01 09:39:04 -04:00
Evan Tschannen b42e0541eb
Merge pull request #545 from etschannen/feature-remote-logs
Fixed a few problems with the consistency check
2018-06-30 10:40:55 -07:00
Evan Tschannen 4a3247da69 fixed a few problems with the consistency check 2018-06-30 10:39:28 -07:00
Evan Tschannen 1f02bdee0a do not buggify future version delay, because remote storage servers will be delayed getting data so they need additional time 2018-06-29 11:29:22 -07:00
Balachandar Namasivayam 899f8d8f4d
Merge pull request #544 from etschannen/feature-remote-logs
Reduce the number of cluster controller changes during a DC failover
2018-06-29 10:47:04 -07:00
Evan Tschannen 7e68bee692 update better machine classes first to give them a higher chance of becoming the next cluster controller 2018-06-29 01:11:59 -07:00
Evan Tschannen e9ac8a1039 when the cluster controller is changing itself to a better dc fitness, it should notify itself first so another process does not take over 2018-06-29 00:10:29 -07:00
Evan Tschannen 899f880ce0 fix: log router class did not have the proper fitness for becoming the cluster controller 2018-06-28 23:20:01 -07:00
Evan Tschannen 02f616eb68 fix: consistency check was broken when the key server key space is sharded 2018-06-28 23:16:32 -07:00
Evan Tschannen a288d5b9a9 added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down 2018-06-28 23:15:32 -07:00
Steve Atherton ddf1d15009
Merge pull request #543 from ajbeamon/fix-missing-trace-event-fields
The Machine field was missing in early trace events.
2018-06-28 16:02:11 -07:00
A.J. Beamon a680837ee4 The Machine field was missing in early trace events. The logGroup field was not being properly set. 2018-06-28 15:28:58 -07:00
A.J. Beamon 890b18505d
Merge pull request #542 from ajbeamon/master
Add missing include for Windows, remove throw from TraceEvent destructor.
2018-06-28 15:00:22 -07:00
A.J. Beamon 1ff42e078f Add missing include for Windows, remove throw from TraceEvent destructor. 2018-06-28 14:59:23 -07:00
Balachandar Namasivayam 8caa6eaecf
Merge pull request #541 from etschannen/feature-remote-logs
More multiple DC improvements
2018-06-28 11:22:08 -07:00
Evan Tschannen 45cf0067e4 fix: consistency check was not checking for data inconsistencies 2018-06-28 11:08:16 -07:00
A.J. Beamon 65e03555bc
Merge pull request #540 from ajbeamon/master
Add include statement for std::function to try to make Windows build happy.
2018-06-28 10:36:43 -07:00
A.J. Beamon 09624aeec9 Add include statement for std::function to try to make Windows build happy 2018-06-28 10:22:33 -07:00
Evan Tschannen a66eda8baa added the three_datacenter_fallback redundancy mode, which allows you to drop a down datacenter when configured in three_datacenter mode 2018-06-27 23:24:33 -07:00
Evan Tschannen 58c2f67ff6 checking outstanding requests can be CPU intensive, so rate limit checking requests 2018-06-27 23:02:08 -07:00
Evan Tschannen fb0d10635d the first location in a satellite team is the one that will serve peek requests. Make sure we properly balance peek traffic by having the first servers on each team be used an equal number of times 2018-06-27 22:14:50 -07:00
Evan Tschannen a5b4698bc8 do not wait for good recruitment delay if the cluster controller is in the second best region 2018-06-27 21:05:55 -07:00
Evan Tschannen dd72379363 reduced the failure detection times 2018-06-27 20:41:18 -07:00
Evan Tschannen c74e43f2d0 fix: during upgrades, a storage server which does not have data for a shard could be in the source servers, so as a fallback if a fetch keys fails long enough disable locality based load balancing to allow the storage server to peek from someone different than itself 2018-06-27 20:35:51 -07:00
Steve Atherton 2203ba6c8f
Merge pull request #539 from ajbeamon/backstop-trace-event-throttle-in-constructor
Move the spammy trace event backstop from the destructor to the const…
2018-06-27 16:30:02 -07:00
A.J. Beamon ea8a288a20
Merge pull request #537 from apple/release-5.2
Merge Release-5.2 into master
2018-06-27 15:55:58 -07:00
A.J. Beamon cbc840ad0a Move the spammy trace event backstop from the destructor to the constructor. This allows us to avoid doing needless work on a trace event that is going to be throttled. 2018-06-27 15:51:30 -07:00
Alec Grieser ac9de812f4
Merge pull request #406 from ajbeamon/directory-tester-cleanup
Directory tester cleanup
2018-06-27 15:48:15 -07:00
Alex Miller 23b691b9c8
Merge pull request #536 from brownleej/ruby-doc-fixes-52
Add a documentation plugin for the formatting in our Ruby docs.
2018-06-27 15:40:08 -07:00
John Brownlee 9a51dec64e Add a documentation plugin for the formatting in our Ruby docs. 2018-06-27 14:58:34 -07:00
Steve Atherton cbcf5177eb
Merge pull request #429 from ajbeamon/trace-log-refactor
Trace log refactor
2018-06-27 14:52:09 -07:00
Alex Miller f6c6d79056
Merge pull request #534 from etschannen/feature-remote-logs
Durable known committed version was incorrect
2018-06-27 14:24:09 -07:00
A.J. Beamon d8ca7a766c Change tree node state to have references to parent nodes and update merge logic accordingly. 2018-06-27 14:12:51 -07:00
Evan Tschannen 2987f85177 fix: known committed version must be updated before creating the tlogQueueEntryRef 2018-06-26 23:21:30 -07:00
Evan Tschannen 00167b0157 renamed some uses of knownCommittedVersion to durableKnownCommittedVersion
epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for
recoverAt and recoveredAt refer to the last committed version of the previous generation
2018-06-26 18:20:28 -07:00
Evan Tschannen 6e19622872 fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version 2018-06-26 18:02:55 -07:00
A.J. Beamon 1f0561a9c0 Missed a couple requested changes 2018-06-26 15:22:39 -07:00
A.J. Beamon a7158f96aa Address some review comments 2018-06-26 15:06:15 -07:00
A.J. Beamon 2ed452353f Merge branch 'release-5.2' into directory-tester-cleanup 2018-06-26 14:56:09 -07:00
A.J. Beamon fec225075f Merge branch 'master' into trace-log-refactor 2018-06-26 14:54:42 -07:00
A.J. Beamon fe956bc35a Address review comments 2018-06-26 14:37:21 -07:00
A.J. Beamon 9f545ce002 Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor 2018-06-26 11:37:23 -07:00
Alex Miller 4f6054a3e6
Merge pull request #530 from etschannen/feature-remote-logs
Bug fixes
2018-06-25 18:41:15 -07:00
Evan Tschannen c6313a79e3 fix: the cluster controller needs to continue to retry recruitment until after wait_for_good_remote_recruitment_delay 2018-06-25 18:20:16 -07:00
Evan Tschannen 1a8dac365d fix: poppedAllAfter was not set to a large enough version 2018-06-25 15:57:11 -07:00
Balachandar Namasivayam 5c9ef7763a
Merge pull request #528 from etschannen/feature-remote-logs
Fixed a correctness issue with parallel get more
2018-06-25 11:20:25 -07:00
Evan Tschannen 2ec8744ab3 fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions 2018-06-25 11:15:49 -07:00
A.J. Beamon 203fd93fcc
Merge pull request #480 from fannix/master
Fix a concurrency bug in Java queue example
2018-06-25 08:22:58 -07:00
xmeng 1bd1d9562a Fix indentation 2018-06-24 21:54:23 +01:00
Balachandar Namasivayam d12c43b7ec
Merge pull request #527 from etschannen/feature-remote-logs
fix: wrong desired count used when checking good remote fitness
2018-06-22 12:42:08 -07:00