Evan Tschannen
7a12d3e130
added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive.
2018-07-01 09:39:04 -04:00
Evan Tschannen
b42e0541eb
Merge pull request #545 from etschannen/feature-remote-logs
...
Fixed a few problems with the consistency check
2018-06-30 10:40:55 -07:00
Evan Tschannen
4a3247da69
fixed a few problems with the consistency check
2018-06-30 10:39:28 -07:00
Evan Tschannen
1f02bdee0a
do not buggify future version delay, because remote storage servers will be delayed getting data so they need additional time
2018-06-29 11:29:22 -07:00
Balachandar Namasivayam
899f8d8f4d
Merge pull request #544 from etschannen/feature-remote-logs
...
Reduce the number of cluster controller changes during a DC failover
2018-06-29 10:47:04 -07:00
Evan Tschannen
7e68bee692
update better machine classes first to give them a higher chance of becoming the next cluster controller
2018-06-29 01:11:59 -07:00
Evan Tschannen
e9ac8a1039
when the cluster controller is changing itself to a better dc fitness, it should notify itself first so another process does not take over
2018-06-29 00:10:29 -07:00
Evan Tschannen
899f880ce0
fix: log router class did not have the proper fitness for becoming the cluster controller
2018-06-28 23:20:01 -07:00
Evan Tschannen
02f616eb68
fix: consistency check was broken when the key server key space is sharded
2018-06-28 23:16:32 -07:00
Evan Tschannen
a288d5b9a9
added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down
2018-06-28 23:15:32 -07:00
Steve Atherton
ddf1d15009
Merge pull request #543 from ajbeamon/fix-missing-trace-event-fields
...
The Machine field was missing in early trace events.
2018-06-28 16:02:11 -07:00
A.J. Beamon
a680837ee4
The Machine field was missing in early trace events. The logGroup field was not being properly set.
2018-06-28 15:28:58 -07:00
A.J. Beamon
890b18505d
Merge pull request #542 from ajbeamon/master
...
Add missing include for Windows, remove throw from TraceEvent destructor.
2018-06-28 15:00:22 -07:00
A.J. Beamon
1ff42e078f
Add missing include for Windows, remove throw from TraceEvent destructor.
2018-06-28 14:59:23 -07:00
Balachandar Namasivayam
8caa6eaecf
Merge pull request #541 from etschannen/feature-remote-logs
...
More multiple DC improvements
2018-06-28 11:22:08 -07:00
Evan Tschannen
45cf0067e4
fix: consistency check was not checking for data inconsistencies
2018-06-28 11:08:16 -07:00
A.J. Beamon
65e03555bc
Merge pull request #540 from ajbeamon/master
...
Add include statement for std::function to try to make Windows build happy.
2018-06-28 10:36:43 -07:00
A.J. Beamon
09624aeec9
Add include statement for std::function to try to make Windows build happy
2018-06-28 10:22:33 -07:00
Evan Tschannen
a66eda8baa
added the three_datacenter_fallback redundancy mode, which allows you to drop a down datacenter when configured in three_datacenter mode
2018-06-27 23:24:33 -07:00
Evan Tschannen
58c2f67ff6
checking outstanding requests can be CPU intensive, so rate limit checking requests
2018-06-27 23:02:08 -07:00
Evan Tschannen
fb0d10635d
the first location in a satellite team is the one that will serve peek requests. Make sure we properly balance peek traffic by having the first servers on each team be used an equal number of times
2018-06-27 22:14:50 -07:00
Evan Tschannen
a5b4698bc8
do not wait for good recruitment delay if the cluster controller is in the second best region
2018-06-27 21:05:55 -07:00
Evan Tschannen
dd72379363
reduced the failure detection times
2018-06-27 20:41:18 -07:00
Evan Tschannen
c74e43f2d0
fix: during upgrades, a storage server which does not have data for a shard could be in the source servers, so as a fallback, if a fetch keys fails long enough, disable locality based load balancing to allow the storage server to peek from someone different than itself
2018-06-27 20:35:51 -07:00
Steve Atherton
2203ba6c8f
Merge pull request #539 from ajbeamon/backstop-trace-event-throttle-in-constructor
...
Move the spammy trace event backstop from the destructor to the const…
2018-06-27 16:30:02 -07:00
A.J. Beamon
ea8a288a20
Merge pull request #537 from apple/release-5.2
...
Merge Release-5.2 into master
2018-06-27 15:55:58 -07:00
A.J. Beamon
cbc840ad0a
Move the spammy trace event backstop from the destructor to the constructor. This allows us to avoid doing needless work on a trace event that is going to be throttled.
2018-06-27 15:51:30 -07:00
Alec Grieser
ac9de812f4
Merge pull request #406 from ajbeamon/directory-tester-cleanup
...
Directory tester cleanup
2018-06-27 15:48:15 -07:00
Alex Miller
23b691b9c8
Merge pull request #536 from brownleej/ruby-doc-fixes-52
...
Add a documentation plugin for the formatting in our Ruby docs.
2018-06-27 15:40:08 -07:00
John Brownlee
9a51dec64e
Add a documentation plugin for the formatting in our Ruby docs.
2018-06-27 14:58:34 -07:00
Steve Atherton
cbcf5177eb
Merge pull request #429 from ajbeamon/trace-log-refactor
...
Trace log refactor
2018-06-27 14:52:09 -07:00
Alex Miller
f6c6d79056
Merge pull request #534 from etschannen/feature-remote-logs
...
Durable known committed version was incorrect
2018-06-27 14:24:09 -07:00
A.J. Beamon
d8ca7a766c
Change tree node state to have references to parent nodes and update merge logic accordingly.
2018-06-27 14:12:51 -07:00
Evan Tschannen
2987f85177
fix: known committed version must be updated before creating the tlogQueueEntryRef
2018-06-26 23:21:30 -07:00
Evan Tschannen
00167b0157
renamed some uses of knownCommittedVersion to durableKnownCommittedVersion
...
epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for
recoverAt and recoveredAt refer to the last committed version of the previous generation
2018-06-26 18:20:28 -07:00
Evan Tschannen
6e19622872
fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version
2018-06-26 18:02:55 -07:00
A.J. Beamon
1f0561a9c0
Missed a couple requested changes
2018-06-26 15:22:39 -07:00
A.J. Beamon
a7158f96aa
Address some review comments
2018-06-26 15:06:15 -07:00
A.J. Beamon
2ed452353f
Merge branch 'release-5.2' into directory-tester-cleanup
2018-06-26 14:56:09 -07:00
A.J. Beamon
fec225075f
Merge branch 'master' into trace-log-refactor
2018-06-26 14:54:42 -07:00
A.J. Beamon
fe956bc35a
Address review comments
2018-06-26 14:37:21 -07:00
A.J. Beamon
9f545ce002
Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor
2018-06-26 11:37:23 -07:00
Alex Miller
4f6054a3e6
Merge pull request #530 from etschannen/feature-remote-logs
...
Bug fixes
2018-06-25 18:41:15 -07:00
Evan Tschannen
c6313a79e3
fix: the cluster controller needs to continue to retry recruitment until after wait_for_good_remote_recruitment_delay
2018-06-25 18:20:16 -07:00
Evan Tschannen
1a8dac365d
fix: poppedAllAfter was not set to a large enough version
2018-06-25 15:57:11 -07:00
Balachandar Namasivayam
5c9ef7763a
Merge pull request #528 from etschannen/feature-remote-logs
...
Fixed a correctness issue with parallel get more
2018-06-25 11:20:25 -07:00
Evan Tschannen
2ec8744ab3
fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions
2018-06-25 11:15:49 -07:00
A.J. Beamon
203fd93fcc
Merge pull request #480 from fannix/master
...
Fix a concurrency bug in Java queue example
2018-06-25 08:22:58 -07:00
xmeng
1bd1d9562a
Fix indentation
2018-06-24 21:54:23 +01:00
Balachandar Namasivayam
d12c43b7ec
Merge pull request #527 from etschannen/feature-remote-logs
...
fix: wrong desired count used when checking good remote fitness
2018-06-22 12:42:08 -07:00