Commit Graph

265 Commits

Author SHA1 Message Date
Evan Tschannen 200e65fe61 added a workload which tests killing an entire region, and recovering from the failure with data loss.
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen 4dd2dda0a3 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen 90301f497f Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/TLSConnection.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon 2de0b5d6d7 Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations. 2018-09-05 15:06:14 -07:00
Evan Tschannen 1022e0a5c6 added yields to the log router and tlogs after processing a version 2018-09-04 17:16:44 -07:00
Evan Tschannen 717c43a69f merge 6.0 into master 2018-08-22 00:28:04 -07:00
Evan Tschannen 84e1f7b2b5 added overhead bytes durable to complement overhead bytes input 2018-08-21 22:35:04 -07:00
Evan Tschannen 74f7412975 added separate logging for overhead bytes 2018-08-21 22:18:38 -07:00
Evan Tschannen d7c01f0419 added a separate knob for tlog’s recoverMemoryLimit 2018-08-21 21:11:23 -07:00
Alex Miller 74a9d2f836 Remove a couple more `Void _ = wait`s that crept in from rebase. 2018-08-14 15:50:26 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 69aa33eed5 Fix a case of a bool being used as an integer. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
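
The `Void _ = wait(...)` -> `wait(...)` rewrite in 535b5701e5 is mechanical, so it can be pictured as a text substitution. The sketch below is only an illustration of that idea: the helper name `rewriteWaits` and the regex are assumptions, and the commit does not say what tool was actually used.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the FoundationDB tree): replaces the
// `Void _ = wait(` prefix with a bare `wait(` and leaves the rest intact.
std::string rewriteWaits(const std::string& source) {
    static const std::regex pattern(R"(Void\s+_\s*=\s*wait\s*\()");
    return std::regex_replace(source, pattern, "wait(");
}
```

A line that already uses the bare `wait(...)` form does not match the pattern and passes through unchanged.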
Evan Tschannen 7c5d414f7b fix: during destruction logData could attempt to dereference tLogData after it has been deleted 2018-08-09 12:38:35 -07:00
Evan Tschannen c757c68bfa fix: nextVersion needs to be set to logData->version if version_sizes is empty 2018-08-04 23:53:37 -07:00
Evan Tschannen fec285146c significant cpu optimization in update storage 2018-08-04 12:36:48 -07:00
Evan Tschannen be1a4d74c7 tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time
increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget
2018-08-04 10:31:30 -07:00
Evan Tschannen 71f89f372f changed a trace event name to avoid scope type mismatch on the tag field 2018-08-03 15:53:38 -07:00
Evan Tschannen 2619234477 Merge branch 'release-5.2' into release-6.0
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen 501033c5af fix: tlog spilling on a stopped log was only making one version durable at a time 2018-08-03 11:38:12 -07:00
Evan Tschannen 1c29275672 call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details. 2018-08-01 14:30:57 -07:00
Stephen Atherton 40762d9f9b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-25 17:58:52 -07:00
Evan Tschannen 0f59dc4086 fix: do not write to the persistent queue when we are terminated, which could happen if shutdown was caused by setting a promise in the asyncPullData loop 2018-07-13 17:01:31 -07:00
Evan Tschannen cd63c7a7cc added a buffered cursor, which efficiently merges lots of peek cursors 2018-07-12 12:09:48 -07:00
Stephen Atherton 96389c74cd Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Stephen Atherton 1bc95862b7 Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen 6b40f2764d fix: off by one error on popping missing tags 2018-07-09 15:43:22 -07:00
Evan Tschannen 2718176927 fix: remote logs did not pop all of the data for removed logs on recovery because data for the missing tag was not recorded yet at the time of recovery 2018-07-06 16:10:41 -07:00
Evan Tschannen 507b3bacb0 fix: killing all tlogs in one region prevents the remote logs from recovering in that region; do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Stephen Atherton b95a2bd6c1 Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
# Conflicts:
#	flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen 2987f85177 fix: known committed version must be updated before creating the tlogQueueEntryRef 2018-06-26 23:21:30 -07:00
Evan Tschannen 6e19622872 fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version 2018-06-26 18:02:55 -07:00
Evan Tschannen 2ec8744ab3 fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions 2018-06-25 11:15:49 -07:00
Evan Tschannen 68ac3bdc4c log routers now calculate a precise version to pop for their log router tag 2018-06-21 15:29:46 -07:00
Evan Tschannen f755961c42 use parallelGetMore during log recovery 2018-06-20 17:04:06 -07:00
Stephen Atherton e5c48d453a Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-18 22:45:27 -07:00
Evan Tschannen df4c445e25 fix: we need to return from commit when stopped 2018-06-18 22:12:46 -07:00
Evan Tschannen 403fb5a2e9 removed unnecessary suppressFor 2018-06-18 17:59:29 -07:00
Evan Tschannen b79feaddd3 added a hard memory limit to the TLog to prevent it from running out of memory. Because remote logs are not ratekeeper controlled this is their only protection 2018-06-18 17:22:40 -07:00
Stephen Atherton 90c8288c68 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-17 14:55:05 -07:00
Evan Tschannen 7aef5ec6f1 got rid of persistUnrecoveredBefore
added persistLocality
2018-06-17 14:44:33 -07:00
Evan Tschannen f637c680f1 fix: populateSatelliteTagLocations was broken
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen 0d87186821 use a specific locality for satellites 2018-06-15 11:06:38 -07:00
Evan Tschannen feb8578c06 fix: only flush and exit in simulation 2018-06-14 13:48:30 -07:00
Stephen Atherton 1eae9d621b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-13 15:58:21 -07:00
Stephen Atherton 2878f30f29 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller fcfa00928b Make RecoveryState an enum class.
This means that all the == 7 or != 0 checks go away, and explicit names must be used.
2018-06-12 16:50:25 -07:00
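
To illustrate why fcfa00928b makes the `== 7` and `!= 0` checks go away: a scoped enum does not convert implicitly to an integer, so a comparison against a bare number no longer compiles. This is a minimal sketch; the enumerator names and values are placeholders, not the actual `RecoveryState` definition.

```cpp
#include <cstdint>

// Illustrative only: enumerator names and values are assumptions.
enum class RecoveryState : std::uint8_t {
    Uninitialized = 0,
    FullyRecovered = 7,
};

// With `enum class`, callers must spell out the enumerator by name.
bool isFullyRecovered(RecoveryState state) {
    // `state == 7` would be a compile error here.
    return state == RecoveryState::FullyRecovered;
}
```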
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 588eaf4b36 fix: previous delay 0 could still cause us to recruit a tlog before processing disk errors 2018-06-11 11:26:30 -07:00
Evan Tschannen a5c2a8ee8a fix: allow disk errors to cancel the actor before recruiting logs 2018-06-10 20:27:19 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
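
The first two naming rules in e5488419cc can be written as simple predicates. The sketch below is one possible reading of those rules. The function names are hypothetical, and it is not the simulation check that the commit added.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: a detail name starts with an uppercase
// character and contains no underscores.
bool isValidDetailName(const std::string& name) {
    if (name.empty() || !std::isupper(static_cast<unsigned char>(name[0])))
        return false;
    return name.find('_') == std::string::npos;
}

// Hypothetical helper: a type name follows the same rules but may contain
// one underscore (Context_Type), and the character after it is uppercase.
bool isValidTypeName(const std::string& name) {
    if (name.empty() || !std::isupper(static_cast<unsigned char>(name[0])))
        return false;
    const auto pos = name.find('_');
    if (pos == std::string::npos)
        return true;
    if (name.find('_', pos + 1) != std::string::npos)
        return false; // more than one underscore
    return pos + 1 < name.size() &&
           std::isupper(static_cast<unsigned char>(name[pos + 1]));
}
```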
Evan Tschannen bf65e745a9 tlogs do not index tags for other localities 2018-06-01 22:51:08 -07:00
Evan Tschannen c519339adb avoid peeking from logs that do not match the tag’s locality 2018-06-01 18:42:48 -07:00
A.J. Beamon 1f0b519a73 Rename several variables in TLogServer.actor.cpp to follow our normal camel case conventions. I didn't rename every variable here, because some appear to be data structures (like a map) following the pattern keydesc_valuedesc, and I wasn't sure that the straightforward keydescValuedesc rename made sense. I did rename a couple of instances of these where it seemed reasonable, though. 2018-06-01 10:18:07 -07:00
A.J. Beamon d9c702a9e3 Merge release-5.1 into release-5.2 2018-05-30 09:09:55 -07:00
Evan Tschannen 529bd34cf9 fix: when a tlog is stopped by another recruitment it no longer has the opportunity for committingQueue to be set 2018-05-06 20:37:44 -07:00
Evan Tschannen 81c7bddaf8 fix: must check for log router errors while waiting on satellite replies because the recruitmentID will not be updated if it threw an error 2018-05-06 18:15:12 -07:00
Evan Tschannen b4bd03e67e fix: we cannot set queueCommitEnd until we have popped the log system to prevent the popped version from going backwards 2018-05-01 22:20:25 -07:00
Evan Tschannen 12ef63b698 knobify replace contents bytes 2018-05-01 19:43:35 -07:00
Evan Tschannen 10d25927cd Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Evan Tschannen 5143871fed passed debug ids into all versions of peek() to assist debugging 2018-04-30 13:36:35 -07:00
Evan Tschannen 883f2318a0 test fearless configurations 2018-04-30 13:17:29 -07:00
Evan Tschannen 92b134eb98 fix: errors from removed were not handled properly 2018-04-29 23:05:08 -07:00
Evan Tschannen 6f318dbff2 fix: do not reply to recruitment until we are sure the log commits to the queue 2018-04-29 22:08:24 -07:00
Evan Tschannen 9cdabfed0e added useful trace events 2018-04-29 18:54:47 -07:00
Evan Tschannen f77c1ec14e fix: fixed rare bug where a log stopped by a different recruitment would still respond successfully to the recruitment message 2018-04-28 13:34:06 -07:00
Evan Tschannen af63dac5dd fix: remote logs need to wait until the durable known committed version is greater than the recovery version before completing recovery to ensure we will not pick a start version that we do not have 2018-04-27 12:18:42 -07:00
Stephen Atherton af61d3596d Merge branch 'public-master' into feature-redwood
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Evan Tschannen ae1de575f1 fix: remote logs are not considered fully recovered until they are at recoveredAt 2018-04-23 17:49:46 -07:00
Evan Tschannen 3ec09ce9f6 fix: only peekSingle needs to throw worker_removed, because tlogs have other ways to get notified they are no longer needed
fix: we need to wait until tags are popped past recoveredAt instead of unrecovered before
2018-04-23 16:43:08 -07:00
Evan Tschannen 126fc53d10 fix: the start version for peek cursors that merge with multiple log sets is the maximum of the individual start versions 2018-04-23 12:42:51 -07:00
Dennis Schafroth 290122637b Using ASSERT_ABORT in destructors 2018-04-23 14:05:10 +02:00
Evan Tschannen 73597f190e fix: new tlogs are initialized with exactly the tags which existed at the recovery version 2018-04-22 20:28:01 -07:00
Evan Tschannen a520d03397 fix: if we cannot find a tag, it must have been popped at the recovery version. 2018-04-22 15:08:38 -07:00
Evan Tschannen 1d1e2cd367 fix: initialize the known committed version on the tlog 2018-04-21 00:41:15 -07:00
Evan Tschannen 8d350ceb5f fix: persist the known committed version on the tlogs 2018-04-20 17:55:46 -07:00
Evan Tschannen a6d9e889f0 a cleaner solution to preventing tlogs from peeking log routers 2018-04-20 13:25:22 -07:00
Evan Tschannen f5c3417905 fix: prevent tlogs from peeking the wrong log routers 2018-04-20 00:30:37 -07:00
Evan Tschannen 5da452db8e fix: pop the log routers again after the log system updates 2018-04-19 14:33:31 -07:00
Evan Tschannen 22526ef996 fix: do not tell storage servers about large sections of empty versions, because it can lead them to make mutations durable which have not been committed 2018-04-18 16:06:44 -07:00
Evan Tschannen 447c7bd15b fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen 760bc8bc99 fix: log router version needs to be fetched before it is available
fix: tlog did not fetch known committed version if start version was exactly equal to it
2018-04-17 11:16:48 -07:00
Evan Tschannen 3e40505f4a Revert "fix: remote logs should reply until they have recovered through recoverAt"
This reverts commit 3c0c03c004.
2018-04-16 23:17:16 -07:00
Evan Tschannen 3c0c03c004 fix: remote logs should reply until they have recovered through recoverAt 2018-04-16 17:25:49 -07:00
Evan Tschannen 3018a7b1b3 fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded 2018-04-16 10:42:48 -07:00
Evan Tschannen 5533016f1e fix: tlogs are now initialized immediately, instead of when starting the core, this must be done to pop the log routers during recovery
fix: log router start version must be the same as remote log start version
2018-04-15 14:33:07 -07:00
Evan Tschannen 65e69620a7 fix: unrecoveredBefore on a new log is at minimum 1 2018-04-13 10:41:30 -07:00
Evan Tschannen 1af5ac0d9d fix: a number of different problems prevented tlogs from using log routers during recovery 2018-04-12 15:20:54 -07:00
Alex Miller 20082e3228 Clang fixes. 2018-04-12 11:10:53 -07:00
Evan Tschannen a738c4bec1 fix: if the known committed version is equal to the recovery version we do not need to copy any data 2018-04-09 20:48:55 -07:00
Evan Tschannen 419951f601 fix: need to initialize tlog versions to less than the startVersion 2018-04-09 17:17:11 -07:00
Evan Tschannen 4c89f721cd fix: do not include logRouter tags in lock results 2018-04-09 10:48:57 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery working with fearless configurations 2018-04-08 21:24:05 -07:00
Stephen Atherton 2752a28611 Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood 2018-04-06 16:29:37 -07:00
Evan Tschannen 331e707684 fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations 2018-03-31 16:47:56 -07:00
Evan Tschannen 96fffe2cea fix: do not update version if the log has been stopped 2018-03-30 22:11:42 -07:00
Evan Tschannen 1a4ded1c99 support upgrades by merging tags associated with the different peek requests 2018-03-29 17:54:08 -07:00
Evan Tschannen b36e08f08f first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet 2018-03-29 15:12:38 -07:00
Evan Tschannen 0746fe4d56 optimized tag lookups on the tlog by removing one level of vectors 2018-03-20 10:41:42 -07:00
Evan Tschannen d8e064d8bb fix: when a new log is recruited on a shared log, all outstanding commits need to be notified that they are stopped, because there is no longer a guarantee that their queueCommittedVersion will advance 2018-03-19 17:48:28 -07:00
Evan Tschannen 54be14000d do not deserialize tags 2018-03-17 11:24:18 -07:00
Evan Tschannen 9c8cb445d6 optimized the tlog to use a vector for tags instead of a map 2018-03-17 10:36:19 -07:00
Evan Tschannen fecfea0f7d fix: messages vector was not cleared 2018-03-17 10:24:44 -07:00
Evan Tschannen ccd70fd005 The tlog uses the tags embedded in the message instead of a separate vector of locations
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen 820382ea68 optimized the log router commit path to avoid re-serializing the data 2018-03-16 11:40:21 -07:00
Evan Tschannen f6a22c1035 fix: the recovery actor was holding a copy of the tlogInterface after the tlog was removed 2018-03-12 16:56:34 -07:00
A.J. Beamon f2c804e14f Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit. 2018-03-06 10:15:04 -08:00
A.J. Beamon b25810711c
Merge branch 'master' into release-5.2 2018-03-05 10:32:57 -08:00
Balachandar Namasivayam 8ae640c062 Addressed review comments. 2018-03-02 17:56:49 -08:00
Balachandar Namasivayam 11df1aeabf Add new api to get shared tlogs id and address 2018-03-02 16:50:30 -08:00
Evan Tschannen e3c6b66240 fix: do not commit more data after being stopped
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen ddb484143c fix: do not peek from remote logs if they are not fully recovered 2018-02-21 14:06:44 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen 1dc6a8d4bd fix: the tlog can peek from log systems that have been recovered even if it does not match its recoverFrom set 2018-02-20 14:50:13 -08:00
Evan Tschannen 31b89a638f added satellite_none and remote_none options to unconfigure from a fearless setup
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen 1fedcba890 fix: do not use log router tags when configured without remote logs
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Stephen Atherton 0a35f167e4 Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/IDiskQueue.h
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen 6b54d56ca7 gracefully exit if attempting to upgrade from 4.X versions 2018-01-30 17:10:50 -08:00
Evan Tschannen 29c5d4ad3d upgrades from 5.X mostly supported, still some remaining correctness problems 2018-01-28 11:52:54 -08:00
Evan Tschannen 66b2218989 added tlog support for upgrading from 5.X clusters. Does not support upgrading from 4.X or earlier. Untested, storage servers still need the ability to change their tag. 2018-01-21 12:21:46 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen be643d6937 fix: the tlog did not cancel recovery properly when stopped 2018-01-12 17:18:14 -08:00
Evan Tschannen de119f192d fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled) 2018-01-11 16:09:49 -08:00
Evan Tschannen 30710f7493 syncLogId was not necessary 2018-01-06 14:52:39 -08:00
Evan Tschannen 10c3fc165e fix: after recovering from disk, only allow peeking data that was fully recovered 2018-01-06 13:49:13 -08:00
Evan Tschannen 63751fb0e2 fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from 2018-01-05 14:15:25 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen f2c4beed9f fix: tlogFitness did not consider it better to have one tlog of a better fitness
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
Alex Miller c7dbd31a1e Refactoring: Create a common prefixRange and do UID->Key once in backup. 2017-12-19 17:17:50 -08:00
Evan Tschannen 8c51bc4ac4 fixed low latency tests in a way that gives us better test coverage 2017-11-28 18:20:29 -08:00
Evan Tschannen dc624a54dc fix: avoid flushing large queues in simulation when checking latency 2017-11-27 17:23:20 -08:00
Evan Tschannen df74e2a373 re-added support for non-copying tlog recovery 2017-10-24 15:09:31 -07:00
Evan Tschannen 15962cf079 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbrpc/Locality.cpp
#	fdbrpc/Locality.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/ClusterRecruitmentInterface.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/masterserver.actor.cpp
#	fdbserver/worker.actor.cpp
#	flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Balachandar Namasivayam 0e153cdd35 Throttle Spammy logs. Three knobs are added.
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.
2017-10-02 18:43:11 -07:00
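
The throttling described in 0e153cdd35 can be pictured as a per-event-type counter with an expiration. The sketch below is a simplified illustration under stated assumptions: the class name, the two parameters, and plain counting (rather than sampling) are all hypothetical, and the commit does not name its three knobs.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: counts events per type within a window and reports
// when a type has exceeded the threshold. Not the actual implementation.
class TraceEventThrottle {
public:
    TraceEventThrottle(int threshold, double windowSeconds)
        : threshold_(threshold), window_(windowSeconds) {}

    // Returns true if an event of this type should be throttled at time `now`.
    bool shouldThrottle(const std::string& type, double now) {
        Entry& entry = cache_[type];
        if (now >= entry.expires) { // cached entry expired: start a new window
            entry.count = 0;
            entry.expires = now + window_;
        }
        ++entry.count;
        return entry.count > threshold_;
    }

private:
    struct Entry {
        int count = 0;
        double expires = 0.0;
    };
    int threshold_;
    double window_;
    std::unordered_map<std::string, Entry> cache_;
};
```

Passing the clock in as `now` keeps the sketch deterministic, which suits simulation-style testing.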
Stephen Atherton 248dab79b6 Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions. 2017-09-21 23:51:55 -07:00
Evan Tschannen f75dfc3153 do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response 2017-09-18 17:39:12 -07:00
Evan Tschannen 36c98f18e9 do not register a worker with the cluster controller until it has finished recovering all files from disk 2017-09-15 10:57:58 -07:00
Evan Tschannen d343d37274 fixed merge problems 2017-09-11 16:37:10 -07:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Alec Grieser 300b5a17ed Merge branch 'release-5.0' 2017-08-25 18:55:33 -07:00
Evan Tschannen 272b4b984c fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine 2017-08-25 10:12:58 -07:00
A.J. Beamon 4c706d33e9 Merge branch 'release-5.0' 2017-08-23 14:59:43 -07:00
Alvin Moore 7729f663e9 Ensured that the circus id is always lowercase 2017-08-23 13:45:00 -07:00
Evan Tschannen 4b40f817f1 fix: if recovery is cancelled before the copy is complete, remove the tlog 2017-08-23 12:26:03 -07:00
Alec Grieser 5ee07b1a9e Merge branch 'release-5.0' 2017-08-14 16:56:58 -07:00
Evan Tschannen de1b590a8a The TLog did not delete data from removed logs
The TLog continued to make data from removed logs persistent
2017-08-11 18:08:09 -07:00
Stephen Atherton 50fb44be92 Merge branch 'release-5.0'
# Conflicts:
#	versions.target
2017-08-09 23:36:12 -07:00
Evan Tschannen 2335fc73f2 fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00
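
The first fix in 2335fc73f2 separates a fixed 10-minute timer from an idle timeout measured from the last use. A minimal sketch of the idle-timeout behaviour follows. The struct name, its members, and the 600-second default are assumptions for illustration, not code from the TLog.

```cpp
// Hypothetical sketch of an idle timeout: the deadline moves forward
// every time the cursor is used, instead of firing on a fixed schedule.
struct PeekCursorTracker {
    double lastUsed = 0.0;

    // Record a use of the cursor at time `now` (in seconds).
    void touch(double now) { lastUsed = now; }

    // True once `timeoutSeconds` have passed since the last use.
    bool expired(double now, double timeoutSeconds = 600.0) const {
        return now - lastUsed >= timeoutSeconds;
    }
};
```

With this shape, a cursor used at t=500s is still live at t=700s, whereas a fixed timer started at t=0 would already have expired it at t=600s.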