Commit Graph

265 Commits

Author SHA1 Message Date
Evan Tschannen d8e064d8bb fix: when a new log is recruited on a shared log, all outstanding commits need to be notified that they are stopped, because there is no longer a guarantee that their queueCommittedVersion will advance 2018-03-19 17:48:28 -07:00
Evan Tschannen 54be14000d do not deserialize tags 2018-03-17 11:24:18 -07:00
Evan Tschannen 9c8cb445d6 optimized the tlog to use a vector for tags instead of a map 2018-03-17 10:36:19 -07:00
Evan Tschannen fecfea0f7d fix: messages vector was not cleared 2018-03-17 10:24:44 -07:00
Evan Tschannen ccd70fd005 The tlog uses the tags embedded in the message instead of a separate vector of locations
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen 820382ea68 optimized the log router commit path to avoid re-serializing the data 2018-03-16 11:40:21 -07:00
Evan Tschannen f6a22c1035 fix: the recovery actor was holding a copy of the tlogInterface after the tlog was removed 2018-03-12 16:56:34 -07:00
A.J. Beamon f2c804e14f Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit. 2018-03-06 10:15:04 -08:00
A.J. Beamon b25810711c
Merge branch 'master' into release-5.2 2018-03-05 10:32:57 -08:00
Balachandar Namasivayam 8ae640c062 Addressed review comments. 2018-03-02 17:56:49 -08:00
Balachandar Namasivayam 11df1aeabf Add new api to get shared tlogs id and address 2018-03-02 16:50:30 -08:00
Evan Tschannen e3c6b66240 fix: do not commit more data after being stopped
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen ddb484143c fix: do not peek from remote logs if they are not fully recovered 2018-02-21 14:06:44 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen 1dc6a8d4bd fix: the tlog can peek from log systems that have been recovered even if it does not match its recoverFrom set 2018-02-20 14:50:13 -08:00
Evan Tschannen 31b89a638f added satellite_none and remote_none options to unconfigure from a fearless setup
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen 1fedcba890 fix: do not use log router tags when configured without remote logs
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Stephen Atherton 0a35f167e4 Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/IDiskQueue.h
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen 6b54d56ca7 gracefully exit if attempting to upgrade from 4.X versions 2018-01-30 17:10:50 -08:00
Evan Tschannen 29c5d4ad3d upgrades from 5.X mostly supported, still some remaining correctness problems 2018-01-28 11:52:54 -08:00
Evan Tschannen 66b2218989 added tlog support for upgrading from 5.X clusters. Does not support upgrading from 4.X or earlier. Untested, storage servers still need the ability to change their tag. 2018-01-21 12:21:46 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen be643d6937 fix: the tlog did not cancel recovery properly when stopped 2018-01-12 17:18:14 -08:00
Evan Tschannen de119f192d fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled) 2018-01-11 16:09:49 -08:00
Evan Tschannen 30710f7493 syncLogId was not necessary 2018-01-06 14:52:39 -08:00
Evan Tschannen 10c3fc165e fix: after recovering from disk, only allow peeking data the was fully recovered 2018-01-06 13:49:13 -08:00
Evan Tschannen 63751fb0e2 fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from 2018-01-05 14:15:25 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen f2c4beed9f fix: tlogFitness did not consider it better to have one tlog of a better fitness
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
Alex Miller c7dbd31a1e Refactoring: Create a common prefixRange and do UID->Key once in backup. 2017-12-19 17:17:50 -08:00
Evan Tschannen 8c51bc4ac4 fixed low latency tests in a way that gives us better test coverage 2017-11-28 18:20:29 -08:00
Evan Tschannen dc624a54dc fix: avoid flushing large queues in simulation when checking latency 2017-11-27 17:23:20 -08:00
Evan Tschannen df74e2a373 re-added support for non-copying tlog recovery 2017-10-24 15:09:31 -07:00
Evan Tschannen 15962cf079 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbrpc/Locality.cpp
#	fdbrpc/Locality.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/ClusterRecruitmentInterface.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/masterserver.actor.cpp
#	fdbserver/worker.actor.cpp
#	flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Balachandar Namasivayam 0e153cdd35 Throttle Spammy logs. Three knobs are added.
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.
2017-10-02 18:43:11 -07:00
Stephen Atherton 248dab79b6 Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions. 2017-09-21 23:51:55 -07:00
Evan Tschannen f75dfc3153 do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response 2017-09-18 17:39:12 -07:00
Evan Tschannen 36c98f18e9 do not register a worker with the cluster controller until it has finished recovering all files from disk 2017-09-15 10:57:58 -07:00
Evan Tschannen d343d37274 fixed merge problems 2017-09-11 16:37:10 -07:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Alec Grieser 300b5a17ed Merge branch 'release-5.0' 2017-08-25 18:55:33 -07:00
Evan Tschannen 272b4b984c fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine 2017-08-25 10:12:58 -07:00
A.J. Beamon 4c706d33e9 Merge branch 'release-5.0' 2017-08-23 14:59:43 -07:00
Alvin Moore 7729f663e9 Ensured that the circus id is always lowercase 2017-08-23 13:45:00 -07:00
Evan Tschannen 4b40f817f1 fix: is recovery is cancelled before the copy is complete, remove the tlog 2017-08-23 12:26:03 -07:00
Alec Grieser 5ee07b1a9e Merge branch 'release-5.0' 2017-08-14 16:56:58 -07:00
Evan Tschannen de1b590a8a The TLog did not delete data from removed logs
The TLog continued to make data from removed logs persistent
2017-08-11 18:08:09 -07:00
Stephen Atherton 50fb44be92 Merge branch 'release-5.0'
# Conflicts:
#	versions.target
2017-08-09 23:36:12 -07:00
Evan Tschannen 2335fc73f2 fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00
Evan Tschannen c22708b6d6 added tag localities
fix: remote logs need to stop the master when they are stopped
2017-08-03 16:16:36 -07:00
Alec Grieser ca7437ecf6 Merge branch 'release-5.0' 2017-08-02 22:07:01 -07:00
A.J. Beamon 4243486f54 fix: TLogMetrics was being track latested with the wrong ID 2017-07-28 14:37:23 -07:00
Evan Tschannen f6826f1e15 fix: log routers were popped at too high of a version
fix: make sure tlogs make everything durable
fix: make cluster controller’s temporary remote log recruitment not interfere with better master exists
2017-07-20 16:26:05 -07:00
Evan Tschannen 7fec378830 do not continue copying data from prior generations after being locked 2017-07-19 15:11:18 -07:00
Alec Grieser 660729839c moved Notified.h from flow -> fdbclient ; flow bindings package does better job when excluding testers 2017-07-14 15:49:30 -07:00
Evan Tschannen 57ba9d36af fixed a large number of bugs 2017-07-13 12:29:21 -07:00
Evan Tschannen 81ae263ad9 implemented setPeekCursor
removed oldTLogServer
first compiling version
2017-07-10 17:41:32 -07:00
Evan Tschannen 979ebcef6c changed to using a vector of logSets instead of a duplicate set of logs for remote servers
finished porting changes to the tlog
everything but peeking is finished in the TagPartitionedLogSystem
2017-07-09 14:46:16 -07:00
Evan Tschannen 0906250e78 merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
Evan Tschannen 663535e00a add the same number for shared metrics and local metrics 2017-06-28 15:21:54 -07:00
Evan Tschannen fe37d0b056 added additional logging about bytesInput and bytesDurable 2017-06-28 13:29:40 -07:00
Evan Tschannen 533dca95d8 fix: bytesDurable was not correctly increased when a log was removed
re-added many TLogMetrics
added a new role for the shared tlog
2017-06-22 17:21:42 -07:00
Evan Tschannen b108c1ea0d fix: do not delay before setting logData->version 2017-06-02 11:27:37 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00