Evan Tschannen
3c0c03c004
fix: remote logs should reply until they have recovered through recoverAt
2018-04-16 17:25:49 -07:00
Evan Tschannen
3018a7b1b3
fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded
2018-04-16 10:42:48 -07:00
Evan Tschannen
5533016f1e
fix: tlogs are now initialized immediately, instead of when starting the core, this must be done to pop the log routers during recovery
...
fix: log router start version must be the same as remote log start version
2018-04-15 14:33:07 -07:00
Evan Tschannen
65e69620a7
fix: unrecoveredBefore on a new log is at minimum 1
2018-04-13 10:41:30 -07:00
Evan Tschannen
1af5ac0d9d
fix: a number of different problems prevented tlogs from using log routers during recovery
2018-04-12 15:20:54 -07:00
Evan Tschannen
a738c4bec1
fix: if the known committed version is equal to the recovery version we do not need to copy any data
2018-04-09 20:48:55 -07:00
Evan Tschannen
419951f601
fix: need to initialize tlog versions to less than the startVersion
2018-04-09 17:17:11 -07:00
Evan Tschannen
4c89f721cd
fix: do not include logRouter tags in lock results
2018-04-09 10:48:57 -07:00
Evan Tschannen
7af892f50b
first working version of non-copying recovery working with fearless configurations
2018-04-08 21:24:05 -07:00
Evan Tschannen
331e707684
fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations
2018-03-31 16:47:56 -07:00
Evan Tschannen
96fffe2cea
fix: do not update version if the log has been stopped
2018-03-30 22:11:42 -07:00
Evan Tschannen
1a4ded1c99
support upgrades by merging tags associated with the different peek requests
2018-03-29 17:54:08 -07:00
Evan Tschannen
b36e08f08f
first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet
2018-03-29 15:12:38 -07:00
Evan Tschannen
0746fe4d56
optimized tag lookups on the tlog by removing one level of vectors
2018-03-20 10:41:42 -07:00
Evan Tschannen
d8e064d8bb
fix: when a new log is recruited on a shared log, all outstanding commits need to be notified that they are stopped, because there is no longer a guarantee that their queueCommittedVersion will advance
2018-03-19 17:48:28 -07:00
Evan Tschannen
54be14000d
do not deserialize tags
2018-03-17 11:24:18 -07:00
Evan Tschannen
9c8cb445d6
optimized the tlog to use a vector for tags instead of a map
2018-03-17 10:36:19 -07:00
Evan Tschannen
fecfea0f7d
fix: messages vector was not cleared
2018-03-17 10:24:44 -07:00
Evan Tschannen
ccd70fd005
The tlog uses the tags embedded in the message instead of a separate vector of locations
...
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen
820382ea68
optimized the log router commit path to avoid re-serializing the data
2018-03-16 11:40:21 -07:00
Evan Tschannen
f6a22c1035
fix: the recovery actor was holding a copy of the tlogInterface after the tlog was removed
2018-03-12 16:56:34 -07:00
A.J. Beamon
b25810711c
Merge branch 'master' into release-5.2
2018-03-05 10:32:57 -08:00
Balachandar Namasivayam
8ae640c062
Addressed review comments.
2018-03-02 17:56:49 -08:00
Balachandar Namasivayam
11df1aeabf
Add new api to get shared tlogs id and address
2018-03-02 16:50:30 -08:00
Evan Tschannen
e3c6b66240
fix: do not commit more data after being stopped
...
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen
ddb484143c
fix: do not peek from remote logs if they are not fully recovered
2018-02-21 14:06:44 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
1dc6a8d4bd
fix: the tlog can peek from log systems that have been recovered even if it does not match its recoverFrom set
2018-02-20 14:50:13 -08:00
Evan Tschannen
31b89a638f
added satellite_none and remote_none options to unconfigure from a fearless setup
...
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen
1fedcba890
fix: do not use log router tags when configured without remote logs
...
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Evan Tschannen
6b54d56ca7
gracefully exit if attempting to upgrade from 4.X versions
2018-01-30 17:10:50 -08:00
Evan Tschannen
29c5d4ad3d
upgrades from 5.X mostly supported, still some remaining correctness problems
2018-01-28 11:52:54 -08:00
Evan Tschannen
66b2218989
added tlog support for upgrading from 5.X clusters. Does not support upgrading from 4.X or earlier. Untested, storage servers still need the ability to change their tag.
2018-01-21 12:21:46 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen
be643d6937
fix: the tlog did not cancel recovery properly when stopped
2018-01-12 17:18:14 -08:00
Evan Tschannen
de119f192d
fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled)
2018-01-11 16:09:49 -08:00
Evan Tschannen
30710f7493
syncLogId was not necessary
2018-01-06 14:52:39 -08:00
Evan Tschannen
10c3fc165e
fix: after recovering from disk, only allow peeking data that was fully recovered
2018-01-06 13:49:13 -08:00
Evan Tschannen
63751fb0e2
fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from
2018-01-05 14:15:25 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen
f2c4beed9f
fix: tlogFitness did not consider it better to have one tlog of a better fitness
...
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
Alex Miller
c7dbd31a1e
Refactoring: Create a common prefixRange and do UID->Key once in backup.
2017-12-19 17:17:50 -08:00
Evan Tschannen
8c51bc4ac4
fixed low latency tests in a way that gives us better test coverage
2017-11-28 18:20:29 -08:00
Evan Tschannen
dc624a54dc
fix: avoid flushing large queues in simulation when checking latency
2017-11-27 17:23:20 -08:00
Evan Tschannen
df74e2a373
re-added support for non-copying tlog recovery
2017-10-24 15:09:31 -07:00
Evan Tschannen
15962cf079
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbrpc/Locality.cpp
# fdbrpc/Locality.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/ClusterRecruitmentInterface.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/masterserver.actor.cpp
# fdbserver/worker.actor.cpp
# flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Balachandar Namasivayam
0e153cdd35
Throttle Spammy logs. Three knobs are added.
...
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning message is logged.
2017-10-02 18:43:11 -07:00
Evan Tschannen
f75dfc3153
do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response
2017-09-18 17:39:12 -07:00
Evan Tschannen
36c98f18e9
do not register a worker with the cluster controller until it has finished recovering all files from disk
2017-09-15 10:57:58 -07:00