Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen
4dd2dda0a3
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen
df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
...
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon
2de0b5d6d7
Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations.
2018-09-05 15:06:14 -07:00
Evan Tschannen
1022e0a5c6
added yields to the log router and tlogs after processing a version
2018-09-04 17:16:44 -07:00
Evan Tschannen
717c43a69f
merge 6.0 into master
2018-08-22 00:28:04 -07:00
Evan Tschannen
84e1f7b2b5
added overhead bytes durable to complement overhead bytes input
2018-08-21 22:35:04 -07:00
Evan Tschannen
74f7412975
added separate logging for overhead bytes
2018-08-21 22:18:38 -07:00
Evan Tschannen
d7c01f0419
added a separate knob for tlog’s recoverMemoryLimit
2018-08-21 21:11:23 -07:00
Alex Miller
74a9d2f836
Remove a couple more `Void _ = wait`s that crept in from rebase.
2018-08-14 15:50:26 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
69aa33eed5
Fix a case of a bool being used as an integer.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
7c5d414f7b
fix: during destruction logData could attempt to dereference tLogData after it has been deleted
2018-08-09 12:38:35 -07:00
Evan Tschannen
c757c68bfa
fix: nextVersion needs to be set to logData->version if version_sizes is empty
2018-08-04 23:53:37 -07:00
Evan Tschannen
fec285146c
significant cpu optimization in update storage
2018-08-04 12:36:48 -07:00
Evan Tschannen
be1a4d74c7
tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time
...
increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget
2018-08-04 10:31:30 -07:00
Evan Tschannen
71f89f372f
changed a trace event name to avoid scope type mismatch on the tag field
2018-08-03 15:53:38 -07:00
Evan Tschannen
2619234477
Merge branch 'release-5.2' into release-6.0
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen
501033c5af
fix: tlog spilling on a stopped log was only making one version durable at a time
2018-08-03 11:38:12 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Stephen Atherton
40762d9f9b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-07-25 17:58:52 -07:00
Evan Tschannen
0f59dc4086
fix: do not write to the persistent queue when we are terminated, which could happen if shutdown was caused by setting a promise in the asyncPullData loop
2018-07-13 17:01:31 -07:00
Evan Tschannen
cd63c7a7cc
added a buffered cursor, which efficiently merges lots of peek cursors
2018-07-12 12:09:48 -07:00
Stephen Atherton
96389c74cd
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Stephen Atherton
1bc95862b7
Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen
6b40f2764d
fix: off by one error on popping missing tags
2018-07-09 15:43:22 -07:00
Evan Tschannen
2718176927
fix: remote logs did not pop all of the data for removed logs on recovery because data for the missing tag was not recorded yet at the time of recovery
2018-07-06 16:10:41 -07:00
Evan Tschannen
507b3bacb0
fix: killing all tlogs in one region prevents the remote logs from recovering in that region; do not allow that to prevent us from configuring usable_regions=1.
...
added more recovery states.
2018-07-05 00:08:51 -07:00
Stephen Atherton
b95a2bd6c1
Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
...
# Conflicts:
# flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen
2987f85177
fix: known committed version must be updated before creating the tlogQueueEntryRef
2018-06-26 23:21:30 -07:00
Evan Tschannen
6e19622872
fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version
2018-06-26 18:02:55 -07:00
Evan Tschannen
2ec8744ab3
fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions
2018-06-25 11:15:49 -07:00
Evan Tschannen
68ac3bdc4c
log routers now calculate a precise version to pop for their log router tag
2018-06-21 15:29:46 -07:00
Evan Tschannen
f755961c42
use parallelGetMore during log recovery
2018-06-20 17:04:06 -07:00
Stephen Atherton
e5c48d453a
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-18 22:45:27 -07:00
Evan Tschannen
df4c445e25
fix: we need to return from commit when stopped
2018-06-18 22:12:46 -07:00
Evan Tschannen
403fb5a2e9
removed unnecessary suppressFor
2018-06-18 17:59:29 -07:00
Evan Tschannen
b79feaddd3
added a hard memory limit to the TLog to prevent it from running out of memory. Because remote logs are not ratekeeper controlled this is their only protection
2018-06-18 17:22:40 -07:00
Stephen Atherton
90c8288c68
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-17 14:55:05 -07:00
Evan Tschannen
7aef5ec6f1
got rid of persistUnrecoveredBefore
...
added persistLocality
2018-06-17 14:44:33 -07:00
Evan Tschannen
f637c680f1
fix: populateSatelliteTagLocations was broken
...
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen
0d87186821
use a specific locality for satellites
2018-06-15 11:06:38 -07:00
Evan Tschannen
feb8578c06
fix: only flush and exit in simulation
2018-06-14 13:48:30 -07:00
Stephen Atherton
1eae9d621b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-13 15:58:21 -07:00
Stephen Atherton
2878f30f29
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller
fcfa00928b
Make RecoveryState an enum class.
...
This means that all the == 7 or != 0 checks go away, and explicit names must be used.
2018-06-12 16:50:25 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen
588eaf4b36
fix: previous delay 0 could still cause us to recruit a tlog before processing disk errors
2018-06-11 11:26:30 -07:00
Evan Tschannen
a5c2a8ee8a
fix: allow disk errors to cancel the actor before recruiting logs
2018-06-10 20:27:19 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen
bf65e745a9
tlogs do not index tags for other localities
2018-06-01 22:51:08 -07:00
Evan Tschannen
c519339adb
avoid peeking from logs that do not match the tag’s locality
2018-06-01 18:42:48 -07:00
A.J. Beamon
1f0b519a73
Rename several variables in TLogServer.actor.cpp to follow our normal camel case conventions. I didn't rename every variable here, because some appear to be data structures (like a map) following the pattern keydesc_valuedesc, and I wasn't sure that the straightforward keydescValuedesc rename made sense. I did rename a couple of instances of these where it seemed reasonable, though.
2018-06-01 10:18:07 -07:00
A.J. Beamon
d9c702a9e3
Merge release-5.1 into release-5.2
2018-05-30 09:09:55 -07:00
Evan Tschannen
529bd34cf9
fix: when a tlog is stopped by another recruitment it no longer has the opportunity for commtingQueue to be set
2018-05-06 20:37:44 -07:00
Evan Tschannen
81c7bddaf8
fix: must check for log router errors while waiting on satellite replies because the recruitmentID will not be updated if it threw an error
2018-05-06 18:15:12 -07:00
Evan Tschannen
b4bd03e67e
fix: we cannot set queueCommitEnd until we have popped the log system to prevent the popped version from going backwards
2018-05-01 22:20:25 -07:00
Evan Tschannen
12ef63b698
knobify replace contents bytes
2018-05-01 19:43:35 -07:00
Evan Tschannen
10d25927cd
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Evan Tschannen
5143871fed
passed debug ids into all versions of peek() to assist debugging
2018-04-30 13:36:35 -07:00
Evan Tschannen
883f2318a0
test fearless configurations
2018-04-30 13:17:29 -07:00
Evan Tschannen
92b134eb98
fix: errors from removed were not handled properly
2018-04-29 23:05:08 -07:00
Evan Tschannen
6f318dbff2
fix: do not reply to recruitment until we are sure the log commits to the queue
2018-04-29 22:08:24 -07:00
Evan Tschannen
9cdabfed0e
added useful trace events
2018-04-29 18:54:47 -07:00
Evan Tschannen
f77c1ec14e
fix: fixed rare bug where a log stopped by a different recruitment would still respond successfully to the recruitment message
2018-04-28 13:34:06 -07:00
Evan Tschannen
af63dac5dd
fix: remote logs need to wait until the durable known committed version is greater than the recovery version before completing recovery to ensure we will not pick a start version that we do not have
2018-04-27 12:18:42 -07:00
Stephen Atherton
af61d3596d
Merge branch 'public-master' into feature-redwood
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Evan Tschannen
ae1de575f1
fix: remote logs are not considered fully recovered until they are at recoveredAt
2018-04-23 17:49:46 -07:00
Evan Tschannen
3ec09ce9f6
fix: only peekSingle needs to throw worker_removed, because tlogs have other ways to get notified they are no longer needed
...
fix: we need to wait until tags are popped past recoveredAt instead of unrecovered before
2018-04-23 16:43:08 -07:00
Evan Tschannen
126fc53d10
fix: the start version for peek cursors that merge with multiple log sets is the maximum of the individual start versions
2018-04-23 12:42:51 -07:00
Dennis Schafroth
290122637b
Using ASSERT_ABORT in destructors
2018-04-23 14:05:10 +02:00
Evan Tschannen
73597f190e
fix: new tlogs are initialized with exactly the tags which existed at the recovery version
2018-04-22 20:28:01 -07:00
Evan Tschannen
a520d03397
fix: if we cannot find a tag, it must have been popped at the recovery version.
2018-04-22 15:08:38 -07:00
Evan Tschannen
1d1e2cd367
fix: initialize the known committed version on the tlog
2018-04-21 00:41:15 -07:00
Evan Tschannen
8d350ceb5f
fix: persist the known committed version on the tlogs
2018-04-20 17:55:46 -07:00
Evan Tschannen
a6d9e889f0
a cleaner solution to preventing tlogs from peeking log routers
2018-04-20 13:25:22 -07:00
Evan Tschannen
f5c3417905
fix: prevent tlogs from peeking the wrong log routers
2018-04-20 00:30:37 -07:00
Evan Tschannen
5da452db8e
fix: pop the log routers again after the log system updates
2018-04-19 14:33:31 -07:00
Evan Tschannen
22526ef996
fix: do not tell storage servers about large sections of empty versions, because it can lead them to make mutations durable which have not been committed
2018-04-18 16:06:44 -07:00
Evan Tschannen
447c7bd15b
fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
...
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen
760bc8bc99
fix: log router version needs to be fetched before it is available
...
fix: tlog did not fetch known committed version if start version was exactly equal to it
2018-04-17 11:16:48 -07:00
Evan Tschannen
3e40505f4a
Revert "fix: remote logs should reply until they have recovered through recoverAt"
...
This reverts commit 3c0c03c004.
2018-04-16 23:17:16 -07:00
Evan Tschannen
3c0c03c004
fix: remote logs should reply until they have recovered through recoverAt
2018-04-16 17:25:49 -07:00
Evan Tschannen
3018a7b1b3
fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded
2018-04-16 10:42:48 -07:00
Evan Tschannen
5533016f1e
fix: tlogs are now initialized immediately, instead of when starting the core, this must be done to pop the log routers during recovery
...
fix: log router start version must be the same as remote log start version
2018-04-15 14:33:07 -07:00
Evan Tschannen
65e69620a7
fix: unrecoveredBefore on a new log is at minimum 1
2018-04-13 10:41:30 -07:00
Evan Tschannen
1af5ac0d9d
fix: a number of different problems prevented tlogs from using log routers during recovery
2018-04-12 15:20:54 -07:00
Alex Miller
20082e3228
Clang fixes.
2018-04-12 11:10:53 -07:00
Evan Tschannen
a738c4bec1
fix: if the known committed version is equal to the recovery version we do not need to copy any data
2018-04-09 20:48:55 -07:00
Evan Tschannen
419951f601
fix: need to initialize tlog versions to less than the startVersion
2018-04-09 17:17:11 -07:00
Evan Tschannen
4c89f721cd
fix: do not include logRouter tags in lock results
2018-04-09 10:48:57 -07:00
Evan Tschannen
7af892f50b
first working version of non-copying recovery working with fearless configurations
2018-04-08 21:24:05 -07:00
Stephen Atherton
2752a28611
Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood
2018-04-06 16:29:37 -07:00
Evan Tschannen
331e707684
fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations
2018-03-31 16:47:56 -07:00
Evan Tschannen
96fffe2cea
fix: do not update version if the log has been stopped
2018-03-30 22:11:42 -07:00
Evan Tschannen
1a4ded1c99
support upgrades by merging tags associated with the different peek requests
2018-03-29 17:54:08 -07:00
Evan Tschannen
b36e08f08f
first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet
2018-03-29 15:12:38 -07:00
Evan Tschannen
0746fe4d56
optimized tag lookups on the tlog by removing one level of vectors
2018-03-20 10:41:42 -07:00
Evan Tschannen
d8e064d8bb
fix: when a new log is recruited on a shared log, all outstanding commits need to be notified that they are stopped, because there is no longer a guarantee that their queueCommittedVersion will advance
2018-03-19 17:48:28 -07:00
Evan Tschannen
54be14000d
do not deserialize tags
2018-03-17 11:24:18 -07:00
Evan Tschannen
9c8cb445d6
optimized the tlog to use a vector for tags instead of a map
2018-03-17 10:36:19 -07:00
Evan Tschannen
fecfea0f7d
fix: messages vector was not cleared
2018-03-17 10:24:44 -07:00
Evan Tschannen
ccd70fd005
The tlog uses the tags embedded in the message instead of a separate vector of locations
...
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen
820382ea68
optimized the log router commit path to avoid re-serializing the data
2018-03-16 11:40:21 -07:00
Evan Tschannen
f6a22c1035
fix: the recovery actor was holding a copy of the tlogInterface after the tlog was removed
2018-03-12 16:56:34 -07:00
A.J. Beamon
f2c804e14f
Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.
2018-03-06 10:15:04 -08:00
A.J. Beamon
b25810711c
Merge branch 'master' into release-5.2
2018-03-05 10:32:57 -08:00
Balachandar Namasivayam
8ae640c062
Addressed review comments.
2018-03-02 17:56:49 -08:00
Balachandar Namasivayam
11df1aeabf
Add new api to get shared tlogs id and address
2018-03-02 16:50:30 -08:00
Evan Tschannen
e3c6b66240
fix: do not commit more data after being stopped
...
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen
ddb484143c
fix: do not peek from remote logs if they are not fully recovered
2018-02-21 14:06:44 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
1dc6a8d4bd
fix: the tlog can peek from log systems that have been recovered even if it does not match its recoverFrom set
2018-02-20 14:50:13 -08:00
Evan Tschannen
31b89a638f
added satellite_none and remote_none options to unconfigure from a fearless setup
...
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen
1fedcba890
fix: do not use log router tags when configured without remote logs
...
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Stephen Atherton
0a35f167e4
Merge branch 'master' into feature-redwood
...
# Conflicts:
# fdbserver/DiskQueue.actor.cpp
# fdbserver/IDiskQueue.h
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen
6b54d56ca7
gracefully exit if attempting to upgrade from 4.X versions
2018-01-30 17:10:50 -08:00
Evan Tschannen
29c5d4ad3d
upgrades from 5.X mostly supported, still some remaining correctness problems
2018-01-28 11:52:54 -08:00
Evan Tschannen
66b2218989
added tlog support for upgrading from 5.X clusters. Does not support upgrading from 4.X or earlier. Untested, storage servers still need the ability to change their tag.
2018-01-21 12:21:46 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen
be643d6937
fix: the tlog did not cancel recovery properly when stopped
2018-01-12 17:18:14 -08:00
Evan Tschannen
de119f192d
fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled)
2018-01-11 16:09:49 -08:00
Evan Tschannen
30710f7493
syncLogId was not necessary
2018-01-06 14:52:39 -08:00
Evan Tschannen
10c3fc165e
fix: after recovering from disk, only allow peeking data that was fully recovered
2018-01-06 13:49:13 -08:00
Evan Tschannen
63751fb0e2
fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from
2018-01-05 14:15:25 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen
f2c4beed9f
fix: tlogFitness did not consider it better to have one tlog of a better fitness
...
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
Alex Miller
c7dbd31a1e
Refactoring: Create a common prefixRange and do UID->Key once in backup.
2017-12-19 17:17:50 -08:00
Evan Tschannen
8c51bc4ac4
fixed low latency tests in a way that gives us better test coverage
2017-11-28 18:20:29 -08:00
Evan Tschannen
dc624a54dc
fix: avoid flushing large queues in simulation when checking latency
2017-11-27 17:23:20 -08:00
Evan Tschannen
df74e2a373
re-added support for non-copying tlog recovery
2017-10-24 15:09:31 -07:00
Evan Tschannen
15962cf079
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbrpc/Locality.cpp
# fdbrpc/Locality.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/ClusterRecruitmentInterface.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/masterserver.actor.cpp
# fdbserver/worker.actor.cpp
# flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Balachandar Namasivayam
0e153cdd35
Throttle Spammy logs. Three knobs are added.
...
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.
2017-10-02 18:43:11 -07:00
Stephen Atherton
248dab79b6
Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions.
2017-09-21 23:51:55 -07:00
Evan Tschannen
f75dfc3153
do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response
2017-09-18 17:39:12 -07:00
Evan Tschannen
36c98f18e9
do not register a worker with the cluster controller until it has finished recovering all files from disk
2017-09-15 10:57:58 -07:00
Evan Tschannen
d343d37274
fixed merge problems
2017-09-11 16:37:10 -07:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine
2017-08-25 10:12:58 -07:00
A.J. Beamon
4c706d33e9
Merge branch 'release-5.0'
2017-08-23 14:59:43 -07:00
Alvin Moore
7729f663e9
Ensured that the circus id is always lowercase
2017-08-23 13:45:00 -07:00
Evan Tschannen
4b40f817f1
fix: if recovery is cancelled before the copy is complete, remove the tlog
2017-08-23 12:26:03 -07:00
Alec Grieser
5ee07b1a9e
Merge branch 'release-5.0'
2017-08-14 16:56:58 -07:00
Evan Tschannen
de1b590a8a
The TLog did not delete data from removed logs
...
The TLog continued to make data from removed logs persistent
2017-08-11 18:08:09 -07:00
Stephen Atherton
50fb44be92
Merge branch 'release-5.0'
...
# Conflicts:
# versions.target
2017-08-09 23:36:12 -07:00
Evan Tschannen
2335fc73f2
fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
...
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00