Commit Graph

204 Commits

Author SHA1 Message Date
Richard Chen bf7bf470d9 Merge branch 'master' of https://github.com/apple/foundationdb into stable_interfaces 2020-10-26 19:36:19 +00:00
Lukas Joswiak 22bc2eb71c Fix subsequence ordering issue 2020-10-26 01:15:26 -07:00
Lukas Joswiak 7403e49c33 Fix deserialization issue with late LogProtocolMessage
Previously, if multiple metadata messages were written without clearing
transaction state between them, an issue could occur when deserializing
where a later message could be missing its LogProtocolMessage prefix.
This would cause the protocol version to not be set, causing an
assertion failure when deserializing MutationRefs.
2020-10-26 01:13:51 -07:00
Lukas Joswiak 09721ba9bf Only send span context info if supported 2020-10-23 16:36:05 -07:00
Richard Chen 545ee4269d master conflicts 2020-10-19 01:03:54 +00:00
Richard Chen 41843f07e6 add simulator support for different process versions and ProtocolVersion test 2020-10-12 18:19:31 +00:00
Lukas Joswiak dea7000970 Merge remote-tracking branch 'upstream/master' into visibility-1 2020-10-06 18:38:15 -07:00
sfc-gh-tclinkenbeard 0814841827 Replace NULL with nullptr in fdbserver 2020-09-20 11:31:49 -07:00
Lukas Joswiak efde86340a Add knob to disable span serialization 2020-09-04 17:37:34 -07:00
Lukas Joswiak 783e6a170e Add code coverage 2020-09-04 17:36:56 -07:00
Lukas Joswiak f896c68996 Cleanup 2020-09-04 15:36:47 -07:00
Lukas Joswiak 2a58e775d2 Add original changes 2020-09-04 15:36:47 -07:00
Lukas Joswiak 1ca7fe1a05 Add span metadata message 2020-09-04 15:36:47 -07:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen 29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen 331a49a62a do not allow a proxy to reset a connection with the logs immediately upon starting up 2020-08-30 18:50:19 -07:00
Evan Tschannen fd1a4304fa fix: made ConnectionResetInfo reference counted 2020-08-26 10:53:17 -07:00
Evan Tschannen 8ede143941 Track tlog push latencies and reset connections if they are above 500ms 2020-08-18 08:43:14 -07:00
Meng Xu ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
A.J. Beamon b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen 717242a0ee reset WAN network connections every 5 minutes is responses take more than 500ms 2020-07-09 22:50:47 -07:00
Alex Miller ccaac162e2 Resolve performance concerns of nearly-no-op debugMutation being frequently called
This introduces unhygenic macro variants that inline a `ENABLED &&`
before the TraceEvent.  This way, they get entirely compiled out unless
enabled.

Then rewrite all debugMutation uses via sed.
2020-05-13 18:44:15 -07:00
Alex Miller cd65f301dd Merge remote-tracking branch 'upstream/master' into mutation-debugging 2020-03-27 03:47:24 -07:00
Alex Miller 122762cce1 Add debugMessagesAndTags, and track mutations in more places.
Like:
* Leaving the proxy
* Entering the TLog
* Leaving the TLog
* Being read on a cursor

All of this brought to you by TagsAndMessage!

This also slides in a minor optimization as to how mutations are serialized per target log.
2020-03-27 03:31:04 -07:00
Jingyu Zhou 90b40e1d75 Merge branch 'mengxu/new-backup-format-PR-delta' of github.com:xumengpanda/foundationdb into backup-worker-bak
Resolve Conflicts:
	fdbclient/BackupAgent.actor.h
	fdbserver/BackupWorker.actor.cpp
	fdbserver/RestoreMaster.actor.cpp
	fdbserver/masterserver.actor.cpp
2020-03-23 13:35:33 -07:00
Meng Xu 3f31ebf659 New backup:Revise event name and explain code 2020-03-23 10:55:44 -07:00
Jingyu Zhou 818072f3cb Set oldest backup epoch if not recruiting backup workers
Since tlog is not kept until backup worker has pulled mutations from it, the
old tlogs can only be displaced after oldest backup epoch equals current epoch.
So if master is not recruiting backup workers, it should set the oldest backup
epoch as the current epoch.
2020-03-20 20:16:43 -07:00
Jingyu Zhou 12ed8ad536 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 89d8f13038 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-18 16:41:35 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen 1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Jingyu Zhou 1eaea91cb3 Address review comments 2020-01-22 19:42:13 -08:00
Jingyu Zhou 4ed75e37f3 BackupProgress uses old epoch's begin version if no progress found
Get rid of the complex logic of choosing the largest saved version from
previous epoch for the oldest epoch. Instead, use the begin version now
available from log system.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 64052f6349 Check and fill backup gaps for old epochs and tags
Sometimes the backup worker has not updated progress to the system space and a
master recovery happens. As a result, next epoch doesn't know the progress of
previous ones. This change is to check for such missing gaps and fill them with
the whole range [startVersion, endVersion).

The code is refactored into BackupProgress.actor.* to consolidate backup
progress processing for the master server.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 0c08161d8e Remove old backup workers when done
For backup workers working on old epochs, once their work is done, they will
notify the master. Then the master removes them from the log system and
acknowledge back to the backup workers so that they can gracefully shut down.

The popping of a backup worker is stalled if there are workers from older
epochs still working. Otherwise, workers from old epochs will lost data.

However, allowing newer epoch to start backup can cause holes in version ranges.
The restore process must verify the backup progress to make sure there are no
holes, otherwise it has to wait.
2020-01-22 19:38:45 -08:00
Jingyu Zhou 73824faf65 Track pseudo tags popping for individual IDs
For each log router ID, we track the popped version of each pseudo tag so that
the popping only applied to the minimum of these versions.

Also add more tracing for popping and epochs.
2020-01-22 19:38:45 -08:00
Jingyu Zhou 11964733b7 WIP: should be divided into smaller commits. 2020-01-22 19:38:45 -08:00
Jingyu Zhou a797958af6 Update peekLogRouter for backup workers to peek 2020-01-22 19:37:48 -08:00
Jingyu Zhou a4d6ebe79e Recruit backup worker in newEpoch 2020-01-22 19:37:48 -08:00
Evan Tschannen afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen 1c873591be fixed a compiler error 2019-11-05 18:32:15 -08:00
Evan Tschannen a8ca47beff optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag> 2019-11-05 18:07:30 -08:00
Evan Tschannen 457896b80d remote logs use bufferedCursor when peeking from log routers to improve performance
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Jingyu Zhou cd3f1e33d4 Refactor deserialization of TagsAndMessages
Consolidate deserialization of TagsAndMessages in the structure itself and
change both TLog and ServerPeekCursor to use it.
2019-09-04 14:55:05 -07:00
Jingyu Zhou 4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Evan Tschannen 4c9a392f05 the master checks the popped version of the txsTag before recovering the txnStateStore, to avoid restoring data that is later found to be popped 2019-08-05 17:01:48 -07:00
Meng Xu c9c50ceff8 Comments:Add comments to DiskQueue
No functional change.
2019-08-01 15:20:01 -07:00
Evan Tschannen 653d9be6e2 we cannot pop old generations because it breaks forced recoveries 2019-07-31 18:27:36 -07:00
Evan Tschannen 1ea3ce8f9c txs pops also go to the old generations of tlogs to reduce the chance we have to restart txnStateStore recovery 2019-07-31 18:06:39 -07:00
Evan Tschannen 7ac7eb82f2 fix: buffered cursor would start multiple bufferedGetMore actors
advance all of the cursors to the poppedVersion
2019-07-30 14:42:05 -07:00
Evan Tschannen 9e3ec2cb33 fix: when resetting the peekCursor, we cannot discard the popped data if the adapter has already processed data 2019-07-30 13:25:25 -07:00
Evan Tschannen 1d326e3dc8 removed debugging message 2019-07-30 12:42:50 -07:00
Evan Tschannen 5d79e4141f fix: buffered cursor messageVersion should be set to the version we will be at after exhausting everything in messages 2019-07-30 12:38:44 -07:00
Evan Tschannen 45f7b41b48 fix: multi-cursor could discard popped commits after already returning data 2019-07-29 21:36:42 -07:00
Evan Tschannen 5bb322b483 implement popped on bufferedCursor 2019-07-29 21:19:47 -07:00
Evan Tschannen 28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
Alex Miller 95487861be Make sharded txsTag gated on TLogVersion::V4.
To allow a potential 6.2 -> 6.1 rollback.
2019-07-16 19:09:53 -07:00
Alex Miller 9396eedd11 Const some random functions that are trivially const.
For code hygiene reasons only.
2019-07-16 19:09:09 -07:00
Evan Tschannen 15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
Alex Miller bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen 1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Alex Miller 26343f557a Update getMore() contract.
MultiCursor already did this.
2019-06-20 17:48:24 -07:00
Evan Tschannen e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00
Alex Miller 51fd42a4d2 Merge remote-tracking branch 'upstream/master' into spilled-only-peek 2019-06-18 17:33:52 -07:00
mpilman 8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec.
2019-06-18 14:49:04 -07:00
Alex Miller 455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
sramamoorthy 3d5998e9dd tlog: when pops are disabled, store them & replay
In Tlogs, disable pop is done whlie taking snapshots. Earlier, tlogs
were ignoring the pops if it got pop requests when pops were
disabled. In this change, instead of ignoring the pop - it remembers
the list of pops in-memory and plays them once the popping is
enabled.
2019-05-28 22:07:46 -07:00
sramamoorthy 4bc4c615da exec op to all tlog, restore change in test &other
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy 69edefe68b Snapshot based backup and resotre implementation 2019-05-28 22:07:46 -07:00
Jingyu Zhou b8e7fc1b84 Refactor: add std:: qualifier and use emplace_back 2019-05-17 09:38:50 -10:00
Alex Miller 4eb4c03ce5 Save TLog resources by letting peek request only spilled data.
If a peek is entirely fulfilled from spilled data, then it's likely that
the next peek will be also.  It is thus wasteful for each of these peeks
to call peekMessagesFromMemory, which memcpy's excessively, and then
throw all that data away without using it.

Now, TLogs will give a hint back to peek cursors about if the provided
reply was served entirely from the spilled data, which peek curors then
feed back as the hint into their next request.

At some point, a cursor will send a request for only spilled data, get
an incomplete response, and then be told to send its next request as one
that peeks from memory as well, and then it will fully catch up.
2019-05-14 15:38:48 -10:00
Jingyu Zhou 8b5449e608 Fix review comments for PR #1473 2019-04-29 16:45:42 -07:00
Jingyu Zhou 5462f560e7 Add pseudo locality for log routers and tlogs
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
  popping.

Later when we add more psuedo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
Jingyu Zhou 010f825aff Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet 2019-04-21 10:41:07 -07:00
Jingyu Zhou 7befce6bf1 More pseudoLocalities and refactors. 2019-04-21 10:41:07 -07:00
Jingyu Zhou 966ec30fcc Add pseudoLocalities for special tag consumers 2019-04-21 10:41:07 -07:00
Jingyu Zhou d19b0cf1c1 Refactor LogSet with two new constructors 2019-04-21 10:41:07 -07:00
Jingyu Zhou 0b1984978a Small code refactoring. 2019-04-21 10:41:07 -07:00
Jingyu Zhou ec1bc5cfca Add LogSystemType enum 2019-04-21 10:41:07 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 5a00f567be fix CheckSatelliteTagLocation 2019-03-20 09:30:11 -07:00
Balachandar Namasivayam f9560e1abd Addressed Review Comments 2019-03-19 15:23:14 -07:00
Balachandar Namasivayam 5471725db5 Support config where the primary and remote DC's can be used as satellites. 2019-03-18 12:17:59 -07:00
Evan Tschannen a2108047aa removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects. 2019-03-13 13:14:39 -07:00
Evan Tschannen 8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller 2dc57568cb Change many things about log_version.
* log_version in the database (`/conf/log_version`) is now a hint that gets
  rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
  don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen 0e19b5a935 fix: allow the txnStateStore to be recovered from a process in a down datacenter, so that the cluster controller can know to switch to the other region 2019-02-21 16:52:27 -08:00
mpilman 3f0fd2a20c Use fwd decls in WorkerInterface
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen 9cfadad41b fix: if the tagPartitionedLogSystem cannot do a forced recovery, the master should not execute it forced recovery based modifications either 2019-02-18 15:13:18 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen e7e1c634e0 fix: we need to restart the peek cursor when the known committed version becomes available 2018-10-02 17:44:14 -07:00
Evan Tschannen 05e7f08b26 added a peek method which will attempt to read the txsTag from the local region as much as possible 2018-09-28 12:21:08 -07:00
Evan Tschannen 30b2f85020 fix: it is not safe to drop logs supporting the current primary datacenter, because configuring usable_regions down will drop the storage servers in the remote region, leaving you will no remaining logs 2018-07-14 16:26:45 -07:00
Evan Tschannen cd63c7a7cc added a buffered cursor, which efficiently merges lots of peek cursors 2018-07-12 12:09:48 -07:00
Evan Tschannen c148c865e3 optimized log peek cursors to use much less CPU when using the policy engine 2018-07-11 15:43:55 -07:00
Evan Tschannen 6cf5354425 checkSatelliteTagLocations is not an error if the same zoneId is used multiple times 2018-07-05 13:00:13 -07:00