Commit Graph

320 Commits

Author SHA1 Message Date
A.J. Beamon 25c4880ebe Merge branch 'release-6.3' into merge-release-6.3-into-master (temporarily discard all changes to BackupContainer.actor.cpp)
# Conflicts:
#	fdbclient/BackupContainer.actor.cpp
#	fdbserver/Knobs.h
2021-03-15 16:41:04 -07:00
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Vishesh Yadav 3fbe7e69fa Add TraceEvent to warn tLogs which has not joined after timeout
During recovery, wait for TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS and
log the tLogs that has not joined yet.
2021-03-09 18:57:46 -08:00
FDB Formatster 8a8c488ede apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-05 18:13:38 -06:00
Andrew Noyes 79cec09255 Apply clang-tidy's performance-inefficient-vector-operation fix
I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.

    $ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
2021-03-04 03:58:25 +00:00
Evan Tschannen 346a4e3ecd Merge branch 'release-6.3'
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/MultiInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2021-03-01 18:52:06 -08:00
Meng Xu 0c28e1d640 Add comment to peekLogRouter in TagPartitionedLogSystem 2021-02-22 16:20:50 -08:00
Meng Xu ef0bf2728e Merge branch 'release-6.3' into mengxu/ha-code
Resolve Conflicts:
	fdbserver/LogRouter.actor.cpp: Only conflicts at comments
2021-02-19 21:47:09 -08:00
Meng Xu 33eb1de00e Add some comment to log system
and resolve review comment by deleting my questions.
2021-02-19 21:44:13 -08:00
Meng Xu 471a3489fb Resolve review comments and add trace fields to MasterRecoveryState 2021-02-17 14:44:14 -08:00
Meng Xu 9122be4d81 Add comments to HA code and loadBalance code 2021-02-10 13:51:36 -08:00
sfc-gh-tclinkenbeard dd3669886e Improve IPeekCursor and ILogSystem method signatures 2020-12-08 09:09:31 -08:00
Andrew Noyes 877997632d Merge branch 'release-6.3' into anoyes/merge-release-6.3-master
Include conflict markers for review purposes
2020-12-04 01:38:07 +00:00
Andrew Noyes dc2bac5670 Resolve conflicts 2020-11-24 19:09:42 +00:00
Andrew Noyes 1f541f02be Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
Meng Xu 046a6e8427 Add Alex comment on tLog 2020-11-12 13:29:11 -08:00
sfc-gh-tclinkenbeard 4669f837fa Add uses of makeReference 2020-11-07 22:10:18 -08:00
sfc-gh-tclinkenbeard 0ff1809d25 Remove emplace_back(new ...) pattern
This pattern can cause a memory leak due to an exception between
resource allocation and management
2020-11-07 22:09:53 -08:00
Meng Xu 4788544a6f Revise comments based on review suggestions
Ack. Jingyu and Xin for their suggestions.
2020-11-06 08:51:13 -08:00
Meng Xu 063700e4d6 Add comments and questions to HA and tLog code reading
The comments' correctness need to be confirmed by reviewers.
2020-10-30 12:14:57 -07:00
Lukas Joswiak 53b7721d6c Add additional trace information 2020-09-04 15:36:47 -07:00
Lukas Joswiak 2a58e775d2 Add original changes 2020-09-04 15:36:47 -07:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen 29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen 507c67c930 Added additional information to trace events 2020-08-26 11:42:23 -07:00
Evan Tschannen 28cb5f242c another fix 2020-08-26 11:01:40 -07:00
Evan Tschannen e81ccd2dc9 another compiler fix 2020-08-26 10:59:06 -07:00
Evan Tschannen e531046b53 fix compiler errors 2020-08-26 10:56:21 -07:00
Evan Tschannen fd1a4304fa fix: made ConnectionResetInfo reference counted 2020-08-26 10:53:17 -07:00
Evan Tschannen 8ede143941 Track tlog push latencies and reset connections if they are above 500ms 2020-08-18 08:43:14 -07:00
Meng Xu ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
Jingyu Zhou 5cc5d9cf1e Log peer address whose failure can cause master recovery
So when there is master recovery due to failed tlog, proxy, resolver, log
router, or resolver, we can have a trace event tells which address that the
master thinks is dead.
2020-07-10 15:57:03 -07:00
negoyal cf13e00a8f Merge remote-tracking branch 'origin/release-6.3' into fdb_cache_wo_allocator 2020-06-01 17:38:31 -07:00
negoyal dd033736ed Merge branch 'master' into fdb_cache_subfeature2 2020-05-04 17:29:43 -07:00
Evan Tschannen fd0ee72293 Merge branch 'master' into feature-small-endpoint 2020-04-29 18:43:10 -07:00
Jingyu Zhou 0823091423 Fix backup worker removal races with setting
The master waits for all backup worker recruitment done and then set them in a
batch. However, a backup worker could remove itself before the master sets it.
As a result, the worker is not removed and oldest backup epoch can't advance,
and TLog can't be popped.
2020-04-20 11:06:46 -07:00
Evan Tschannen a442565e13 more work towards shrinking locality 2020-04-18 21:29:38 -07:00
Evan Tschannen 99a58f8ee5 fix compiler errors 2020-04-17 17:47:50 -07:00
Evan Tschannen 07dc2988e7 filter the locality inside of tlog interface to reduce its size 2020-04-17 17:37:14 -07:00
negoyal a0c8946f31 Merge branch 'master' into fdb_cache_subfeature2 2020-04-02 12:27:04 -07:00
Jingyu Zhou 280bc94738 Do not recruit backup workers with wrong tags
In a rare scenario, the master can recruit backup workers with more tags than
the number of log router tags for an epoch. This can be caused by an
unsuccessful recovery, which uses more tags than the next epoch. When
recruiting for the next epoch, if no progress has been made yet, the recruiting
logic will look back at the previous epoch. If previous epoch has saved past
this epoch's begin version, current epoch's progress is updated with that
information and can result in more tags being inserted to this epoch's
recruitment.
2020-03-28 21:19:41 -07:00
negoyal 4772bedc1b Remove a trce message that seemed to have problems. 2020-03-26 17:11:26 -07:00
negoyal acaf91ac47 Merge branch 'master' into fdb_cache_subfeature2 2020-03-26 13:33:08 -07:00
negoyal 8abac91033 Fixed a bug in cache server while peeking at a version lower than popped version and added some logging. 2020-03-26 12:39:07 -07:00
Jingyu Zhou 90b40e1d75 Merge branch 'mengxu/new-backup-format-PR-delta' of github.com:xumengpanda/foundationdb into backup-worker-bak
Resolve Conflicts:
	fdbclient/BackupAgent.actor.h
	fdbserver/BackupWorker.actor.cpp
	fdbserver/RestoreMaster.actor.cpp
	fdbserver/masterserver.actor.cpp
2020-03-23 13:35:33 -07:00
Meng Xu 3f31ebf659 New backup:Revise event name and explain code 2020-03-23 10:55:44 -07:00
Jingyu Zhou 0eacf1cdab trackTlogRecovery listens on backup worker change events
Old TLogs can only be removed when backup workers no long need them (i.e., the
oldest backup epoch == current epoch). As a result, the core state changes need
include backup worker changes, which updates the oldest backup epoch.
2020-03-20 20:17:32 -07:00
Jingyu Zhou 818072f3cb Set oldest backup epoch if not recruiting backup workers
Since tlog is not kept until backup worker has pulled mutations from it, the
old tlogs can only be displaced after oldest backup epoch equals current epoch.
So if master is not recruiting backup workers, it should set the oldest backup
epoch as the current epoch.
2020-03-20 20:16:43 -07:00
Jingyu Zhou 5b36dcaad5 Fix oldest backup epoch for backup workers
The oldest backup epoch is piggybacked in LogSystemConfig from master to
cluster controller and then to all workers. Previously, this epoch is set
to the current master epoch, which is wrong.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 12ed8ad536 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-20 20:15:08 -07:00