Commit Graph

879 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard c74047c665 Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings 2021-07-28 11:51:02 -07:00
Steve Atherton 507c1f11e3 Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use. 2021-07-26 19:55:10 -07:00
sfc-gh-tclinkenbeard a27d7c86f4 Fix more -Wreorder-ctor warnings in DataDistribution.actor.cpp 2021-07-24 22:14:43 -07:00
sfc-gh-tclinkenbeard da50e13f3e Fix more -Wreorder-ctor warnings in DataDistribution.actor.cpp, OldTLogServer_4_6.actor.cpp, and Net2.actor.cpp 2021-07-24 17:33:11 -07:00
sfc-gh-tclinkenbeard 6f81155784 Merge remote-tracking branch 'origin/master' into const-serverdbinfo 2021-07-20 10:18:40 -07:00
Steve Atherton f596a81073 Rename ::TRUE and ::FALSE in BooleanParams to ::True and ::False so as to not conflict with the TRUE and FALSE macros provided by the Windows and MacOS SDKs. 2021-07-17 00:11:40 -07:00
Xiaoxi Wang 501dc339a9 relax perpetual wiggle pause condition; add trace log; correct perpetual wiggle priority setting 2021-07-12 05:46:55 +00:00
sfc-gh-tclinkenbeard 8a212862f0 Prevent dataDistributor from modifying ServerDBInfo object 2021-07-11 22:04:54 -07:00
sfc-gh-tclinkenbeard 79ff07a071 Added *BOOLEAN_PARAM macros to enforce documentation of boolean parameters 2021-07-02 15:04:42 -07:00
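The typed boolean parameters named in the entries above (`*BOOLEAN_PARAM`, `::True` and `::False`) can be illustrated with a minimal sketch. The class `IsUrgent` and the function `relocationPriority` are hypothetical and hand-written; they are not the project's macros or their expansion, only an assumption about the general idea. Spelling the constants `True` and `False` also avoids the clash with the `TRUE` and `FALSE` preprocessor macros mentioned in commit f596a81073.

```cpp
#include <cassert>

// Hypothetical strongly typed boolean parameter (illustration only,
// not the code generated by the project's *BOOLEAN_PARAM macros).
class IsUrgent {
public:
    explicit constexpr IsUrgent(bool value) : value_(value) {}
    constexpr operator bool() const { return value_; }
    // Named constants; "True"/"False" cannot collide with TRUE/FALSE macros.
    static const IsUrgent True;
    static const IsUrgent False;
private:
    bool value_;
};

const IsUrgent IsUrgent::True{ true };
const IsUrgent IsUrgent::False{ false };

// Hypothetical function: the call site must name the parameter type.
int relocationPriority(IsUrgent urgent) {
    return urgent ? 100 : 10;
}

void booleanParamDemo() {
    assert(relocationPriority(IsUrgent::True) == 100);
    assert(relocationPriority(IsUrgent::False) == 10);
    // relocationPriority(true);  // would not compile: the constructor is explicit
}
```

A call such as `relocationPriority(IsUrgent::True)` documents itself at the call site, which a bare `true` argument does not.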
Neethu Haneesha Bingi 73752f441b exclude locality:clang-format, ranged loops, documentation, tracking addStoragesever for exclusion. 2021-06-23 18:03:27 -07:00
Neethu Haneesha Bingi 62355571d0 exclude servers based on locality match 2021-06-23 18:03:27 -07:00
Xiaoxi Wang 7b713f7fd2 add knob 2021-06-23 05:49:55 +00:00
Xiaoxi Wang f2daf20927 TEST condition 2021-06-21 06:56:03 +00:00
Xiaoxi Wang 0493d149e6 wait remove 2021-06-21 05:18:42 +00:00
Xiaoxi Wang 783520ce85 add and remove some healthy check to solve cluster status oscillation when #ss is little; simplify some code 2021-06-19 16:57:04 +00:00
Xiaoxi Wang 647138145d adjust default value of stopWiggleSignal; better trace logic 2021-06-17 20:59:47 +00:00
Xiaoxi Wang fdd9c30794 code refactor;change stopSignal; 2021-06-16 05:30:58 +00:00
Xiaoxi Wang d33e43fd2b code format 2021-06-14 23:00:02 +00:00
Xiaoxi Wang 2cd4e6d62f check healthy team count, dd queue and disk space;
code refactor
2021-06-14 22:09:45 +00:00
Xiaoxi Wang d46fccc30f Revert "Revert "Properly set simulation test for perpetual storage wiggle and bug fixing""
This reverts commit ad576e8c20.
2021-06-11 22:58:05 +00:00
Xiaoxi Wang ad576e8c20 Revert "Properly set simulation test for perpetual storage wiggle and bug fixing" 2021-06-11 09:07:45 -07:00
Xiaoxi Wang 17ac91bac4 Merge pull request #4929 from sfc-gh-xwang/ppwtest
Properly set simulation test for perpetual storage wiggle and bug fixing
2021-06-10 14:09:50 -07:00
Xiaoxi Wang cd58c0c149 add useful trace; add invalid wiggling server check 2021-06-10 06:50:44 +00:00
Xiaoxi Wang 4220a548ce use the same health check as exclude to avoid 'best team get stuck' 2021-06-09 22:51:46 +00:00
Xiaoxi Wang 51b4cb89c2 fix server_status bug 2021-06-08 23:47:59 +00:00
Xiaoxi Wang 45ebdb1a9d fix perpetual wiggle bug caused by multiple DCs and removeStorageServer 2021-06-08 23:33:25 +00:00
Xiaoxi Wang 6ab0ea3d0f properly set perpetual_storage_wiggle value during tests 2021-06-07 17:55:20 +00:00
sfc-gh-tclinkenbeard 371a38e6e5 Merge remote-tracking branch 'origin/master' into remove-extra-copies 2021-06-07 10:26:06 -07:00
Xiaoxi Wang 838d847d4e Merge pull request #4860 from sfc-gh-xwang/ppwtest
implement perpetual storage wiggling feature
2021-06-04 16:18:39 -07:00
Xiaoxi Wang 5be65fab5e add comment 2021-06-04 18:40:18 +00:00
Xiaoxi Wang e0981d6732 add code coverage mark 2021-06-03 19:58:28 +00:00
Xiaoxi Wang 351325b3af comment modification; wait perpetual wiggling close 2021-06-03 05:13:20 +00:00
Xiaoxi Wang 21e175b16c add comments for new actors 2021-06-02 18:49:01 +00:00
Xiaoxi Wang 944c9ad8d9 fix memory bug 2021-06-02 17:53:44 +00:00
Josh Slocum b3e4f182ef TSS Mapping Change 2021-06-02 17:30:09 +00:00
Xiaoxi Wang 9684d78a6e solve recruiting conflict with TSS 2021-06-02 06:12:45 +00:00
Xiaoxi Wang 8b9c8b33fc manually merge with master 2021-06-01 17:51:42 +00:00
Xiaoxi Wang ce308edc5e fix wiggler logic bug 2021-05-26 21:57:58 +00:00
Josh Slocum 4257ac2b4d More TSS Changes/Fixes 2021-05-25 20:37:48 +00:00
Josh Slocum ce82c9653e Testing Storage Server implementation 2021-05-25 20:28:50 +00:00
Xiaoxi Wang e9a23840ea fix promise bug 2021-05-25 20:25:21 +00:00
Xiaoxi Wang f11b7ffa5f merge master, fix promise callback bug 2021-05-25 18:43:08 +00:00
Xiaoxi Wang 7bc55448aa fix iterator bug 2021-05-24 19:11:28 +00:00
Xiaoxi Wang 85cd2b9945 add perpetualStorageWiggler 2021-05-20 23:31:08 +00:00
Xiaoxi Wang 3f3a81b3d9 add pid2server_info to maintain Process id set 2021-05-20 03:32:15 +00:00
sfc-gh-tclinkenbeard f28ac955c3 Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>> 2021-05-10 16:32:50 -07:00
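The kind of temporary object that commit f28ac955c3 refers to can be shown with a small sketch. The `Tracked` type and both helper functions are hypothetical and exist only to count copies and moves; the sketch assumes the change was of the `push_back(std::make_pair(...))` to `emplace_back(...)` kind, which the commit message does not state explicitly.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copied or moved.
struct Tracked {
    static int transfers;
    int id;
    Tracked(int id) : id(id) {}
    Tracked(const Tracked& other) : id(other.id) { ++transfers; }
    Tracked(Tracked&& other) noexcept : id(other.id) { ++transfers; }
};
int Tracked::transfers = 0;

// Builds a temporary pair first, then moves it into the vector.
int transfersWithPushBack() {
    std::vector<std::pair<Tracked, int>> v;
    v.reserve(1);  // rule out reallocation as a source of moves
    Tracked::transfers = 0;
    v.push_back(std::make_pair(Tracked(1), 2));
    return Tracked::transfers;
}

// Constructs the pair in place from the constructor arguments.
int transfersWithEmplaceBack() {
    std::vector<std::pair<Tracked, int>> v;
    v.reserve(1);
    Tracked::transfers = 0;
    v.emplace_back(1, 2);
    return Tracked::transfers;
}
```

Comparing the two return values shows that the in-place form performs no copies or moves of `Tracked`, while the `make_pair` form has to move the element at least once.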
sfc-gh-tclinkenbeard 5c2d7b6080 Create RangeResult type alias 2021-05-03 13:14:16 -07:00
Trevor Clinkenbeard 0db28f6ea0 Merge pull request #4535 from jzhou77/fix-dd
Fix DD Assertion failed in canBeSet
2021-03-24 10:50:04 -07:00
Jingyu Zhou 0c3bc09524 Remove the shuttingDown flag 2021-03-21 20:12:37 -07:00
Jingyu Zhou cb26576b95 Fix DD assertion failure
This fixes #4493, where DDTeamCollection::~DDTeamCollection creates new teams
that hold pointer to the DDTeamCollection, thus later causes assertion failure
because the memory is invalid.

The fix is to cancel teamBuilder at the beginning of ~DDTeamCollection.
2021-03-21 19:54:44 -07:00
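The fix described in commit cb26576b95 (cancel the team builder before doing anything else in the destructor) follows a general pattern that can be sketched in isolation. Everything below is hypothetical: `BackgroundTask`, `TeamCollection`, and the `cancelFirst` switch are simplified stand-ins, not FoundationDB's actors or the real `DDTeamCollection`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cancellable task; a stand-in for an actor such as teamBuilder.
class BackgroundTask {
public:
    explicit BackgroundTask(std::function<void()> step) : step_(std::move(step)) {}
    void cancel() { cancelled_ = true; }
    // Runs one step unless the task has been cancelled.
    void poll() { if (!cancelled_) step_(); }
private:
    std::function<void()> step_;
    bool cancelled_ = false;
};

struct TeamCollection {
    std::vector<int> teams;
    int* buildsDuringTeardown;  // external counter so the effect is observable
    bool cancelFirst;
    bool tearingDown = false;
    BackgroundTask builder;

    TeamCollection(int* counter, bool cancelFirst)
      : buildsDuringTeardown(counter), cancelFirst(cancelFirst),
        builder([this] {
            if (tearingDown) ++*buildsDuringTeardown;
            teams.push_back(static_cast<int>(teams.size()));
        }) {}

    ~TeamCollection() {
        if (cancelFirst) builder.cancel();  // the pattern: cancel before any teardown work
        tearingDown = true;
        teams.clear();
        builder.poll();  // simulates the builder waking up during teardown
    }
};

// Returns how many teams were built while the collection was being destroyed.
int buildsDuringTeardown(bool cancelFirst) {
    int counter = 0;
    {
        TeamCollection tc(&counter, cancelFirst);
        tc.builder.poll();  // normal operation before destruction
    }
    return counter;
}
```

With `cancelFirst` set, the late wake-up is a no-op; without it, the builder still creates a team that refers to an object whose destruction is already under way.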
Evan Tschannen d2f9bf7eb6 added comments and fixed style 2021-03-16 15:44:49 -07:00
Evan Tschannen edefcff3ac do not kill the data distributor after removing a failed server, completely remove the failed server 2021-03-15 16:48:08 -07:00
Evan Tschannen c0a1362478 fixed a bug where DD was shutdown while still in a callback from trackExcludedServers 2021-03-15 16:26:57 -07:00
Evan Tschannen c570a7b718 added trace events 2021-03-15 15:55:02 -07:00
Evan Tschannen 403d933329 fixed trace event 2021-03-15 10:51:53 -07:00
Evan Tschannen 831224df99 fixed compiler error 2021-03-15 10:48:48 -07:00
Evan Tschannen 4e4149b070 exclude failed shuts down data distribution while the server is being removed to avoid two processes making changes to the key servers at the same time 2021-03-15 10:43:06 -07:00
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Vishesh Yadav 2bb4f2e59f Merge branch 'release-6.3-pre-format' into master-format
This merges release-6.3 branch right before it was fully formatted.
There were quite a few conflicts that are resolved here. CoroFlow had
a check for OOM errors introduced in 6.3, but didn't seem applicable in
the new implementation, which seems to use boost.
2021-03-10 09:37:41 -08:00
Chaoguang Lin 9645f489e6 Fix base trace event name inconsistency 2021-03-08 15:20:50 -08:00
Zhe Wu 59181245c1 Change SSVersionDiffLarge event log level to warning 2021-03-03 23:33:48 -08:00
Andrew Noyes 79cec09255 Apply clang-tidy's performance-inefficient-vector-operation fix
I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.

    $ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
2021-03-04 03:58:25 +00:00
Evan Tschannen 346a4e3ecd Merge branch 'release-6.3'
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/MultiInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2021-03-01 18:52:06 -08:00
sfc-gh-tclinkenbeard 32486a2785 Reenable tlog pops in ddSnapCreateCore even if some disable requests fail
If some tlogs successfully disable pops but others fail to, we do not
want to wait TLOG_IGNORE_POP_AUTO_ENABLE_DELAY seconds before reenabling
pops
2021-02-20 18:24:21 -08:00
Russell Sears 8025cc4571 Merge remote-tracking branch 'upstream/release-6.3' into merge-6.3-to-master-1-26-21 2021-01-26 19:46:47 +00:00
Andrew Noyes 0ef44739ea Fix OPEN_FOR_IDE build in preparation for using clang-tidy 2021-01-26 02:04:11 +00:00
Xin Dong 0cde3cc48f Make trace event happy 2021-01-25 14:04:16 -08:00
Xin Dong dce8af5b63 Change back to SevWarnAlways for now. 2021-01-25 13:54:24 -08:00
sfc-gh-tclinkenbeard bdf58d0b2f Add assertion to DDTeamCollection::overlappingMembers 2021-01-21 14:39:39 -08:00
sfc-gh-tclinkenbeard ad99bf0471 Merge remote-tracking branch 'origin' into misc-changes 2021-01-21 10:03:07 -08:00
Xin Dong 83506cda87 Log SevError instead of SevWarnAlways when all replicas of some data are lost. 2021-01-15 15:00:48 -08:00
Andrew Noyes ff7d306b09 Merge branch 'release-6.3' into anoyes/merge-6.3-to-master
Include conflict markers for now. Will resolve.
2021-01-15 18:04:09 +00:00
sfc-gh-tclinkenbeard 5b2e88b187 Use structured bindings in for loops 2020-12-27 01:46:20 -04:00
sfc-gh-tclinkenbeard 0d4e81e6b4 Use unique_ptr in DataDistribution.actor.cpp 2020-12-26 23:40:54 -04:00
sfc-gh-tclinkenbeard 19816ccdbf Improve DataDistribution const-correctness 2020-12-26 20:22:27 -04:00
sfc-gh-tclinkenbeard 26a4884eef Mark TCMachineTeamInfo::size const 2020-12-26 19:23:01 -04:00
Jingyu Zhou bbb56e4089 Merge branch 'release-6.2' of https://github.com/apple/foundationdb into release-6.3 2020-12-23 14:26:59 -08:00
Andrew Noyes 877997632d Merge branch 'release-6.3' into anoyes/merge-release-6.3-master
Include conflict markers for review purposes
2020-12-04 01:38:07 +00:00
Xin Dong 78503db523 Reset and retry transaction errors 2020-12-03 14:42:30 -08:00
Xin Dong ac02329d7d Added a command in fdbcli to allow user to manually trigger the detailed teams info loggings in data distributor 2020-12-03 14:42:30 -08:00
Andrew Noyes b8a9807336 Move trackerCancelled higher in catch block 2020-11-24 20:34:06 +00:00
Andrew Noyes dc2bac5670 Resolve conflicts 2020-11-24 19:09:42 +00:00
Andrew Noyes 1f541f02be Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth fc9b78737f Fix some merge bugs 2020-11-17 14:53:02 -08:00
David Youngworth d64cf8b9e3 Merge branch 6.3 into master 2020-11-17 11:22:45 -08:00
David Youngworth 489ba20641 Fix several merge issues 2020-11-16 14:46:36 -08:00
David Youngworth d0391db862 Merge branch 'release-6.2' into release-6.3 2020-11-16 10:15:23 -08:00
sfc-gh-tclinkenbeard ca8ea3b6ff Fix memory issues caused by cancelling data distribution tracker 2020-11-15 23:52:36 -08:00
Markus Pilman 1343f40117 don't allow empty comments 2020-11-11 14:07:54 -07:00
Markus Pilman bdd3dbfa7d remove duplicates 2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard 4669f837fa Add uses of makeReference 2020-11-07 22:10:18 -08:00
Jon Fu 3ae611d668 Merge branch 'master' of https://github.com/apple/foundationdb into jfu-pause-backup-snapshot 2020-11-04 14:26:49 -05:00
Jon Fu bda72d9a3d first draft at changing snapshot backup behaviour 2020-11-02 17:12:30 -05:00
sfc-gh-tclinkenbeard cf4c8e375f Merge remote-tracking branch 'origin/release-6.3' into merge 2020-10-29 22:15:41 -07:00
Steve Atherton 99c1880a83 Merge commit 'f9581de2005e6b085776e81b9fcaa16442b32589' into merge-6.2-to-6.3
# Conflicts:
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
2020-10-27 12:21:26 -07:00
Xin Dong be7944773f Fix a typo 2020-10-26 16:44:52 -07:00
Xin Dong 9ef29d0cea Changed getTeamID() to return a string instead of UID as suggested by reviews. 2020-10-26 16:44:52 -07:00
Xin Dong bec2cfb167 Fix typos. 2020-10-26 16:44:52 -07:00
Xin Dong 0bc51bb780 Resolve review comments 2020-10-26 16:44:50 -07:00
Xin Dong 7ebb2e5c09 Piggy back this PR to polish more TraceEvent by:
- Making it clear that it's tracking machine team info or server team info
- Added ID to both machine team and server team for better trackability
- Attach distributor id to some trace events.
2020-10-26 16:44:09 -07:00
Xin Dong c037bfd001 Added detailed logging when there is no servers left in a server team, because that may indicate a data loss incident. 2020-10-26 16:44:07 -07:00
Xin Dong 6395b76d8c Address more review comments 2020-10-23 15:29:08 -07:00
Xin Dong f757cae786 Address review comments 2020-10-23 14:01:53 -07:00
sfc-gh-tclinkenbeard e0b1f95740 Merge remote-tracking branch 'origin/master' into remove-global-ddenabled-flag 2020-10-21 18:22:08 -07:00
Young Liu 8cc3e4d3c6 Merge release-6.3 into master 2020-10-19 22:51:56 -07:00
Jingyu Zhou 44c62b2d51 Merge pull request #3922 from jzhou77/release-6.3
Merge Release 6.2 to Release 6.3
2020-10-19 14:38:36 -07:00
sfc-gh-tclinkenbeard 652d753daf Remove global ddEnabled flag 2020-10-17 11:23:52 -07:00
Andrew Noyes 70c1ac2131 Use TraceEvent::error 2020-10-16 16:55:09 -07:00
Xin Dong 8d0aa02a63 Do not periodically print detailed DD teams info 2020-10-16 16:11:14 -07:00
Andrew Noyes 9dec4bc46a Add ErrorCode to StorageServerTrackerCancelled trace event 2020-10-16 15:40:14 -07:00
Jingyu Zhou 8f17a1a5d6 Merge branch 'release-6.2' into release-6.3 2020-10-16 15:25:39 -07:00
Andrew Noyes 30488df5ea Fix build 2020-10-16 14:26:40 -07:00
Andrew Noyes 81193a9226 Move TraceEvent upward 2020-10-16 14:23:27 -07:00
Andrew Noyes 1e0e800751 Fix build 2020-10-16 12:10:07 -07:00
Andrew Noyes 2b87627d1b Check for cancellation after errorOut.sendError(e) 2020-10-16 12:10:07 -07:00
Xin Dong 92e31dd338 Address review comments 2020-10-15 15:25:00 -07:00
Xin Dong 1d43729cc9 Added a way to print detailed information about team collection for debugging. 2020-10-15 10:01:56 -07:00
Andrew Noyes a1e868a569 Merge pull request #3862 from sfc-gh-tclinkenbeard/use-override-more
Add uses of override keyword, remove unnecessary uses of virtual
2020-10-14 15:06:45 -07:00
A.J. Beamon 3b66a1f2d4 Fix a couple places where we were creating vectors with default elements rather than reserving space. 2020-10-09 10:51:06 -07:00
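The distinction that commit 3b66a1f2d4 draws between creating a vector with default elements and reserving space can be shown with a short sketch. The two helper functions are hypothetical examples, not code from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pitfall: the sized constructor creates n value-initialized elements,
// so every push_back appends after them.
std::vector<int> squaresWithDefaults(std::size_t n) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i * i));
    return out;
}

// Intended form: reserve only allocates capacity and leaves the vector empty.
std::vector<int> squaresWithReserve(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i * i));
    return out;
}
```

For `n = 3`, the first function returns six elements (three zeros followed by the squares) and the second returns exactly the three squares.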
sfc-gh-tclinkenbeard a9607bdcec Explicitly seal classes that inherit but aren't inherited from 2020-10-07 21:58:24 -07:00
sfc-gh-tclinkenbeard 8571dcfe28 Use override where applicable in fdbserver 2020-10-07 18:41:19 -07:00
Jon Fu b4ad989252 use stack transaction instead of heap 2020-10-05 16:51:01 -04:00
Evan Tschannen 52a6496a54 fix compiler errors 2020-10-04 16:50:54 -07:00
sfc-gh-tclinkenbeard 91a8367acb Avoid slow task in ~DataDistributionTracker 2020-10-01 11:44:55 -07:00
Jon Fu 69580593dd Merge branch 'master' of https://github.com/apple/foundationdb into jfu-snapshot-record-version 2020-09-23 15:35:05 -04:00
sfc-gh-tclinkenbeard 0814841827 Replace NULL with nullptr in fdbserver 2020-09-20 11:31:49 -07:00
Jon Fu 260c8d9568 Merge branch 'master' of https://github.com/apple/foundationdb into jfu-snapshot-record-version 2020-09-11 15:05:58 -04:00
Evan Tschannen ae7bf24353 Merge pull request #3549 from yliucode/grv-proxy
Separate out a new role GrvProxy to serve GRVs.
2020-09-03 19:03:45 -07:00
Young Liu 87693cae81 merge master branch and resolve conflicts 2020-09-02 13:44:33 -07:00
A.J. Beamon b4c96cadc7 Merge branch 'release-6.3' into merge-release-6.3-into-master 2020-09-02 12:45:57 -07:00
Jon Fu d334b6484e attempt to write to system keys with snapshot 2020-09-02 15:17:54 -04:00
Evan Tschannen 0443ea7a9b fix: prioritize marking a region as fully replicated over removing machine teams 2020-09-01 15:55:33 -07:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Young Liu 8994719e46 Merge branch 'master' into grv-proxy 2020-08-31 10:21:32 -07:00
Young Liu e87327b33b Merge master branch and keep master proxy reporting txn cost estimation to ratekeeper 2020-08-29 12:47:35 -07:00
Meng Xu ca9b1f5b34 Merge branch 'release-6.3' into mengxu/fr-sched-PR
Resolve conflict at BackupContainer.actor.cpp
2020-08-27 16:54:00 -07:00
sfc-gh-tclinkenbeard c3991262cf Add nullptr check to traceAllInfo 2020-08-27 09:40:42 -07:00
Young Liu 63b3612ad5 Merge master branch and resolve conflicts 2020-08-24 16:42:31 -07:00
Xiaoxi Wang 3afdb44c7a merge master 2020-08-23 17:09:04 +00:00
David Youngworth e1b7dd0c7d Merge remote-tracking branch 'upstream/release-6.3' into dyoungworth/fixMerge1 2020-08-22 12:25:19 -07:00
Xiaoxi Wang 3b63d8b01b remove FIXME; remote tagSet.reset(); trivial changes 2020-08-21 19:17:16 +00:00
A.J. Beamon f864606d8d Don't block the data distributor when getting a GetDataDistributorMetricsRequest. 2020-08-21 18:16:07 +00:00
A.J. Beamon 6380b92b10 Don't block the data distributor when getting a GetDataDistributorMetricsRequest. 2020-08-21 09:26:18 -07:00
Xiaoxi Wang 599675cba8 modify some details to get better performance 2020-08-19 04:23:23 +00:00
Meng Xu 1e571a5a1a FastRestore:Loader:Kick off scheduler when loader starts to have new requests 2020-08-15 21:57:00 -07:00
Xiaoxi Wang f3ecf14601 change midShardSize type and other details 2020-08-12 17:49:12 +00:00
Young Liu 79ce16650d merge master branch 2020-08-11 19:22:10 -07:00
Xiaoxi Wang 0cceda9908 solve distributor present bug 2020-08-11 21:54:52 +00:00
Meng Xu 97e49f2f70 Resolve throttling events 2020-08-10 22:01:12 -07:00
Xiaoxi Wang 696e77c94e query midShardSize from proxy 2020-08-10 20:13:44 +00:00
Xiaoxi Wang df9149fea4 ignore transaction tag of immediate transactions 2020-08-07 23:36:17 +00:00
Xiaoxi Wang 13307679c5 use median shard size 2020-08-05 03:57:25 +00:00
Xiaoxi Wang b903e60cb7 fix monitorDDMetricsChanges bugs 2020-08-03 17:12:36 +00:00
Xiaoxi Wang d1cc87452c merge with master; solve conflicts; solve initialization; 2020-08-02 22:44:07 +00:00
Xiaoxi Wang c3a629588f add client transaction tag sample 2020-07-31 19:08:42 +00:00
Young Liu 30ea639666 Remove debug traces 2020-07-29 07:55:05 -07:00
Young Liu f7b76a92af pass joshua 2020-07-29 07:26:55 -07:00
Evan Tschannen a49cb41de7 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	cmake/ConfigureCompiler.cmake
#	fdbserver/Knobs.cpp
#	fdbserver/StorageCache.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/ThreadHelper.actor.h
#	flow/serialize.h
#	tests/CMakeLists.txt
2020-07-29 00:31:55 -07:00
Xiaoxi Wang 819e3ab3e8 Merge branch 'ratekeeper' 2020-07-28 16:48:50 +00:00
Xiaoxi Wang 48a0fb5154 ask DD for shard info 2020-07-25 04:08:12 +00:00
Young Liu 525f10e30c Merge master branch 2020-07-22 16:08:49 -07:00
Russell Sears ab0d8b0626 Merge pull request #3509 from sfc-gh-anoyes/anoyes/remove-using-relops
Remove using namespace std::rel_ops
2020-07-22 11:58:25 -07:00
sfc-gh-tclinkenbeard 638f586f78 Remove unnecessary override 2020-07-21 11:05:46 -07:00
sfc-gh-tclinkenbeard 83c5a30f62 Add encapsulation to TCTeamInfo and ParallelTCInfo 2020-07-21 11:05:41 -07:00
sfc-gh-tclinkenbeard 9a2ce4c981 Make IDataDistributionTeam const-correct 2020-07-21 11:05:34 -07:00
Meng Xu b2a3b4fd83 Merge branch 'master' into mengxu/merge-6.3-PR 2020-07-20 11:34:18 -07:00
Meng Xu 1ba9b6b07f DD:Change SendRelocateToDDQx100 to SendRelocateToDDQueue 2020-07-17 14:10:17 -07:00
Meng Xu 098cdfb558 Replace actor_cancelled error with dd_cancelled 2020-07-16 20:26:07 -07:00
Meng Xu ba3c631350 Remove spammy trace 2020-07-16 10:33:24 -07:00
Meng Xu 638e612a97 Improve coding style and trace events 2020-07-16 10:25:42 -07:00
Meng Xu acbb389862 Debug and fix very rare crash in TeamTracker
teamTracker only works when all DDTeamCollections are valid.
However, teamTracker can be triggered by zeroTeamSignalling event
after a DDTeamCollection is destructed and the other DDTeamCollection has not been
destructed yet.

This causes teamTracker to use a pointer to the destructed DDTeamCollection and thus
fail mysteriously.
2020-07-16 10:23:02 -07:00
Young Liu 5b06d69d25 Pass watches test 2020-07-15 00:37:41 -07:00
Meng Xu 47ae66bd61 Merge branch 'master' into mengxu/tmp-minor-comment-PR
Resolve conflict at waitFailureClient
2020-07-13 16:17:50 -07:00
Meng Xu ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
Meng Xu 6f2e12be42 Minor improvement on comments 2020-07-12 18:32:47 -07:00
Andrew Noyes f470ba8316 Remove using namespace std::rel_ops
This causes the following to not compile anymore

#include <utility>
#include <vector>

using namespace std::rel_ops;

int main() {
    std::vector<int> xs;
    return xs.rbegin() != xs.rend();
}

See https://godbolt.org/z/s1977n
2020-07-10 22:58:15 +00:00
A.J. Beamon b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen 5e02fd490e fix: the check for if a teamCollection was tracking a source server was unreliable, leading to scenarios where we would temporarily replicate a shard less than teamSIze 2020-06-29 10:02:27 -07:00
negoyal cf13e00a8f Merge remote-tracking branch 'origin/release-6.3' into fdb_cache_wo_allocator 2020-06-01 17:38:31 -07:00
Chaoguang Lin 6ce574f5ad Merge remote-tracking branch 'upstream/release-6.3' into add-data-distribution-metrics 2020-05-17 23:36:52 -07:00
Markus Pilman c2bc75516f Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles 2020-05-14 10:34:53 -07:00
Evan Tschannen 48b1b20f67 Fixed a crash related to destruction order in data distribution 2020-05-10 23:14:19 -07:00
Chaoguang Lin ef724bf939 Merge remote-tracking branch 'upstream/master' into add-data-distribution-metrics 2020-05-08 18:39:28 -07:00
chaoguang e8b62e48f4 Rename DDMetrics to DDMetricsRef 2020-05-08 17:17:27 -07:00
Markus Pilman 5f9b127e56 Emit traces regularly about role assignment
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.

We do decorate each trace lines with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping
2020-05-08 16:27:57 -07:00
negoyal f4d30f8dce Fix the compilation error. 2020-05-06 19:09:40 -07:00
Markus Pilman 94570ea590 Data distributor now waitfails caches 2020-05-06 10:35:56 -07:00
Evan Tschannen aed2d34bcb Merge branch 'master' into feature-proxy-load-balance
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	flow/Knobs.cpp
2020-05-01 09:19:39 -07:00
Evan Tschannen b7f5f3be48 merge in master 2020-04-28 13:11:47 -07:00
Evan Tschannen 0c84ad4bc6 Merge pull request #2917 from bnamasivayam/fail-slow-ss
Mark the storage servers that are continually lagging as unhealthy
2020-04-22 23:18:35 -07:00
Balachandar Namasivayam d5bef6fc32 Update fdbserver/DataDistribution.actor.cpp 2020-04-22 09:45:56 -07:00
Evan Tschannen ba3e2af473 Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast 2020-04-17 15:17:37 -07:00
Evan Tschannen 33efb9ec97 code cleanup based on review comments 2020-04-17 15:05:01 -07:00
Alex Miller 1439de37b5 Convert GetRangeLimits() -> TOO_MANY + ASSERT(). 2020-04-12 18:23:14 -07:00
Evan Tschannen ce4493f679 many bug fixes 2020-04-10 13:45:16 -07:00
Balachandar Namasivayam 6916434f7d Addressed review comments 2020-04-08 10:48:32 -07:00
Balachandar Namasivayam 69ef8a127b Add a backstop mechanism to stop failing too many storage servers when they fall behind. 2020-04-06 23:37:11 -07:00
Alex Miller 6078fd1b18 Convert UID to Tag in keyServers to reduce txnStateStore size 2020-04-05 14:30:09 -07:00
Balachandar Namasivayam 73272fc72e Version difference is now the diff between TLog versions and SS version. 2020-04-03 19:04:43 -07:00
Balachandar Namasivayam a70bfcc3c8 Remove unnecessary comment. 2020-03-31 18:33:12 -07:00
Balachandar Namasivayam b1c3893d40 Fix some corner case bugs exposed by simulation.
In one case, when a SS joins the cluster and DD doesn't find any healthy server to form a team with the newly added server, then the SS does not get added to any team even when the other servers get healthy.
Another is an extreme case where a data center is down, and a SS in the active DC joins and then dies immediately but not before DD adds it to a destination team for a relocating shard which will result in DD waiting indefinitely for the dead data center to come back up for the cluster to be fully recovered.
2020-03-31 18:33:12 -07:00
Balachandar Namasivayam ad1dd4fd9b Mark the storage servers that are continually lagging as unhealthy and so this will give the Data Distributor the chance to move data out of this server. 2020-03-31 18:25:39 -07:00
tclinken 247ab84323 Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics 2020-03-23 17:01:17 -07:00
Evan Tschannen e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
Evan Tschannen 7adc916e18 Merge pull request #2806 from ajbeamon/improve-team-request-performance
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
A.J. Beamon 700b13e5f8 Remember the best team from team requests, which will likely be the best again and can save us some computation. 2020-03-13 15:21:33 -07:00
Evan Tschannen 12f2b32770 added additional logging in data distribution 2020-03-13 15:19:33 -07:00
Evan Tschannen 9e99a00c8f fix: do not use priority 0 left when calculating priorities for empty teams 2020-03-13 13:56:46 -07:00
A.J. Beamon 555db50cd1 Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist. 2020-03-12 11:22:03 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen 1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Evan Tschannen e219c1671f Merge branch 'release-6.2' into feature-dd-region-queue
# Conflicts:
#	fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen 125bd13198 fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload 2020-03-04 14:17:17 -08:00
Evan Tschannen 6296465e07 Make the DD priority associated with populating a remote region lower than machine failures 2020-03-04 14:07:32 -08:00
Evan Tschannen 924d335aa7 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Knobs.cpp
#	flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen c05c95cbe8 forgot to rename the knob 2020-02-25 15:47:39 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen aa4d1357b3 handle the case that there is only one healthy team 2020-02-21 15:41:01 -08:00
Evan Tschannen 457dbc5215 Update fdbserver/DataDistribution.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:17 -08:00
Evan Tschannen 6a634652c4 Update fdbserver/DataDistribution.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:06 -08:00
Evan Tschannen 08914a2acd Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
Evan Tschannen 819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
A.J. Beamon e1fb568fd1 Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon e4b483796d Combine some logic that was doing similar computations for free space ratio. 2020-02-20 14:52:08 -08:00
A.J. Beamon 4c9c736253 Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them. 2020-02-20 11:21:03 -08:00
A.J. Beamon 3a1ba5a077 Rename variable for clarity 2020-02-20 10:59:52 -08:00
A.J. Beamon c164acb88d Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck. 2020-02-20 09:32:00 -08:00
mpilman 5a9d420cb7 Merge remote-tracking branch 'upstream/release-6.2' into release-merges/20200210 2020-02-10 10:02:05 -08:00
A.J. Beamon b8a252da40 Clarify the names of a couple trace fields 2020-02-10 08:15:00 -08:00
tclinken c9363e7e28 Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics 2020-01-22 21:02:21 -08:00
Evan Tschannen 3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
tclinken 1d6ac716a1 Merge remote-tracking branch 'origin' into add-data-distribution-metrics 2020-01-15 13:20:04 -08:00
Evan Tschannen 9b80498180 Added a trace event to warn if a shard is merged before enough time has elapsed from becoming low bandwidth 2020-01-10 14:58:38 -08:00
Evan Tschannen c2608f0af9 fix: completeSources could be larger than the teamSize, so we need to check all completeSources
we do not need to track bestSize, since all teams in the list will be the same size
2020-01-10 14:46:40 -08:00
Evan Tschannen ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Evan Tschannen 83ad9caf54 implemented a load balancing algorithm which evens out the number of requests processed by each proxy 2020-01-08 01:59:01 -08:00
Evan Tschannen 59738e8ef1 fixed compiler error 2019-11-22 16:19:34 -08:00
Evan Tschannen 3c769fcf60 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen 3a3ab5664b fix: team trackers for bad teams that contain a removed servers must be cancelled or the cluster will falsely report those teams as failed 2019-11-22 10:20:13 -08:00
Andrew Noyes d4de608bb6 Fix OPEN_FOR_IDE build 2019-10-25 10:42:22 -07:00
Evan Tschannen f8e44d2f71 fix: If a storage server was offline, it would not be checked for being in an undesired dc 2019-10-23 23:04:39 -07:00
Jon Fu d2b6626d5c Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-21 13:47:06 -07:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Jon Fu b1fd6b4443 addressed review comments 2019-10-18 09:43:25 -07:00
Jon Fu 896701006f addressed code review changes 2019-10-16 11:30:20 -07:00
Evan Tschannen 86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
Jon Fu 34baa37e60 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-10 10:14:58 -07:00
Meng Xu 1bd6151f54 Update fdbserver/DataDistribution.actor.cpp
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-10-09 21:17:03 -07:00
Meng Xu 26e1d565f6 StorageServerTracker:Fix OOM bug caused by server healthyness toggles infinitely
When there is only one healthy team, the bug will set a server's status as unhealthy;
which causes the healthyTeam to 0, triggering StorageServerTracker to loop back;
which resets the server's status to healthy, and thus the healthyTeam to non-zero.

This pattern will cause infinite loop.

The infinite loop prevents TraceEvent from flushing, which causes
TraceEvent to use most of the memory and the process to run out of memory.

Kudos to JingYu Zhou (jingyu_zhou@apple.com) who is the main contributor who found the bug!
2019-10-09 17:45:09 -07:00
Jon Fu eb41e32876 add extra dd safety check to deny exclude if only 1 team exists 2019-10-08 16:10:09 -07:00