sfc-gh-tclinkenbeard
c74047c665
Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings
2021-07-28 11:51:02 -07:00
Steve Atherton
507c1f11e3
Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use.
2021-07-26 19:55:10 -07:00
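The pattern behind commit 507c1f11e3 can be sketched as follows. This is a minimal illustration and does not use FoundationDB's real TraceEvent: `Event`, `sink`, `emitBare`, and `emitExplicit` are hypothetical names. It assumes an event type that emits from its destructor, so a bare temporary does work while looking like an unused object to a static analyzer. An explicit `.log()` call makes the intent visible.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory sink standing in for a real trace log.
std::vector<std::string>& sink() {
    static std::vector<std::string> lines;
    return lines;
}

// Hypothetical event type (NOT FoundationDB's TraceEvent) that emits on destruction.
class Event {
    std::string name;
    bool logged = false;

public:
    explicit Event(std::string n) : name(std::move(n)) {}

    // Emit the event once; later calls (including the destructor's) are no-ops.
    void log() {
        if (!logged) {
            sink().push_back(name);
            logged = true;
        }
    }

    ~Event() { log(); }
};

// Bare temporary: it is destroyed immediately, which is what emits the event,
// but the statement looks like an unused object.
void emitBare() { Event("BareEvent"); }

// Explicit call: same effect, and the temporary is visibly used.
void emitExplicit() { Event("ExplicitEvent").log(); }
```

Both functions append exactly one line to the sink; the second form simply states the side effect instead of relying on the destructor.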
sfc-gh-tclinkenbeard
a27d7c86f4
Fix more -Wreorder-ctor warnings in DataDistribution.actor.cpp
2021-07-24 22:14:43 -07:00
sfc-gh-tclinkenbeard
da50e13f3e
Fix more -Wreorder-ctor warnings in DataDistribution.actor.cpp, OldTLogServer_4_6.actor.cpp, and Net2.actor.cpp
2021-07-24 17:33:11 -07:00
sfc-gh-tclinkenbeard
6f81155784
Merge remote-tracking branch 'origin/master' into const-serverdbinfo
2021-07-20 10:18:40 -07:00
Steve Atherton
f596a81073
Rename ::TRUE and ::FALSE in BooleanParams to ::True and ::False so as to not conflict with the TRUE and FALSE macros provided by the Windows and MacOS SDKs.
2021-07-17 00:11:40 -07:00
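The naming conflict described in commit f596a81073 can be sketched as follows. This is an illustration under stated assumptions: `WaitForCompletion` and `timeoutSeconds` are hypothetical, the class is hand-written instead of being generated by the *BOOLEAN_PARAM macros, and the two `#define` lines only simulate what a platform SDK header might provide. A member named `TRUE` would be rewritten by the preprocessor, while `True` is left alone.

```cpp
// Simulate macros that a platform SDK header might define (assumption for this sketch).
#define TRUE 1
#define FALSE 0

// Hypothetical strongly typed boolean parameter; not the actual macro expansion.
class WaitForCompletion {
    bool value;

public:
    explicit constexpr WaitForCompletion(bool v) : value(v) {}
    constexpr operator bool() const { return value; }

    // Members named TRUE/FALSE would expand to 1/0 here and fail to compile.
    static const WaitForCompletion True;
    static const WaitForCompletion False;
};

const WaitForCompletion WaitForCompletion::True{ true };
const WaitForCompletion WaitForCompletion::False{ false };

// The call site documents itself: timeoutSeconds(WaitForCompletion::True).
int timeoutSeconds(WaitForCompletion wait) {
    return wait ? 30 : 0;
}
```

Because the constructor is explicit, a plain `true` cannot be passed by accident, so every call site has to name the parameter it is setting.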
Xiaoxi Wang
501dc339a9
relax perpetual wiggle pause condition; add trace log; correct perpetual wiggle priority setting
2021-07-12 05:46:55 +00:00
sfc-gh-tclinkenbeard
8a212862f0
Prevent dataDistributor from modifying ServerDBInfo object
2021-07-11 22:04:54 -07:00
sfc-gh-tclinkenbeard
79ff07a071
Added *BOOLEAN_PARAM macros to enforce documentation of boolean parameters
2021-07-02 15:04:42 -07:00
Neethu Haneesha Bingi
73752f441b
exclude locality:clang-format, ranged loops, documentation, tracking addStoragesever for exclusion.
2021-06-23 18:03:27 -07:00
Neethu Haneesha Bingi
62355571d0
exclude servers based on locality match
2021-06-23 18:03:27 -07:00
Xiaoxi Wang
7b713f7fd2
add knob
2021-06-23 05:49:55 +00:00
Xiaoxi Wang
f2daf20927
TEST condition
2021-06-21 06:56:03 +00:00
Xiaoxi Wang
0493d149e6
wait remove
2021-06-21 05:18:42 +00:00
Xiaoxi Wang
783520ce85
add and remove some healthy check to solve cluster status oscillation when #ss is little; simplify some code
2021-06-19 16:57:04 +00:00
Xiaoxi Wang
647138145d
adjust default value of stopWiggleSignal; better trace logic
2021-06-17 20:59:47 +00:00
Xiaoxi Wang
fdd9c30794
code refactor;change stopSignal;
2021-06-16 05:30:58 +00:00
Xiaoxi Wang
d33e43fd2b
code format
2021-06-14 23:00:02 +00:00
Xiaoxi Wang
2cd4e6d62f
check healthy team count, dd queue and disk space;
...
code refactor
2021-06-14 22:09:45 +00:00
Xiaoxi Wang
d46fccc30f
Revert "Revert "Properly set simulation test for perpetual storage wiggle and bug fixing""
...
This reverts commit ad576e8c20.
2021-06-11 22:58:05 +00:00
Xiaoxi Wang
ad576e8c20
Revert "Properly set simulation test for perpetual storage wiggle and bug fixing"
2021-06-11 09:07:45 -07:00
Xiaoxi Wang
17ac91bac4
Merge pull request #4929 from sfc-gh-xwang/ppwtest
...
Properly set simulation test for perpetual storage wiggle and bug fixing
2021-06-10 14:09:50 -07:00
Xiaoxi Wang
cd58c0c149
add useful trace; add invalid wiggling server check
2021-06-10 06:50:44 +00:00
Xiaoxi Wang
4220a548ce
use the same health check as exclude to avoid 'best team get stuck'
2021-06-09 22:51:46 +00:00
Xiaoxi Wang
51b4cb89c2
fix server_status bug
2021-06-08 23:47:59 +00:00
Xiaoxi Wang
45ebdb1a9d
fix perpetual wiggle bug caused by multiple DCs and removeStorageServer
2021-06-08 23:33:25 +00:00
Xiaoxi Wang
6ab0ea3d0f
properly set perpetual_storage_wiggle value during tests
2021-06-07 17:55:20 +00:00
sfc-gh-tclinkenbeard
371a38e6e5
Merge remote-tracking branch 'origin/master' into remove-extra-copies
2021-06-07 10:26:06 -07:00
Xiaoxi Wang
838d847d4e
Merge pull request #4860 from sfc-gh-xwang/ppwtest
...
implement perpetual storage wiggling feature
2021-06-04 16:18:39 -07:00
Xiaoxi Wang
5be65fab5e
add comment
2021-06-04 18:40:18 +00:00
Xiaoxi Wang
e0981d6732
add code coverage mark
2021-06-03 19:58:28 +00:00
Xiaoxi Wang
351325b3af
comment modification; wait perpetual wiggling close
2021-06-03 05:13:20 +00:00
Xiaoxi Wang
21e175b16c
add comments for new actors
2021-06-02 18:49:01 +00:00
Xiaoxi Wang
944c9ad8d9
fix memory bug
2021-06-02 17:53:44 +00:00
Josh Slocum
b3e4f182ef
TSS Mapping Change
2021-06-02 17:30:09 +00:00
Xiaoxi Wang
9684d78a6e
solve recruiting conflict with TSS
2021-06-02 06:12:45 +00:00
Xiaoxi Wang
8b9c8b33fc
manually merge with master
2021-06-01 17:51:42 +00:00
Xiaoxi Wang
ce308edc5e
fix wiggler logic bug
2021-05-26 21:57:58 +00:00
Josh Slocum
4257ac2b4d
More TSS Changes/Fixes
2021-05-25 20:37:48 +00:00
Josh Slocum
ce82c9653e
Testing Storage Server implementation
2021-05-25 20:28:50 +00:00
Xiaoxi Wang
e9a23840ea
fix promise bug
2021-05-25 20:25:21 +00:00
Xiaoxi Wang
f11b7ffa5f
merge master, fix promise callback bug
2021-05-25 18:43:08 +00:00
Xiaoxi Wang
7bc55448aa
fix iterator bug
2021-05-24 19:11:28 +00:00
Xiaoxi Wang
85cd2b9945
add perpetualStorageWiggler
2021-05-20 23:31:08 +00:00
Xiaoxi Wang
3f3a81b3d9
add pid2server_info to maintain Process id set
2021-05-20 03:32:15 +00:00
sfc-gh-tclinkenbeard
f28ac955c3
Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>>
2021-05-10 16:32:50 -07:00
sfc-gh-tclinkenbeard
5c2d7b6080
Create RangeResult type alias
2021-05-03 13:14:16 -07:00
Trevor Clinkenbeard
0db28f6ea0
Merge pull request #4535 from jzhou77/fix-dd
...
Fix DD Assertion failed in canBeSet
2021-03-24 10:50:04 -07:00
Jingyu Zhou
0c3bc09524
Remove the shuttingDown flag
2021-03-21 20:12:37 -07:00
Jingyu Zhou
cb26576b95
Fix DD assertion failure
...
This fixes #4493, where DDTeamCollection::~DDTeamCollection creates new teams
that hold a pointer to the DDTeamCollection, which later causes an assertion failure
because the memory is invalid.
The fix is to cancel teamBuilder at the beginning of ~DDTeamCollection.
2021-03-21 19:54:44 -07:00
Evan Tschannen
d2f9bf7eb6
added comments and fixed style
2021-03-16 15:44:49 -07:00
Evan Tschannen
edefcff3ac
do not kill the data distributor after removing a failed server, completely remove the failed server
2021-03-15 16:48:08 -07:00
Evan Tschannen
c0a1362478
fixed a bug where DD was shutdown while still in a callback from trackExcludedServers
2021-03-15 16:26:57 -07:00
Evan Tschannen
c570a7b718
added trace events
2021-03-15 15:55:02 -07:00
Evan Tschannen
403d933329
fixed trace event
2021-03-15 10:51:53 -07:00
Evan Tschannen
831224df99
fixed compiler error
2021-03-15 10:48:48 -07:00
Evan Tschannen
4e4149b070
exclude failed shuts down data distribution while the server is being removed to avoid two processes making changes to the key servers at the same time
2021-03-15 10:43:06 -07:00
FDB Formatster
df90cc89de
apply clang-format to *.c, *.cpp, *.h, *.hpp files
2021-03-10 10:18:07 -08:00
Vishesh Yadav
2bb4f2e59f
Merge branch 'release-6.3-pre-format' into master-format
...
This merges release-6.3 branch right before it was fully formatted.
There were quite a few conflicts that are resolved here. CoroFlow had
a check for OOM errors introduced in 6.3, but didn't seem applicable in
the new implementation which seems to use boost.
2021-03-10 09:37:41 -08:00
Chaoguang Lin
9645f489e6
Fix base trace event name inconsistency
2021-03-08 15:20:50 -08:00
Zhe Wu
59181245c1
Change SSVersionDiffLarge event log level to warning
2021-03-03 23:33:48 -08:00
Andrew Noyes
79cec09255
Apply clang-tidy's performance-inefficient-vector-operation fix
...
I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.
$ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
2021-03-04 03:58:25 +00:00
Evan Tschannen
346a4e3ecd
Merge branch 'release-6.3'
...
# Conflicts:
# fdbcli/fdbcli.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/MultiInterface.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2021-03-01 18:52:06 -08:00
sfc-gh-tclinkenbeard
32486a2785
Reenable tlog pops in ddSnapCreateCore even if some disable requests fail
...
If some tlogs successfully disable pops but others fail to, we do not
want to wait TLOG_IGNORE_POP_AUTO_ENABLE_DELAY seconds before reenabling
pops
2021-02-20 18:24:21 -08:00
Russell Sears
8025cc4571
Merge remote-tracking branch 'upstream/release-6.3' into merge-6.3-to-master-1-26-21
2021-01-26 19:46:47 +00:00
Andrew Noyes
0ef44739ea
Fix OPEN_FOR_IDE build in preparation for using clang-tidy
2021-01-26 02:04:11 +00:00
Xin Dong
0cde3cc48f
Make trace event happy
2021-01-25 14:04:16 -08:00
Xin Dong
dce8af5b63
Change back to SevWarnAlways for now.
2021-01-25 13:54:24 -08:00
sfc-gh-tclinkenbeard
bdf58d0b2f
Add assertion to DDTeamCollection::overlappingMembers
2021-01-21 14:39:39 -08:00
sfc-gh-tclinkenbeard
ad99bf0471
Merge remote-tracking branch 'origin' into misc-changes
2021-01-21 10:03:07 -08:00
Xin Dong
83506cda87
Log SevError instead of SevWarnAlways when all replicas of some data are lost.
2021-01-15 15:00:48 -08:00
Andrew Noyes
ff7d306b09
Merge branch 'release-6.3' into anoyes/merge-6.3-to-master
...
Include conflict markers for now. Will resolve.
2021-01-15 18:04:09 +00:00
sfc-gh-tclinkenbeard
5b2e88b187
Use structured bindings in for loops
2020-12-27 01:46:20 -04:00
sfc-gh-tclinkenbeard
0d4e81e6b4
Use unique_ptr in DataDistribution.actor.cpp
2020-12-26 23:40:54 -04:00
sfc-gh-tclinkenbeard
19816ccdbf
Improve DataDistribution const-correctness
2020-12-26 20:22:27 -04:00
sfc-gh-tclinkenbeard
26a4884eef
Mark TCMachineTeamInfo::size const
2020-12-26 19:23:01 -04:00
Jingyu Zhou
bbb56e4089
Merge branch 'release-6.2' of https://github.com/apple/foundationdb into release-6.3
2020-12-23 14:26:59 -08:00
Andrew Noyes
877997632d
Merge branch 'release-6.3' into anoyes/merge-release-6.3-master
...
Include conflict markers for review purposes
2020-12-04 01:38:07 +00:00
Xin Dong
78503db523
Reset and retry transaction errors
2020-12-03 14:42:30 -08:00
Xin Dong
ac02329d7d
Added a command in fdbcli to allow user to manually trigger the detailed teams info loggings in data distributor
2020-12-03 14:42:30 -08:00
Andrew Noyes
b8a9807336
Move trackerCancelled higher in catch block
2020-11-24 20:34:06 +00:00
Andrew Noyes
dc2bac5670
Resolve conflicts
2020-11-24 19:09:42 +00:00
Andrew Noyes
1f541f02be
Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
...
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth
fc9b78737f
Fix some merge bugs
2020-11-17 14:53:02 -08:00
David Youngworth
d64cf8b9e3
Merge branch 6.3 into master
2020-11-17 11:22:45 -08:00
David Youngworth
489ba20641
Fix several merge issues
2020-11-16 14:46:36 -08:00
David Youngworth
d0391db862
Merge branch 'release-6.2' into release-6.3
2020-11-16 10:15:23 -08:00
sfc-gh-tclinkenbeard
ca8ea3b6ff
Fix memory issues caused by cancelling data distribution tracker
2020-11-15 23:52:36 -08:00
Markus Pilman
1343f40117
don't allow empty comments
2020-11-11 14:07:54 -07:00
Markus Pilman
bdd3dbfa7d
remove duplicates
2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard
4669f837fa
Add uses of makeReference
2020-11-07 22:10:18 -08:00
Jon Fu
3ae611d668
Merge branch 'master' of https://github.com/apple/foundationdb into jfu-pause-backup-snapshot
2020-11-04 14:26:49 -05:00
Jon Fu
bda72d9a3d
first draft at changing snapshot backup behaviour
2020-11-02 17:12:30 -05:00
sfc-gh-tclinkenbeard
cf4c8e375f
Merge remote-tracking branch 'origin/release-6.3' into merge
2020-10-29 22:15:41 -07:00
Steve Atherton
99c1880a83
Merge commit 'f9581de2005e6b085776e81b9fcaa16442b32589' into merge-6.2-to-6.3
...
# Conflicts:
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
2020-10-27 12:21:26 -07:00
Xin Dong
be7944773f
Fix a typo
2020-10-26 16:44:52 -07:00
Xin Dong
9ef29d0cea
Changed getTeamID() to return a string instead of UID as suggested by reviews.
2020-10-26 16:44:52 -07:00
Xin Dong
bec2cfb167
Fix typos.
2020-10-26 16:44:52 -07:00
Xin Dong
0bc51bb780
Resolve review comments
2020-10-26 16:44:50 -07:00
Xin Dong
7ebb2e5c09
Piggy back this PR to polish more TraceEvent by:
...
- Making it clear that it's tracking machine team info or server team info
- Added ID to both machine team and server team for better trackability
- Attach distributor id to some trace events.
2020-10-26 16:44:09 -07:00
Xin Dong
c037bfd001
Added detailed logging when there are no servers left in a server team, because that may indicate a data loss incident.
2020-10-26 16:44:07 -07:00
Xin Dong
6395b76d8c
Address more review comments
2020-10-23 15:29:08 -07:00
Xin Dong
f757cae786
Address review comments
2020-10-23 14:01:53 -07:00
sfc-gh-tclinkenbeard
e0b1f95740
Merge remote-tracking branch 'origin/master' into remove-global-ddenabled-flag
2020-10-21 18:22:08 -07:00
Young Liu
8cc3e4d3c6
Merge release-6.3 into master
2020-10-19 22:51:56 -07:00
Jingyu Zhou
44c62b2d51
Merge pull request #3922 from jzhou77/release-6.3
...
Merge Release 6.2 to Release 6.3
2020-10-19 14:38:36 -07:00
sfc-gh-tclinkenbeard
652d753daf
Remove global ddEnabled flag
2020-10-17 11:23:52 -07:00
Andrew Noyes
70c1ac2131
Use TraceEvent::error
2020-10-16 16:55:09 -07:00
Xin Dong
8d0aa02a63
Do not periodically print detailed DD teams info
2020-10-16 16:11:14 -07:00
Andrew Noyes
9dec4bc46a
Add ErrorCode to StorageServerTrackerCancelled trace event
2020-10-16 15:40:14 -07:00
Jingyu Zhou
8f17a1a5d6
Merge branch 'release-6.2' into release-6.3
2020-10-16 15:25:39 -07:00
Andrew Noyes
30488df5ea
Fix build
2020-10-16 14:26:40 -07:00
Andrew Noyes
81193a9226
Move TraceEvent upward
2020-10-16 14:23:27 -07:00
Andrew Noyes
1e0e800751
Fix build
2020-10-16 12:10:07 -07:00
Andrew Noyes
2b87627d1b
Check for cancellation after errorOut.sendError(e)
2020-10-16 12:10:07 -07:00
Xin Dong
92e31dd338
Address review comments
2020-10-15 15:25:00 -07:00
Xin Dong
1d43729cc9
Added a way to print detailed information about team collection for debugging.
2020-10-15 10:01:56 -07:00
Andrew Noyes
a1e868a569
Merge pull request #3862 from sfc-gh-tclinkenbeard/use-override-more
...
Add uses of override keyword, remove unnecessary uses of virtual
2020-10-14 15:06:45 -07:00
A.J. Beamon
3b66a1f2d4
Fix a couple places where we were creating vectors with default elements rather than reserving space.
2020-10-09 10:51:06 -07:00
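The class of bug addressed by commit 3b66a1f2d4 can be sketched as follows. This is a generic illustration, not code from the repository; `squaresSized` and `squaresReserved` are hypothetical names. Constructing a `std::vector` with a size creates that many default elements, so later `push_back` calls append after them, whereas `reserve` only allocates capacity.

```cpp
#include <cstddef>
#include <vector>

// Buggy pattern: the sized constructor creates n zero-initialized elements,
// and push_back then appends n more after them.
std::vector<int> squaresSized(std::size_t n) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(i * i));
    }
    return out; // holds 2 * n elements, the first n of which are 0
}

// Intended pattern: reserve allocates capacity but leaves size() at 0.
std::vector<int> squaresReserved(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(i * i));
    }
    return out; // holds exactly n elements
}
```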
sfc-gh-tclinkenbeard
a9607bdcec
Explicitly seal classes that inherit but aren't inherited from
2020-10-07 21:58:24 -07:00
sfc-gh-tclinkenbeard
8571dcfe28
Use override where applicable in fdbserver
2020-10-07 18:41:19 -07:00
Jon Fu
b4ad989252
use stack transaction instead of heap
2020-10-05 16:51:01 -04:00
Evan Tschannen
52a6496a54
fix compiler errors
2020-10-04 16:50:54 -07:00
sfc-gh-tclinkenbeard
91a8367acb
Avoid slow task in ~DataDistributionTracker
2020-10-01 11:44:55 -07:00
Jon Fu
69580593dd
Merge branch 'master' of https://github.com/apple/foundationdb into jfu-snapshot-record-version
2020-09-23 15:35:05 -04:00
sfc-gh-tclinkenbeard
0814841827
Replace NULL with nullptr in fdbserver
2020-09-20 11:31:49 -07:00
Jon Fu
260c8d9568
Merge branch 'master' of https://github.com/apple/foundationdb into jfu-snapshot-record-version
2020-09-11 15:05:58 -04:00
Evan Tschannen
ae7bf24353
Merge pull request #3549 from yliucode/grv-proxy
...
Separate out a new role GrvProxy to serve GRVs.
2020-09-03 19:03:45 -07:00
Young Liu
87693cae81
merge master branch and resolve conflicts
2020-09-02 13:44:33 -07:00
A.J. Beamon
b4c96cadc7
Merge branch 'release-6.3' into merge-release-6.3-into-master
2020-09-02 12:45:57 -07:00
Jon Fu
d334b6484e
attempt to write to system keys with snapshot
2020-09-02 15:17:54 -04:00
Evan Tschannen
0443ea7a9b
fix: prioritize marking a region as fully replicated over removing machine teams
2020-09-01 15:55:33 -07:00
Evan Tschannen
12edadd059
Merge branch 'release-6.3'
...
# Conflicts:
# CMakeLists.txt
# fdbclient/Knobs.cpp
# fdbclient/MasterProxyInterface.h
# fdbrpc/simulator.h
# fdbserver/MasterProxyServer.actor.cpp
# tests/fast/CycleAndLock.txt
# tests/fast/TxnStateStoreCycleTest.txt
# tests/fast/VersionStamp.txt
# tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
# tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
# versions.target
2020-08-31 19:33:34 -07:00
Young Liu
8994719e46
Merge branch 'master' into grv-proxy
2020-08-31 10:21:32 -07:00
Young Liu
e87327b33b
Merge master branch and keep master proxy reporting txn cost estimation to ratekeeper
2020-08-29 12:47:35 -07:00
Meng Xu
ca9b1f5b34
Merge branch 'release-6.3' into mengxu/fr-sched-PR
...
Resolve conflict at BackupContainer.actor.cpp
2020-08-27 16:54:00 -07:00
sfc-gh-tclinkenbeard
c3991262cf
Add nullptr check to traceAllInfo
2020-08-27 09:40:42 -07:00
Young Liu
63b3612ad5
Merge master branch and resolve conflicts
2020-08-24 16:42:31 -07:00
Xiaoxi Wang
3afdb44c7a
merge master
2020-08-23 17:09:04 +00:00
David Youngworth
e1b7dd0c7d
Merge remote-tracking branch 'upstream/release-6.3' into dyoungworth/fixMerge1
2020-08-22 12:25:19 -07:00
Xiaoxi Wang
3b63d8b01b
remove FIXME; remove tagSet.reset(); trivial changes
2020-08-21 19:17:16 +00:00
A.J. Beamon
f864606d8d
Don't block the data distributor when getting a GetDataDistributorMetricsRequest.
2020-08-21 18:16:07 +00:00
A.J. Beamon
6380b92b10
Don't block the data distributor when getting a GetDataDistributorMetricsRequest.
2020-08-21 09:26:18 -07:00
Xiaoxi Wang
599675cba8
modify some details to get better performance
2020-08-19 04:23:23 +00:00
Meng Xu
1e571a5a1a
FastRestore:Loader:Kick off scheduler when loader starts to have new requests
2020-08-15 21:57:00 -07:00
Xiaoxi Wang
f3ecf14601
change midShardSize type and other details
2020-08-12 17:49:12 +00:00
Young Liu
79ce16650d
merge master branch
2020-08-11 19:22:10 -07:00
Xiaoxi Wang
0cceda9908
solve distributor present bug
2020-08-11 21:54:52 +00:00
Meng Xu
97e49f2f70
Resolve throttling events
2020-08-10 22:01:12 -07:00
Xiaoxi Wang
696e77c94e
query midShardSize from proxy
2020-08-10 20:13:44 +00:00
Xiaoxi Wang
df9149fea4
ignore transaction tag of immediate transactions
2020-08-07 23:36:17 +00:00
Xiaoxi Wang
13307679c5
use median shard size
2020-08-05 03:57:25 +00:00
Xiaoxi Wang
b903e60cb7
fix monitorDDMetricsChanges bugs
2020-08-03 17:12:36 +00:00
Xiaoxi Wang
d1cc87452c
merge with master; solve conflicts; solve initialization;
2020-08-02 22:44:07 +00:00
Xiaoxi Wang
c3a629588f
add client transaction tag sample
2020-07-31 19:08:42 +00:00
Young Liu
30ea639666
Remove debug traces
2020-07-29 07:55:05 -07:00
Young Liu
f7b76a92af
pass joshua
2020-07-29 07:26:55 -07:00
Evan Tschannen
a49cb41de7
Merge branch 'release-6.3'
...
# Conflicts:
# CMakeLists.txt
# cmake/ConfigureCompiler.cmake
# fdbserver/Knobs.cpp
# fdbserver/StorageCache.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/ThreadHelper.actor.h
# flow/serialize.h
# tests/CMakeLists.txt
2020-07-29 00:31:55 -07:00
Xiaoxi Wang
819e3ab3e8
Merge branch 'ratekeeper'
2020-07-28 16:48:50 +00:00
Xiaoxi Wang
48a0fb5154
ask DD for shard info
2020-07-25 04:08:12 +00:00
Young Liu
525f10e30c
Merge master branch
2020-07-22 16:08:49 -07:00
Russell Sears
ab0d8b0626
Merge pull request #3509 from sfc-gh-anoyes/anoyes/remove-using-relops
...
Remove using namespace std::rel_ops
2020-07-22 11:58:25 -07:00
sfc-gh-tclinkenbeard
638f586f78
Remove unnecessary override
2020-07-21 11:05:46 -07:00
sfc-gh-tclinkenbeard
83c5a30f62
Add encapsulation to TCTeamInfo and ParallelTCInfo
2020-07-21 11:05:41 -07:00
sfc-gh-tclinkenbeard
9a2ce4c981
Make IDataDistributionTeam const-correct
2020-07-21 11:05:34 -07:00
Meng Xu
b2a3b4fd83
Merge branch 'master' into mengxu/merge-6.3-PR
2020-07-20 11:34:18 -07:00
Meng Xu
1ba9b6b07f
DD:Change SendRelocateToDDQx100 to SendRelocateToDDQueue
2020-07-17 14:10:17 -07:00
Meng Xu
098cdfb558
Replace actor_cancelled error with dd_cancelled
2020-07-16 20:26:07 -07:00
Meng Xu
ba3c631350
Remove spammy trace
2020-07-16 10:33:24 -07:00
Meng Xu
638e612a97
Improve coding style and trace events
2020-07-16 10:25:42 -07:00
Meng Xu
acbb389862
Debug and fix very rare crash in TeamTracker
...
teamTracker only works when all DDTeamCollections are valid.
However, teamTracker can be triggered by zeroTeamSignalling event
after a DDTeamCollection is destructed and the other DDTeamCollection has not been
destructed yet.
This causes teamTracker to use a pointer to the destructed DDTeamCollection and thus
fail mysteriously.
2020-07-16 10:23:02 -07:00
Young Liu
5b06d69d25
Pass watches test
2020-07-15 00:37:41 -07:00
Meng Xu
47ae66bd61
Merge branch 'master' into mengxu/tmp-minor-comment-PR
...
Resolve conflict at waitFailureClient
2020-07-13 16:17:50 -07:00
Meng Xu
ef8c1060a2
Merge branch 'master' into mengxu/tmp-merge-6.3
2020-07-13 10:15:56 -07:00
Meng Xu
6f2e12be42
Minor improvement on comments
2020-07-12 18:32:47 -07:00
Andrew Noyes
f470ba8316
Remove using namespace std::rel_ops
...
This causes the following to not compile anymore
#include <utility>
#include <vector>
using namespace std::rel_ops;
int main() {
std::vector<int> xs;
return xs.rbegin() != xs.rend();
}
See https://godbolt.org/z/s1977n
2020-07-10 22:58:15 +00:00
A.J. Beamon
b09dddc07e
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
...
# Conflicts:
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/fdbrpc.vcxproj
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
5e02fd490e
fix: the check for whether a teamCollection was tracking a source server was unreliable, leading to scenarios where we would temporarily replicate a shard less than teamSize
2020-06-29 10:02:27 -07:00
negoyal
cf13e00a8f
Merge remote-tracking branch 'origin/release-6.3' into fdb_cache_wo_allocator
2020-06-01 17:38:31 -07:00
Chaoguang Lin
6ce574f5ad
Merge remote-tracking branch 'upstream/release-6.3' into add-data-distribution-metrics
2020-05-17 23:36:52 -07:00
Markus Pilman
c2bc75516f
Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles
2020-05-14 10:34:53 -07:00
Evan Tschannen
48b1b20f67
Fixed a crash related to destruction order in data distribution
2020-05-10 23:14:19 -07:00
Chaoguang Lin
ef724bf939
Merge remote-tracking branch 'upstream/master' into add-data-distribution-metrics
2020-05-08 18:39:28 -07:00
chaoguang
e8b62e48f4
Rename DDMetrics to DDMetricsRef
2020-05-08 17:17:27 -07:00
Markus Pilman
5f9b127e56
Emit traces regularly about role assignment
...
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.
We do decorate each trace line with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping
2020-05-08 16:27:57 -07:00
negoyal
f4d30f8dce
Fix the compilation error.
2020-05-06 19:09:40 -07:00
Markus Pilman
94570ea590
Data distributor now waitfails caches
2020-05-06 10:35:56 -07:00
Evan Tschannen
aed2d34bcb
Merge branch 'master' into feature-proxy-load-balance
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# flow/Knobs.cpp
2020-05-01 09:19:39 -07:00
Evan Tschannen
b7f5f3be48
merge in master
2020-04-28 13:11:47 -07:00
Evan Tschannen
0c84ad4bc6
Merge pull request #2917 from bnamasivayam/fail-slow-ss
...
Mark the storage servers that are continually lagging as unhealthy
2020-04-22 23:18:35 -07:00
Balachandar Namasivayam
d5bef6fc32
Update fdbserver/DataDistribution.actor.cpp
2020-04-22 09:45:56 -07:00
Evan Tschannen
ba3e2af473
Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast
2020-04-17 15:17:37 -07:00
Evan Tschannen
33efb9ec97
code cleanup based on review comments
2020-04-17 15:05:01 -07:00
Alex Miller
1439de37b5
Convert GetRangeLimits() -> TOO_MANY + ASSERT().
2020-04-12 18:23:14 -07:00
Evan Tschannen
ce4493f679
many bug fixes
2020-04-10 13:45:16 -07:00
Balachandar Namasivayam
6916434f7d
Addressed review comments
2020-04-08 10:48:32 -07:00
Balachandar Namasivayam
69ef8a127b
Add a backstop mechanism to stop failing too many storage servers when they fall behind.
2020-04-06 23:37:11 -07:00
Alex Miller
6078fd1b18
Convert UID to Tag in keyServers to reduce txnStateStore size
2020-04-05 14:30:09 -07:00
Balachandar Namasivayam
73272fc72e
Version difference is now the diff between TLog versions and SS version.
2020-04-03 19:04:43 -07:00
Balachandar Namasivayam
a70bfcc3c8
Remove unnecessary comment.
2020-03-31 18:33:12 -07:00
Balachandar Namasivayam
b1c3893d40
Fix some corner case bugs exposed by simulation.
...
In one case, when a SS joins the cluster and DD doesn't find any healthy server to form a team with the newly added server, then the SS does not get added to any team even when the other servers get healthy.
Another is an extreme case where a data center is down, and a SS in the active DC joins and then dies immediately but not before DD adds it to a destination team for a relocating shard which will result in DD waiting indefinitely for the dead data center to come back up for the cluster to be fully recovered.
2020-03-31 18:33:12 -07:00
Balachandar Namasivayam
ad1dd4fd9b
Mark the storage servers that are continually lagging as unhealthy, which gives the Data Distributor the chance to move data off these servers.
2020-03-31 18:25:39 -07:00
tclinken
247ab84323
Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics
2020-03-23 17:01:17 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
7adc916e18
Merge pull request #2806 from ajbeamon/improve-team-request-performance
...
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
A.J. Beamon
700b13e5f8
Remember the best team from team requests, which will likely be the best again and can save us some computation.
2020-03-13 15:21:33 -07:00
Evan Tschannen
12f2b32770
added additional logging in data distribution
2020-03-13 15:19:33 -07:00
Evan Tschannen
9e99a00c8f
fix: do not use priority 0 left when calculating priorities for empty teams
2020-03-13 13:56:46 -07:00
A.J. Beamon
555db50cd1
Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist.
2020-03-12 11:22:03 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Evan Tschannen
e219c1671f
Merge branch 'release-6.2' into feature-dd-region-queue
...
# Conflicts:
# fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen
125bd13198
fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload
2020-03-04 14:17:17 -08:00
Evan Tschannen
6296465e07
Make the DD priority associated with populating a remote region lower than machine failures
2020-03-04 14:07:32 -08:00
Evan Tschannen
924d335aa7
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Knobs.cpp
# flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen
c05c95cbe8
forgot to rename the knob
2020-02-25 15:47:39 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
aa4d1357b3
handle the case that there is only one healthy team
2020-02-21 15:41:01 -08:00
Evan Tschannen
457dbc5215
Update fdbserver/DataDistribution.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:17 -08:00
Evan Tschannen
6a634652c4
Update fdbserver/DataDistribution.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-21 15:39:06 -08:00
Evan Tschannen
08914a2acd
Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
Evan Tschannen
819c55556c
More aggressively attempt to find teams that do not have low disk space
2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d
Combine some logic that was doing similar computations for free space ratio.
2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
A.J. Beamon
3a1ba5a077
Rename variable for clarity
2020-02-20 10:59:52 -08:00
A.J. Beamon
c164acb88d
Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.
2020-02-20 09:32:00 -08:00
mpilman
5a9d420cb7
Merge remote-tracking branch 'upstream/release-6.2' into release-merges/20200210
2020-02-10 10:02:05 -08:00
A.J. Beamon
b8a252da40
Clarify the names of a couple trace fields
2020-02-10 08:15:00 -08:00
tclinken
c9363e7e28
Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics
2020-01-22 21:02:21 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
tclinken
1d6ac716a1
Merge remote-tracking branch 'origin' into add-data-distribution-metrics
2020-01-15 13:20:04 -08:00
Evan Tschannen
9b80498180
Added a trace event to warn if a shard is merged before enough time has elapsed since it became low bandwidth
2020-01-10 14:58:38 -08:00
Evan Tschannen
c2608f0af9
fix: completeSources could be larger than the teamSize, so we need to check all completeSources
...
we do not need to track bestSize, since all teams in the list will be the same size
2020-01-10 14:46:40 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Evan Tschannen
83ad9caf54
implemented a load balancing algorithm which evens out the number of requests processed by each proxy
2020-01-08 01:59:01 -08:00
Evan Tschannen
59738e8ef1
fixed compiler error
2019-11-22 16:19:34 -08:00
Evan Tschannen
3c769fcf60
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen
3a3ab5664b
fix: team trackers for bad teams that contain a removed server must be cancelled or the cluster will falsely report those teams as failed
2019-11-22 10:20:13 -08:00
Andrew Noyes
d4de608bb6
Fix OPEN_FOR_IDE build
2019-10-25 10:42:22 -07:00
Evan Tschannen
f8e44d2f71
fix: If a storage server was offline, it would not be checked for being in an undesired dc
2019-10-23 23:04:39 -07:00
Jon Fu
d2b6626d5c
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-21 13:47:06 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Jon Fu
b1fd6b4443
addressed review comments
2019-10-18 09:43:25 -07:00
Jon Fu
896701006f
addressed code review changes
2019-10-16 11:30:20 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Jon Fu
34baa37e60
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-10 10:14:58 -07:00
Meng Xu
1bd6151f54
Update fdbserver/DataDistribution.actor.cpp
...
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-10-09 21:17:03 -07:00
Meng Xu
26e1d565f6
StorageServerTracker: Fix OOM bug caused by server healthiness toggling infinitely
...
When there is only one healthy team, the bug will set a server's status as unhealthy;
which causes healthyTeam to drop to 0, triggering StorageServerTracker to loop back;
which resets the server's status to healthy, and thus healthyTeam to non-zero.
This pattern will cause an infinite loop.
The infinite loop will prevent TraceEvent from flushing, which causes
TraceEvent to use most of the memory and the process to run out of memory.
Kudos to JingYu Zhou (jingyu_zhou@apple.com) who is the main contributor who found the bug!
2019-10-09 17:45:09 -07:00
Jon Fu
eb41e32876
add extra dd safety check to deny exclude if only 1 team exists
2019-10-08 16:10:09 -07:00