Vishesh Yadav
3068a37e1b
refactor: Remove dead failureDetectionServer code
2020-06-17 15:40:21 -07:00
Markus Pilman
c2bc75516f
Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles
2020-05-14 10:34:53 -07:00
Evan Tschannen
f17f00fdd5
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-05-10 22:33:38 -07:00
Evan Tschannen
3eaa9d6397
fix: do not report datacenter version difference before both datacenters report a correct version
2020-05-10 17:49:09 -07:00
Markus Pilman
5f9b127e56
Emit traces regularly about role assignment
...
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.
We do decorate each trace line with the roles assigned to that
particular process; however, this is not sufficient for tools that can
make use of the UID -> Role mapping.
2020-05-08 16:27:57 -07:00
Evan Tschannen
9e5037291d
fix compiler errors
2020-05-01 14:30:50 -07:00
Evan Tschannen
a442565e13
more work towards shrinking locality
2020-04-18 21:29:38 -07:00
Evan Tschannen
b04478704e
fixed improper use of std::set erase
2020-04-17 16:45:22 -07:00
Evan Tschannen
33efb9ec97
code cleanup based on review comments
2020-04-17 15:05:01 -07:00
Evan Tschannen
b667d5442f
fix: not all removed endpoints were actually removed
2020-04-17 13:47:54 -07:00
Evan Tschannen
9b5130194d
avoid updating the same endpoint multiple times
2020-04-11 21:05:30 -07:00
Evan Tschannen
1476057996
properly cache serialization of serverDBInfo
2020-04-11 19:30:05 -07:00
Evan Tschannen
07cc0a8d74
code cleanup
2020-04-10 17:02:11 -07:00
Evan Tschannen
ce4493f679
many bug fixes
2020-04-10 13:45:16 -07:00
Evan Tschannen
a51c92854a
Merge branch 'master' into feature-tree-broadcast
...
# Conflicts:
# fdbserver/WorkerInterface.actor.h
# fdbserver/worker.actor.cpp
2020-04-06 21:09:44 -07:00
Evan Tschannen
2a1bd97120
fix compilation errors
2020-04-06 20:58:43 -07:00
Evan Tschannen
477d66b46d
implemented a tree broadcast for txn state message for proxies, and serverDBInfo for workers
2020-04-05 23:09:36 -07:00
Jingyu Zhou
5b36dcaad5
Fix oldest backup epoch for backup workers
...
The oldest backup epoch is piggybacked in LogSystemConfig from master to
cluster controller and then to all workers. Previously, this epoch was set
to the current master epoch, which is wrong.
2020-03-20 20:15:09 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
2038a56ff4
Merge pull request #2819 from etschannen/feature-first-proxy
...
A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes
2020-03-16 13:53:28 -07:00
Evan Tschannen
012344e297
refactor getWorkersForRoleInDatacenter
2020-03-16 11:50:17 -07:00
Evan Tschannen
79d5511149
A "proxy" class process would not be preferred as the "first proxy" for restore and DR purposes
2020-03-13 17:49:02 -07:00
Evan Tschannen
4640edf5d6
do not recruit satellite tlogs when usable regions=1
2020-03-13 10:24:52 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
f3ac2c9180
renamed a variable
2020-03-04 18:49:21 -08:00
Evan Tschannen
b3ea9d5896
Do not allow the cluster controller to mark any process as failed within 30 seconds of startup
2020-03-04 18:45:26 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
8b768e66df
Merge pull request #2694 from dongxinEric/feature/2663/specialize-policy-for-zoneid-in-cc
...
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId…
2020-02-20 14:46:23 -08:00
Evan Tschannen
574e88ba8e
updateGoodRemoteRecruitmentTime was unnecessary because the only way findRemoteWorkers would return would be after a new server has joined, which already resets goodRemoteRecruitmentTime
2020-02-20 13:46:22 -08:00
Xin Dong
99095c9224
Again make Clang happy.
2020-02-20 09:50:22 -08:00
Xin Dong
298d6cb3d7
Address review comments.
2020-02-20 09:34:01 -08:00
Evan Tschannen
fbd45963d8
The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment
2020-02-19 16:48:30 -08:00
Xin Dong
89fcbb2055
Make clang happy
2020-02-19 09:44:15 -08:00
Xin Dong
efc0d7f9d5
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId',PolicyOne()) to find a set of TLog servers which will be able to fulfill the policy later.
2020-02-19 09:25:57 -08:00
Evan Tschannen
844c8511c4
Merge pull request #2588 from jzhou77/backup-worker
...
Integrate new backup worker with existing backup command
2020-02-05 14:14:43 -08:00
Jingyu Zhou
52c6737411
Rename backupLoggingEnabled as backupWorkerEnabled
...
To highlight the backup changes for 7.0. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Jingyu Zhou
0db03f1d3c
Use backup_logging_enabled flag
...
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
...
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Jingyu Zhou
38aa1903fd
Add a DB configuration option for backup workers
...
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.
The StartFullBackupTaskFunc is updated to check if backup worker is enabled.
Only when it is not enabled, starting a backup will wait on all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
6ddf73e26a
Remove code introduced when resolving merge conflicts
2020-01-22 21:23:38 -08:00
Jingyu Zhou
c6c39ca99d
Update better master exist with backup workers
...
During recruitment, if there is no desired log router count, use tlog size
instead, because the number of backup workers has to be larger than 0.
2020-01-22 19:43:40 -08:00
Jingyu Zhou
56a2c37071
Recruit backup workers for single region
...
Enable log router tags for single region, which are popped by backup workers.
Need to add a noop for backup workers if there are no active backups.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
19d6a889ff
Recruit backup workers for old epochs
...
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remain in
the cluster until the next epoch, when they remove themselves.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26
Enable pop from backup workers
...
This is still WIP as some edge cases can trigger test failure, most likely due
to not popping mutations by backup workers when epoch ends.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
ece3cadf8e
Recruit backup worker during master recovery
...
Right now, this recruits the same number of backup workers as TLogs. The backup worker does nothing.
2020-01-22 19:37:48 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Vishesh Yadav
daef5f011a
Merge remote-tracking branch 'apple/master' into task/failmon-remove-server
2020-01-21 13:20:15 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
d55e56993d
fix: the cluster controller would not recruit more remote logs before the database became fully_recovered
2020-01-10 12:21:48 -08:00
Alvin Moore
7628d04fb9
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00