Xin Dong
fca9aab17a
Merge pull request #2046 from dongxinEric/feature/hot-read-key-detection
...
Added metrics for read hot key detection
2019-10-21 14:31:48 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
8b09cd16b2
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations
2019-10-16 14:50:37 -07:00
Evan Tschannen
5667331729
added a buggify + minor code cleanup
2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Xin Dong
62ffdd54a3
Updated some comments to reflect the correct knob value and also used a more appropriate value for read bandwidth. Set the default value for read bandwidth in some cases.
2019-10-09 16:42:42 -07:00
Xin Dong
cd4757b06c
Address review comments
2019-10-09 16:42:42 -07:00
Xin Dong
6b0f771cc0
Fixed a typo in knobs. Addressed some review comments. Added code for actual metric collecting.
2019-10-09 16:42:42 -07:00
Xin Dong
12293d5497
Added metrics for read hot key detection
2019-10-09 16:42:42 -07:00
Evan Tschannen
628b4e0220
added a warning if multiple log ranges exist for the same range
2019-10-02 17:06:19 -07:00
Meng Xu
d0147e5e5d
Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
...
Resolved Conflicts:
documentation/sphinx/source/release-notes.rst
fdbserver/DataDistribution.actor.cpp
versions.target
2019-10-02 13:22:56 -07:00
Meng Xu
d2fd1f4931
DD:MisconfiguredLocality:Fix review comments
2019-09-17 13:04:21 -07:00
Meng Xu
37d2318eed
DD:Handle worker with incorrect locality
...
When a worker has incorrect locality, the worker will be excluded from
storage recruitment.
When the worker has its locality corrected by system operators,
the worker will be reincluded for storage recruitment.
2019-09-14 12:12:56 -07:00
Meng Xu
78b8e48cef
DD:ValidLocality:Resolve review comment
2019-09-13 15:35:16 -07:00
Meng Xu
3ad7e3adb3
DD:DD_VALIDATE_LOCALITY:Guard the checking of locality validity
2019-09-13 13:19:35 -07:00
Evan Tschannen
8fbd90e2f6
Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2
...
Graceful storage engine migration
2019-09-09 13:51:53 -07:00
Meng Xu
c2355f721e
Merge branch 'master' into mengxu/performant-restore-PR
2019-09-04 17:11:42 -07:00
Meng Xu
8f9ba3bc09
StorageEngineSwitch:Remove unused code
2019-09-03 17:18:56 -07:00
Meng Xu
bd80a67d46
Merge branch 'master' into mengxu/storage-engine-switch-PR-v2
2019-09-03 14:11:33 -07:00
Evan Tschannen
00424a5108
changed the rate at which the coordinators register with the cluster controller and the clients register with the coordinator so that the connected client number in status will be much more accurate
2019-08-21 15:02:09 -07:00
Evan Tschannen
41b908752e
increased move keys parallelism to be less of a decrease just in case lowering this could affect normal data distribution
...
raised target durability lag versions to give more time for batch limiting to come into play before this limit is hit
changed max_bad_options to better reflect the name
2019-08-21 14:55:21 -07:00
Evan Tschannen
37e2fc86de
Increase the target durability lag versions to be larger than the soft max, so that storage servers will respond with a penalty to clients before ratekeeper controls on the lag
2019-08-19 14:03:42 -07:00
Evan Tschannen
9318b494ad
reduce the DD move keys parallelism to avoid a hot read shard when transitioning from triple replication to double replication
2019-08-19 14:02:18 -07:00
Meng Xu
b448f92d61
StorageEngineSwitch:Remove unnecessary code and format code
...
Unnecessary code includes debug code and the unnecessary calling of
the removeWrongStoreType actor;
Format the changes with clang-format as well.
2019-08-16 16:53:38 -07:00
Meng Xu
85ba904e2c
StorageEngineSwitch:Stop removeWrongStoreType actor if no SS has wrong storeType
2019-08-16 16:11:28 -07:00
Meng Xu
a588710376
StorageEngineSwitch:Graceful switch
...
When fdbcli changes storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause OOM error.
2019-08-12 17:37:52 -07:00
Meng Xu
7ff46e6772
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-07 20:31:56 -07:00
Evan Tschannen
9382a58390
fix: after a forced recovery it is possible to not have logs from all generations, so only wait at most a second for getting a popped txs version
2019-08-06 16:32:28 -07:00
Meng Xu
7ccaeddf05
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-01 13:23:17 -07:00
Evan Tschannen
7d7aa27c2d
Merge pull request #1814 from dongxinEric/feature/1508/finer-grained-dd-controls
...
Added finer grained controls to DataDistribution in fdbcli.
2019-07-31 17:36:20 -07:00
Evan Tschannen
a0b29ff82f
updated knobs to allow more batch priority traffic
2019-07-31 17:19:41 -07:00
Evan Tschannen
4308ff86f7
increased the MAX_TEAMS_PER_SERVER
2019-07-31 16:08:18 -07:00
Xin Dong
b653ddb30d
Final clean ups after rebasing master
2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc
Address review comments. 50K correctness with no failures.
2019-07-30 22:24:30 -07:00
Evan Tschannen
6dbaddd0a7
Added a knob to always use CAUSAL_READ_RISKY for GRV
2019-07-30 18:21:46 -07:00
Evan Tschannen
5dd9043fd3
addressed review comments
2019-07-30 17:04:41 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
...
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
bc536757df
Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space.
2019-07-29 15:47:34 -07:00
Evan Tschannen
d8b14fe372
we cannot buggify replace content bytes because it takes too long to recover when the txnStateStore is too large
2019-07-28 19:34:17 -07:00
Evan Tschannen
5c98dcce6d
revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators
2019-07-27 16:46:22 -07:00
Evan Tschannen
b509a441e7
Merge branch 'master' into feature-skip-confirm
...
# Conflicts:
# bindings/flow/tester/Tester.actor.cpp
# bindings/go/src/_stacktester/stacktester.go
# bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java
# bindings/java/src/test/com/apple/foundationdb/test/StackTester.java
# bindings/python/tests/tester.py
# bindings/ruby/tests/tester.rb
# documentation/sphinx/source/api-c.rst
# documentation/sphinx/source/api-python.rst
# documentation/sphinx/source/api-ruby.rst
# documentation/sphinx/source/data-modeling.rst
# documentation/sphinx/source/developer-guide.rst
# fdbclient/vexillographer/fdb.options
# fdbserver/MasterProxyServer.actor.cpp
2019-07-27 15:08:13 -07:00
Evan Tschannen
ee94e8a062
removed a trace event which was causing valgrind errors
2019-07-27 13:51:59 -07:00
Evan Tschannen
90e3b50213
Merge branch 'master' into feature-coordinator-connection
...
# Conflicts:
# fdbclient/DatabaseContext.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen
ee92f0574f
fix: lastRequestTime was not updated
...
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Meng Xu
45083edf74
Merge branch 'master' into mengxu/performant-restore-PR
...
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
sramamoorthy
a65c9f92ed
get rid of all timeouts and other changes
2019-07-24 15:36:28 -07:00
sramamoorthy
7e04e3c8be
snap v2: knobs for max snap create timeout
2019-07-24 15:36:28 -07:00
Evan Tschannen
c70e762f0e
Merge pull request #1785 from xumengpanda/mengxu/server-team-remover-PR
...
Remove redundant server teams
2019-07-19 17:44:16 -07:00
Meng Xu
b001a9ebe8
ServerTeamRemover runs after machineTeamRemover finishes
...
If serverTeamRemover removes a team before machineTeamRemover brings
the machine team number down to the desired number, DD may create a new
team (due to teams removed by serverTeamRemover), which may be removed
later by machineTeamRemover. This causes unnecessary extra data movement.
2019-07-19 16:48:52 -07:00
Evan Tschannen
846038b0e6
Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
...
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00