228 Commits

Author SHA1 Message Date
Xin Dong fca9aab17a
Merge pull request #2046 from dongxinEric/feature/hot-read-key-detection
Added metrics for read hot key detection
2019-10-21 14:31:48 -07:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen 8b09cd16b2 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations 2019-10-16 14:50:37 -07:00
Evan Tschannen 5667331729 added a buggify + minor code cleanup 2019-10-11 18:31:43 -07:00
Evan Tschannen 86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
Xin Dong 62ffdd54a3 Updated some comments to reflect the correct knob value and also used a more appropriate value for read bandwidth. Set the default value for read bandwidth in some cases. 2019-10-09 16:42:42 -07:00
Xin Dong cd4757b06c Address review comments 2019-10-09 16:42:42 -07:00
Xin Dong 6b0f771cc0 Fixed a typo in knobs. Addressed some review comments. Added code for actual metric collecting. 2019-10-09 16:42:42 -07:00
Xin Dong 12293d5497 Added metrics for read hot key detection 2019-10-09 16:42:42 -07:00
Evan Tschannen 628b4e0220 added a warning if multiple log ranges exist for the same range 2019-10-02 17:06:19 -07:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Meng Xu d2fd1f4931 DD:MisconfiguredLocality:Fix review comments 2019-09-17 13:04:21 -07:00
Meng Xu 37d2318eed DD:Handle worker with incorrect locality
When a worker has incorrect locality, the worker will be excluded from
storage recruitment.
When the worker has its locality corrected by system operators,
the worker will be reincluded for storage recruitment.
2019-09-14 12:12:56 -07:00
Meng Xu 78b8e48cef DD:ValidLocality:Resolve review comment 2019-09-13 15:35:16 -07:00
Meng Xu 3ad7e3adb3 DD:DD_VALIDATE_LOCALITY:Guard the checking of locality validity 2019-09-13 13:19:35 -07:00
Evan Tschannen 8fbd90e2f6
Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2
Graceful storage engine migration
2019-09-09 13:51:53 -07:00
Meng Xu c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu 8f9ba3bc09 StorageEngineSwitch:Remove unused code 2019-09-03 17:18:56 -07:00
Meng Xu bd80a67d46 Merge branch 'master' into mengxu/storage-engine-switch-PR-v2 2019-09-03 14:11:33 -07:00
Evan Tschannen 00424a5108 changed the rate at which the coordinators register with the cluster controller and the clients register with the coordinator so that the connected client number in status will be much more accurate 2019-08-21 15:02:09 -07:00
Evan Tschannen 41b908752e increased move keys parallelism to be less of a decrease just in case lowering this could affect normal data distribution
raised target durability lag versions to give more time for batch limiting to come into play before this limit is hit
changed max_bad_options to better reflect the name
2019-08-21 14:55:21 -07:00
Evan Tschannen 37e2fc86de Increase the target durability lag versions to be larger than the soft max, so that storage servers will respond with a penalty to clients before ratekeeper controls on the lag 2019-08-19 14:03:42 -07:00
Evan Tschannen 9318b494ad reduce the DD move keys parallelism to avoid a hot read shard when transitioning from triple replication to double replication 2019-08-19 14:02:18 -07:00
Meng Xu b448f92d61 StorageEngineSwitch:Remove unnecessary code and format code
Unnecessary code includes debug code and the unnecessary calling of
the removeWrongStoreType actor;

Format the changes with clang-format as well.
2019-08-16 16:53:38 -07:00
Meng Xu 85ba904e2c StorageEngineSwitch:Stop removeWrongStoreType actor if no SS has wrong storeType 2019-08-16 16:11:28 -07:00
Meng Xu a588710376 StorageEngineSwitch:Graceful switch
When fdbcli changes storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause an OOM error.
2019-08-12 17:37:52 -07:00
Meng Xu 7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
Evan Tschannen 9382a58390 fix: after a forced recovery it is possible to not have logs from all generations, so only wait at most a second for getting a popped txs version 2019-08-06 16:32:28 -07:00
Meng Xu 7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
Evan Tschannen 7d7aa27c2d
Merge pull request #1814 from dongxinEric/feature/1508/finer-grained-dd-controls
Added finer grained controls to DataDistribution in fdbcli.
2019-07-31 17:36:20 -07:00
Evan Tschannen a0b29ff82f updated knobs to allow more batch priority traffic 2019-07-31 17:19:41 -07:00
Evan Tschannen 4308ff86f7 increased the MAX_TEAMS_PER_SERVER 2019-07-31 16:08:18 -07:00
Xin Dong b653ddb30d Final clean ups after rebasing master 2019-07-30 22:35:34 -07:00
Xin Dong cda70700cc Address review comments. 50K correctness with no failures. 2019-07-30 22:24:30 -07:00
Evan Tschannen 6dbaddd0a7 Added a knob to always use CAUSAL_READ_RISKY for GRV 2019-07-30 18:21:46 -07:00
Evan Tschannen 5dd9043fd3 addressed review comments 2019-07-30 17:04:41 -07:00
A.J. Beamon 41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon bc536757df Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space. 2019-07-29 15:47:34 -07:00
Evan Tschannen d8b14fe372 we cannot buggify replace content bytes because it takes too long to recover when the txnStateStore is too large 2019-07-28 19:34:17 -07:00
Evan Tschannen 5c98dcce6d revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators 2019-07-27 16:46:22 -07:00
Evan Tschannen b509a441e7 Merge branch 'master' into feature-skip-confirm
# Conflicts:
#	bindings/flow/tester/Tester.actor.cpp
#	bindings/go/src/_stacktester/stacktester.go
#	bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java
#	bindings/java/src/test/com/apple/foundationdb/test/StackTester.java
#	bindings/python/tests/tester.py
#	bindings/ruby/tests/tester.rb
#	documentation/sphinx/source/api-c.rst
#	documentation/sphinx/source/api-python.rst
#	documentation/sphinx/source/api-ruby.rst
#	documentation/sphinx/source/data-modeling.rst
#	documentation/sphinx/source/developer-guide.rst
#	fdbclient/vexillographer/fdb.options
#	fdbserver/MasterProxyServer.actor.cpp
2019-07-27 15:08:13 -07:00
Evan Tschannen ee94e8a062 removed a trace event which was causing valgrind errors 2019-07-27 13:51:59 -07:00
Evan Tschannen 90e3b50213 Merge branch 'master' into feature-coordinator-connection
# Conflicts:
#	fdbclient/DatabaseContext.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen ee92f0574f fix: lastRequestTime was not updated
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
sramamoorthy a65c9f92ed get rid of all timeouts and other changes 2019-07-24 15:36:28 -07:00
sramamoorthy 7e04e3c8be snap v2: knobs for max snap create timeout 2019-07-24 15:36:28 -07:00
Evan Tschannen c70e762f0e
Merge pull request #1785 from xumengpanda/mengxu/server-team-remover-PR
Remove redundant server teams
2019-07-19 17:44:16 -07:00
Meng Xu b001a9ebe8 ServerTeamRemover runs after machineTeamRemover finishes
If serverTeamRemover removes a team before machineTeamRemover brings
the machine team number down to the desired number, DD may create a new
team (due to teams removed by serverTeamRemover), which may be removed
later by machineTeamRemover. This causes unnecessary extra data movement.
2019-07-19 16:48:52 -07:00
Evan Tschannen 846038b0e6
Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00