Meng Xu
962024d8b8
Explain knob SHARD_MAX_BYTES_PER_KSEC
...
Explain why it may cost 100MB data movement.
No code change.
2020-01-31 17:04:11 -08:00
Meng Xu
7f37a90c48
FastRestore:Introduce FASTRESTORE_VB_PARALLELISM
...
for controlling the number of concurrently running version batches.
2020-01-28 10:39:57 -08:00
Meng Xu
0f4dfeda5b
FastRestore:Test random number of appliers and loaders
2020-01-27 20:19:58 -08:00
Meng Xu
3cfe1f031d
FastRestore:Use set instead of vector for keysplitter
...
and disable testing for random number of appliers and loaders
2020-01-27 20:14:33 -08:00
Meng Xu
141609e80a
FastRestore:Improve code style and fix typos
2020-01-27 18:13:14 -08:00
Evan Tschannen
231d7830a0
more accurate calculation on the amount of time that proxy should wait before getting a version from the master
2020-01-26 19:47:12 -08:00
Meng Xu
b04e98771e
FastRestore:Replace FastRestoreOpConfig with Knobs
...
And randomize value for the rest of knobs
2020-01-24 14:24:34 -08:00
Evan Tschannen
e167e63eaf
Add delays between proxy batches which roughly correspond to the amount of work the proxy needs to do. This will help avoid getting a version from the master and then waiting a long time before committing it.
2020-01-23 18:31:51 -08:00
Jingyu Zhou
1311fec45a
Add an option to get minKnownCommittedVersion from Proxies
...
The backup worker needs to use this version for popping when running in a NOOP
mode. This option is added to GetReadVersionRequest and proxies will send back
minKnownCommittedVersion if the option is set.
Also add a couple of knobs for backup workers.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
0e5f5b50f0
Remove unused backup worker knobs
2020-01-22 19:38:46 -08:00
Jingyu Zhou
a4d6ebe79e
Recruit backup worker in newEpoch
2020-01-22 19:37:48 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Xin Dong
80683c09bb
Remove unused var to make compiler happy.
2020-01-21 11:19:52 -08:00
Xin Dong
b0a1af1288
Added the actual read hot detection algorithm and logging mechanism.
...
- When a shard has a read bandwidth larger than a threshold value (configurable via knob), and its read-bandwidth/byte-size ratio is also larger than a threshold (configurable via knob), the corresponding shard tracker will run the algorithm
- The algorithm will divide the shard into 10MB (configurable via knob) chunks and try to find the chunk(s) that have a large read-bandwidth/byte-size ratio
- Those ranges are then logged as TraceEvents. Later work will do more with them, such as actually caching them.
2020-01-21 11:19:52 -08:00
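The detection steps described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the FoundationDB implementation: the function name, the `(byte_size, read_bytes_per_sec)` chunk representation, and both threshold parameters are hypothetical stand-ins for the knobs the message mentions.

```python
from typing import List, Tuple


def find_read_hot_chunks(
    chunks: List[Tuple[int, float]],
    min_read_bandwidth: float,
    min_ratio: float,
) -> List[int]:
    """Return indexes of chunks whose read-bandwidth/byte-size ratio is high.

    chunks holds one (byte_size, read_bytes_per_sec) pair per fixed-size
    chunk of a single shard. All names and thresholds are hypothetical.
    """
    total_bytes = sum(size for size, _ in chunks)
    total_read = sum(read for _, read in chunks)

    # Step 1: only run the search when the whole shard exceeds both
    # the bandwidth threshold and the ratio threshold.
    if total_bytes == 0 or total_read <= min_read_bandwidth:
        return []
    if total_read / total_bytes <= min_ratio:
        return []

    # Step 2: flag the chunks whose own ratio exceeds the threshold.
    return [
        i
        for i, (size, read) in enumerate(chunks)
        if size > 0 and read / size > min_ratio
    ]


hot = find_read_hot_chunks(
    [(10, 1.0), (10, 100.0), (10, 2.0)],
    min_read_bandwidth=50.0,
    min_ratio=2.0,
)
```

In this sketch the flagged indexes would then be reported (the commit logs them as TraceEvents); what a real tracker does with them is outside the scope of the example.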
Xin Dong
33456e7276
Finished the plumbing work on the SSI side.
2020-01-21 11:15:52 -08:00
Xin Dong
7708034cc9
Added the function used to determine sub-ranges within a read hot shard that have a readSize-to-byteSize ratio higher than the knob value. Also added unit tests for that function.
2020-01-21 11:15:52 -08:00
Xin Dong
1d6cd1007b
Instead of using absolute value as the max bytesReadPerKSec threshold, use a pre-defined read traffic to byte size ratio to decide that value dynamically based on the actual size of the shard.
2020-01-21 11:15:52 -08:00
Evan Tschannen
54d77d20b2
Merge branch 'release-6.2'
2020-01-19 15:22:49 -08:00
Evan Tschannen
8197f0562f
merge priority did not need to be raised, because we no longer merge shards until they are untrackable
...
max_commit_updates was too large, and could cause proxies to run out of memory
2020-01-17 14:24:58 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
4b90487b90
occasionally throw wrong_shard_server when waitMetrics times out so that the waitMetrics request can get the correct number of shards if two shards have been merged or split and the same storage server owns all the chunks
2020-01-15 13:22:18 -08:00
A.J. Beamon
9668c4471f
Clamp infinite limit in ratekeeper
2020-01-14 15:45:24 -08:00
Evan Tschannen
17e97f24e4
Merge pull request #2526 from etschannen/feature-dd-improvements
...
Data distribution improvements
2020-01-10 17:53:22 -08:00
Evan Tschannen
fde53cbeef
HasBeenTrueFor was ready immediately after a previous shard merge
2020-01-10 16:28:56 -08:00
Evan Tschannen
9b80498180
Added a trace event to warn if a shard is merged before enough time has elapsed from becoming low bandwidth
2020-01-10 14:58:38 -08:00
Evan Tschannen
4aab9b7bc8
fix: clients would waste time attempting to read from a remote region when it was in the process of catching up
2020-01-10 12:23:59 -08:00
Evan Tschannen
02a8e8d1e9
batch priority must be heavily throttled before stopping data distribution rebalancing
2020-01-09 17:05:22 -08:00
Evan Tschannen
9842272ced
raised the priority of shard merges, because the tracker cannot track an unmerged shard
2020-01-09 17:04:17 -08:00
Evan Tschannen
e4fa4ad0c9
Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes
2020-01-09 17:02:49 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Meng Xu
39a4f2372f
Change FASTRESTORE_SAMPLING_PERCENT to 0 to 100
2019-12-04 21:26:27 -08:00
Meng Xu
c6b36dbffb
FastRestore:Sampling:Resolve review comments
2019-12-04 17:35:11 -08:00
Meng Xu
dd91d26dfa
FastRestore:Sampling:Add FASTRESTORE_SAMPLING_RATE knob
2019-12-04 11:46:29 -08:00
Evan Tschannen
07331ab5fd
Merge pull request #2362 from etschannen/master
...
Merge 6.2 into master
2019-12-02 15:04:27 -08:00
Evan Tschannen
ebcb2f79ed
Merge branch 'master' of github.com:apple/foundationdb
2019-11-22 15:34:49 -08:00
Xin Dong
14dd5626d7
Resolve review comments
2019-11-22 10:11:45 -08:00
Xin Dong
b6e1839d84
Code clean up
2019-11-21 13:39:19 -08:00
Xin Dong
b282e180d5
Added a knob to disable read sampling
2019-11-20 14:03:20 -08:00
Evan Tschannen
8d3ef89540
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/MutationList.h
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-14 15:49:56 -08:00
negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
Evan Tschannen
396dccbc98
when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log
2019-11-08 18:34:05 -08:00
Evan Tschannen
afc9713005
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/FDBTypes.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
daac8a2c22
Knobified a few variables
2019-11-04 20:21:38 -08:00
Evan Tschannen
71dfaa3f95
Merge pull request #2275 from dongxinEric/bugfix/2273/fix-read-key-sampling
...
Resolves #2273 : Use a large value for read sampling size threshold. Also at sampling …
2019-10-31 10:21:58 -07:00
Xin Dong
199a34b827
Defined a minimum read cost (a penalty) for empty read or read size smaller than it. Fixed several review comments.
2019-10-30 10:04:19 -07:00
Evan Tschannen
3325980c03
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.actor.h
# fdbserver/worker.actor.cpp
# versions.target
2019-10-24 17:38:15 -07:00
Xin Dong
a290e2cb2b
Use 8 MiB for real
2019-10-24 11:02:17 -07:00
Xin Dong
fe54a4bde1
- Changed SHARD_MAX_BYTES_READ_PRE_KEYSEC to be equivalent to 8MiB/s, which, multiplied by the sample expire interval (120 seconds), yields 960MiB. A shard with more read traffic than that within the interval will be marked as read-hot. The number 960MiB was chosen to be roughly twice the max allowed shard size to avoid wrongly marking a shard as read-hot when doing a table scan on it.
...
- Also tuned down the empty key sampling percentage to be 5%.
2019-10-23 12:00:19 -07:00
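The arithmetic in the commit above can be checked with a few lines. This is only a worked example in Python; the variable names are made up for illustration and are not FoundationDB knob names.

```python
MIB = 1024 * 1024

# Values quoted in the commit message.
read_rate_bytes_per_sec = 8 * MIB      # 8 MiB/s
sample_expire_interval_sec = 120       # seconds

# Bytes read over one sample-expire interval at that rate.
read_hot_threshold_bytes = read_rate_bytes_per_sec * sample_expire_interval_sec
threshold_mib = read_hot_threshold_bytes // MIB

print(threshold_mib)  # 960
```

Note that the product is an amount of data (MiB) accumulated over the interval, not a rate.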
Meng Xu
e676348710
Merge pull request #1955 from fzhjon/mark-ss-failed
...
Add fdbcli and API command to mark storage servers as permanently failed
2019-10-22 23:36:30 -07:00
Xin Dong
af72d15566
Update fdbserver/Knobs.cpp
...
From AJ: to match typical aligned format used on other variables.
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:53:28 -07:00
Xin Dong
e6f5748791
Use a large value for read sampling size threshold. Also at sampling site, don't round up small values to avoid sampling every key.
2019-10-22 13:47:58 -07:00
A.J. Beamon
29a0014b41
Fix "bandwith" typo
2019-10-22 09:51:59 -07:00
Evan Tschannen
12c517ab16
limit the number of committed version updates in progress simultaneously to prevent running out of memory
2019-10-21 16:01:45 -07:00
Xin Dong
fca9aab17a
Merge pull request #2046 from dongxinEric/feature/hot-read-key-detection
...
Added metrics for read hot key detection
2019-10-21 14:31:48 -07:00
Jon Fu
d2b6626d5c
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-21 13:47:06 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
8b09cd16b2
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations
2019-10-16 14:50:37 -07:00
Evan Tschannen
5667331729
added a buggify + minor code cleanup
2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Xin Dong
62ffdd54a3
Updated some comments to reflect the correct knob value and also used a more appropriate value for read bandwidth. Set the default value for read bandwidth in some cases.
2019-10-09 16:42:42 -07:00
Xin Dong
cd4757b06c
Address review comments
2019-10-09 16:42:42 -07:00
Xin Dong
6b0f771cc0
Fixed a typo in knobs. Addressed some review comments. Added code for actual metric collecting.
2019-10-09 16:42:42 -07:00
Xin Dong
12293d5497
Added metrics for read hot key detection
2019-10-09 16:42:42 -07:00
Jon Fu
d96a7b2c69
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-03 09:47:45 -07:00
Evan Tschannen
628b4e0220
added a warning if multiple log ranges exist for the same range
2019-10-02 17:06:19 -07:00
Meng Xu
d0147e5e5d
Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
...
Resolved Conflicts:
documentation/sphinx/source/release-notes.rst
fdbserver/DataDistribution.actor.cpp
versions.target
2019-10-02 13:22:56 -07:00
Jon Fu
68f88dea4b
remove buggify setting of new knob
2019-09-27 13:12:41 -07:00
Jon Fu
450a09e117
Code Review Changes
2019-09-24 15:48:50 -07:00
Meng Xu
d2fd1f4931
DD:MisconfiguredLocality:Fix review comments
2019-09-17 13:04:21 -07:00
Meng Xu
37d2318eed
DD:Handle worker with incorrect locality
...
When a worker has incorrect locality, the worker will be excluded from
storage recruitment.
When the worker has its locality corrected by system operators,
the worker will be reincluded for storage recruitment.
2019-09-14 12:12:56 -07:00
Meng Xu
78b8e48cef
DD:ValidLocality:Resolve review comment
2019-09-13 15:35:16 -07:00
Meng Xu
3ad7e3adb3
DD:DD_VALIDATE_LOCALITY:Guard the checking of locality validity
2019-09-13 13:19:35 -07:00
Evan Tschannen
8fbd90e2f6
Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2
...
Graceful storage engine migration
2019-09-09 13:51:53 -07:00
Meng Xu
c2355f721e
Merge branch 'master' into mengxu/performant-restore-PR
2019-09-04 17:11:42 -07:00
Meng Xu
8f9ba3bc09
StorageEngineSwitch:Remove unused code
2019-09-03 17:18:56 -07:00
Meng Xu
bd80a67d46
Merge branch 'master' into mengxu/storage-engine-switch-PR-v2
2019-09-03 14:11:33 -07:00
Evan Tschannen
00424a5108
changed the rate at which the coordinators register with the cluster controller and the clients register with the coordinator so that the connected client number in status will be much more accurate
2019-08-21 15:02:09 -07:00
Evan Tschannen
41b908752e
increased move keys parallelism to be less of a decrease just in case lowering this could affect normal data distribution
...
raised target durability lag versions to give more time for batch limiting to come into play before this limit is hit
changed max_bad_options to better reflect the name
2019-08-21 14:55:21 -07:00
Evan Tschannen
37e2fc86de
Increase the target durability lag versions to be larger than the soft max, so that storage servers will respond with a penalty to clients before ratekeeper controls on the lag
2019-08-19 14:03:42 -07:00
Evan Tschannen
9318b494ad
reduce the DD move keys parallelism to avoid a hot read shard when transitioning from triple replication to double replication
2019-08-19 14:02:18 -07:00
Meng Xu
b448f92d61
StorageEngineSwitch:Remove unnecessary code and format code
...
Unnecessary code includes debug code and the unnecessary calling of
the removeWrongStoreType actor;
Format the changes with clang-format as well.
2019-08-16 16:53:38 -07:00
Meng Xu
85ba904e2c
StorageEngineSwitch:Stop removeWrongStoreType actor if no SS has wrong storeType
2019-08-16 16:11:28 -07:00
Meng Xu
a588710376
StorageEngineSwitch:Graceful switch
...
When fdbcli changes storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause OOM error.
2019-08-12 17:37:52 -07:00
Meng Xu
7ff46e6772
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-07 20:31:56 -07:00
Evan Tschannen
9382a58390
fix: after a forced recovery it is possible to not have logs from all generations, so only wait at most a second for getting a popped txs version
2019-08-06 16:32:28 -07:00
Meng Xu
7ccaeddf05
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-01 13:23:17 -07:00
Evan Tschannen
7d7aa27c2d
Merge pull request #1814 from dongxinEric/feature/1508/finer-grained-dd-controls
...
Added finer grained controls to DataDistribution in fdbcli.
2019-07-31 17:36:20 -07:00
Evan Tschannen
a0b29ff82f
updated knobs to allow more batch priority traffic
2019-07-31 17:19:41 -07:00
Evan Tschannen
4308ff86f7
increased the MAX_TEAMS_PER_SERVER
2019-07-31 16:08:18 -07:00
Xin Dong
b653ddb30d
Final clean ups after rebasing master
2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc
Address review comments. 50K correctness with no failures.
2019-07-30 22:24:30 -07:00
Evan Tschannen
6dbaddd0a7
Added a knob to always use CAUSAL_READ_RISKY for GRV
2019-07-30 18:21:46 -07:00
Evan Tschannen
5dd9043fd3
addressed review comments
2019-07-30 17:04:41 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
...
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
bc536757df
Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space.
2019-07-29 15:47:34 -07:00
Evan Tschannen
d8b14fe372
we cannot buggify replace content bytes because it takes too long to recover when the txnStateStore is too large
2019-07-28 19:34:17 -07:00
Evan Tschannen
5c98dcce6d
revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators
2019-07-27 16:46:22 -07:00
Evan Tschannen
b509a441e7
Merge branch 'master' into feature-skip-confirm
...
# Conflicts:
# bindings/flow/tester/Tester.actor.cpp
# bindings/go/src/_stacktester/stacktester.go
# bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java
# bindings/java/src/test/com/apple/foundationdb/test/StackTester.java
# bindings/python/tests/tester.py
# bindings/ruby/tests/tester.rb
# documentation/sphinx/source/api-c.rst
# documentation/sphinx/source/api-python.rst
# documentation/sphinx/source/api-ruby.rst
# documentation/sphinx/source/data-modeling.rst
# documentation/sphinx/source/developer-guide.rst
# fdbclient/vexillographer/fdb.options
# fdbserver/MasterProxyServer.actor.cpp
2019-07-27 15:08:13 -07:00
Evan Tschannen
ee94e8a062
removed a trace event which was causing valgrind errors
2019-07-27 13:51:59 -07:00
Evan Tschannen
90e3b50213
Merge branch 'master' into feature-coordinator-connection
...
# Conflicts:
# fdbclient/DatabaseContext.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen
ee92f0574f
fix: lastRequestTime was not updated
...
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
Meng Xu
45083edf74
Merge branch 'master' into mengxu/performant-restore-PR
...
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
sramamoorthy
a65c9f92ed
get rid of all timeouts and other changes
2019-07-24 15:36:28 -07:00
sramamoorthy
7e04e3c8be
snap v2: knobs for max snap create timeout
2019-07-24 15:36:28 -07:00
Evan Tschannen
c70e762f0e
Merge pull request #1785 from xumengpanda/mengxu/server-team-remover-PR
...
Remove redundant server teams
2019-07-19 17:44:16 -07:00
Meng Xu
b001a9ebe8
ServerTeamRemover runs after machineTeamRemover finishes
...
If serverTeamRemover removes a team before machineTeamRemover brings
the machine team number down to the desired number, DD may create a new
team (due to teams removed by serverTeamRemover), which may be removed
later by machineTeamRemover. This causes unnecessary extra data movement.
2019-07-19 16:48:52 -07:00
Evan Tschannen
846038b0e6
Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
...
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00
Alex Miller
c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
...
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
Balachandar Namasivayam
ecb3de3b49
Fixed space issue.
2019-07-17 18:10:05 -07:00
Balachandar Namasivayam
406bcebdc4
Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time.
2019-07-17 18:08:17 -07:00
Meng Xu
20f067e794
Merge with master:Resolve conflict with PR#1797
2019-07-16 10:52:28 -07:00
Meng Xu
415622f465
MachineTeamRemover:Change to remove MT with most teams
...
Change to remove machine team with most machine teams, using the same
logic as the serverTeamRemover.
The feature is guarded by TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS knob.
2019-07-15 14:29:49 -07:00
Evan Tschannen
db5b4a6331
avoid going to unlimited immediately after going below the durabilityLagTargetVersion
2019-07-12 18:50:56 -07:00
Evan Tschannen
1a18c859c7
knobified the durability lag rate controls
2019-07-12 18:50:56 -07:00
Evan Tschannen
02de53160d
only skip confirm epoch live if CAUSAL_READ_RISKY is enabled
...
time checked on the proxy should be less than the time waited by the master to account for clock speed differences
setting REQUIRED_MIN_RECOVERY_DURATION and ENFORCED_MIN_RECOVERY_DURATION to 0 will go back to the old behavior
2019-07-12 17:58:16 -07:00
Evan Tschannen
a63969afb3
enforce a minimum recovery duration, which allows proxies to avoid checking if the epoch is alive as long as its last commit has been less than MINIMUM_RECOVERY_DURATION ago
2019-07-12 13:10:21 -07:00
Jon Fu
f12a3909f3
renamed workloads and made code style adjustments
2019-07-11 09:56:58 -07:00
Jon Fu
1e9d31597c
removed extra parameter from getRange, added knob to guard new changes, and adjusted style/formatting in several places
2019-07-11 09:56:58 -07:00
Evan Tschannen
7e919e361c
Merge pull request #1817 from etschannen/feature-proxy-forward
...
Proxies will forward clients to the next generation
2019-07-10 13:53:12 -07:00
Evan Tschannen
49121172ea
Merge pull request #1795 from alexmiller-apple/peek-from-satellites
...
Log Routers will prefer to peek from satellite logs.
2019-07-09 17:38:57 -07:00
Evan Tschannen
001abec29d
fixed a compiler error, buggified a new knob
2019-07-09 16:50:59 -07:00
Evan Tschannen
64aee73c4f
we only need to hold the ReplyPromise for messages that we are going to forward to new proxies
2019-07-09 16:47:56 -07:00
Alex Miller
44f11702a8
Log Routers will prefer to peek from satellite logs.
...
Formerly, they would prefer to peek from the primary's logs. Testing of
a failed region rejoining the cluster revealed that this becomes quite a
strain on the primary logs when extremely large volumes of peek requests
are coming from the Log Routers. It happens that we have satellites
that contain the same mutations with Log Router tags, that have no other
peeking load, so we can prefer to use the satellite to peek rather than
the primary to distribute load across TLogs better.
Unfortunately, this revealed a latent bug in how tagged mutations in the
KnownCommittedVersion->RecoveryVersion gap were copied across
generations when the number of log router tags was decreased.
Satellite TLogs would be assigned log router tags using the
team-building based logic in getPushLocations(), whereas TLogs would
internally re-index tags according to tag.id%logRouterTags. This
mismatch would mean that we could have:
Log0 -2:0 ----- -2:0 Log 0
Log1 -2:1 \
>--- -2:1,-2:0 (-2:2 mod 2 becomes -2:0) Log 1
Log2 -2:2 /
And now we have data that's tagged as -2:0 on a TLog that's not the
preferred location for -2:0, and therefore a BestLocationOnly cursor
would miss the mutations.
This was never noticed before, as we never
used a satellite as a preferred location to peek from. Merge cursors
always peek from all locations, and thus a peek for -2:0 that needed
data from the satellites would have gone to both TLogs and merged the
results.
We now take this mod-based re-indexing into account when assigning which
TLogs need to recover which tags from the previous generation, to make
sure that tag.id%logRouterTags always results in the assigned TLog being
the preferred location.
Unfortunately, previously existing clusters will potentially have existing
satellites with log router tags indexed incorrectly, so this transition
needs to be gated on a `log_version` transition. Old LogSets will have
an old LogVersion, and we won't prefer the satellite for peeking. Log
Sets post-6.2 (opt-in) or post-6.3 (default) will be indexed correctly,
and therefore we can safely offload peeking onto the satellites.
2019-07-08 22:25:01 -07:00
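The mod-based re-indexing described in the commit above (`tag.id%logRouterTags`) can be illustrated with a small Python sketch. The helper names are hypothetical, and the example only models the index arithmetic from the Log0/Log1/Log2 diagram, not how TLogs are actually assigned.

```python
from collections import defaultdict
from typing import Dict, List


def reindexed_tag(tag_id: int, log_router_tags: int) -> int:
    """Index a log router tag is folded onto after re-indexing (tag.id % logRouterTags)."""
    return tag_id % log_router_tags


def group_by_new_index(old_tag_ids: List[int], log_router_tags: int) -> Dict[int, List[int]]:
    """Group old tag ids by the index they map to in the new generation."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for tag_id in old_tag_ids:
        groups[reindexed_tag(tag_id, log_router_tags)].append(tag_id)
    return dict(groups)


# The previous generation had three log router tags (-2:0, -2:1, -2:2);
# the new generation has two, so -2:2 is folded onto -2:0.
groups = group_by_new_index([0, 1, 2], log_router_tags=2)
print(groups)  # {0: [0, 2], 1: [1]}
```

Data originally tagged -2:2 therefore has to be recovered by whichever TLog is the preferred location for -2:0, which is the mismatch the commit corrects.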
Meng Xu
3b9618fe11
ServerTeamRemover:Speedup removing teams in simulation
...
Otherwise, simulation may time out when team remover needs to
remove hundreds of teams.
2019-07-08 18:17:21 -07:00
Meng Xu
08a721b320
Merge branch 'master' into mengxu/server-team-remover-PR
2019-07-08 16:30:32 -07:00
Evan Tschannen
c348b3da51
After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies
2019-07-08 12:53:40 -07:00
Evan Tschannen
310a5fe9a3
fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy
2019-07-05 17:28:22 -07:00
Evan Tschannen
e7c0ecf729
fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy
2019-07-05 15:46:16 -07:00
Meng Xu
599fcb2e6d
Add serverTeamRemover to remove redundant server teams
2019-07-02 17:40:37 -07:00
Evan Tschannen
b9a6271375
local ratekeeper no longer globally limits
2019-06-28 16:54:22 -07:00
Evan Tschannen
18d5fbf1e0
Avoid jumping from rejecting 0% of requests directly to 20% of requests
2019-06-28 16:54:22 -07:00
Evan Tschannen
db413c37f7
restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers
2019-06-28 16:54:22 -07:00
Evan Tschannen
92b32855ca
ratekeeper’s control algorithm would oscillate when limited by local ratekeeper
2019-06-28 16:54:22 -07:00
A.J. Beamon
35b6277a50
Fix knob copy paste error
2019-06-27 12:55:39 -07:00
Alex Miller
61901effed
Increase how long FDB will wait before starting DD to repair data loss.
...
10s is a bit short for starting data distribution, which is rather
expensive. 60s is a bit more reasonable.
2019-06-19 13:40:21 -07:00
Evan Tschannen
20e3edeb0a
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
# versions.target
2019-06-14 12:42:59 -07:00
Evan Tschannen
924f92e5aa
Prevent the byte sample recovery from interfering with storage server recovery
2019-06-13 15:55:25 -07:00
Evan Tschannen
054d775343
increase the delay between idle commits to reduce the rate idle clusters fsync
2019-06-13 14:55:37 -07:00
Trevor Clinkenbeard
8144882d7b
Merge branch 'apple-master' into features/local-rk
2019-06-10 19:40:25 -07:00
Meng Xu
022b555b69
FastRestore:Fix bug in finish restore
...
RestoreMaster may not receive all acks for the last command, i.e., finishRestore,
because RestoreLoaders and RestoreAppliers exit immediately after sending the ack.
If the ack is lost, it will not be resent.
This commit also removes some unneeded code.
This commit passes 50k random tests without errors.
2019-06-05 20:07:18 -07:00
Meng Xu
477fd152c0
FastRestore:Refactor code
...
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
the file only has functionalities related to restore worker.
Passed correctness test
2019-06-04 11:22:47 -07:00
Evan Tschannen
29b96414e2
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/NativeAPI.actor.cpp
# fdbserver/Coordination.actor.cpp
# flow/Arena.h
# versions.target
2019-06-03 18:49:35 -07:00
Evan Tschannen
7c333dbc16
If a process receives a message in its clusterControllerInterface before becoming the cluster controller, if the process does not become the cluster controller in the next minute it should destroy the interface to prevent a memory leak.
2019-05-29 16:57:13 -07:00
sramamoorthy
31b6c86650
ignorePopDeadline to have high limit in simulator
...
- ignorePopDeadline to have a higher limit in simulator
to accommodate the buggify delays and make snapshot succeed.
- introduce a new knob for auto resetting the disabling of tlog pop
2019-05-28 22:07:46 -07:00
A.J. Beamon
603721e125
Merge branch 'master' into thread-safe-random-number-generation
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/AsyncFileCached.actor.h
# fdbrpc/genericactors.actor.cpp
# fdbrpc/sim2.actor.cpp
# fdbserver/DiskQueue.actor.cpp
# fdbserver/workloads/BulkSetup.actor.h
# flow/ActorCollection.actor.cpp
# flow/Net2.actor.cpp
# flow/Trace.cpp
# flow/flow.cpp
2019-05-23 08:35:47 -07:00
Evan Tschannen
8c3516951a
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-05-12 20:13:49 -07:00
Alex Miller
ea12a54946
Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES
...
So as to not make filesystem assumptions. This knob did technically
appear in (only the) 6.1.5 release, but this feature was broken in 6.1.5,
so it was impossible to use anyway.
2019-05-10 18:26:22 -10:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Evan Tschannen
22499666d0
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/LogRouter.actor.cpp
# flow/Trace.cpp
# versions.target
2019-05-08 18:19:35 -07:00
Alex Miller
0685e6c1c7
Avoid large truncates in the DiskQueue.
...
And instead create a new file while incrementally truncating the old one
down. This avoids queueing up a massive number of filesystem metadata
operations in one call, thus flooding the disk with requests and
stalling out all other filesystem operations.
This sets the knobs so that a truncate of >10GB causes us to create a
new file rather than trying to truncate the old one.
2019-05-08 12:33:31 -10:00
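The policy in the commit above can be sketched as a size check plus a bounded-step shrink of the old file. This is a hedged Python illustration only: the 10GB cutoff comes from the message, while the function names, the step size, and the returned labels are hypothetical and do not reflect the DiskQueue code.

```python
from typing import Iterator

GIB = 1024 ** 3
MAX_TRUNCATE_BYTES = 10 * GIB  # cutoff quoted in the commit message


def plan_truncate(bytes_to_truncate: int) -> str:
    """Decide between truncating in place and switching to a new file."""
    return "new_file" if bytes_to_truncate > MAX_TRUNCATE_BYTES else "truncate_in_place"


def truncate_steps(current_size: int, step: int) -> Iterator[int]:
    """Yield successive file sizes while shrinking the old file in bounded steps."""
    if step <= 0:
        raise ValueError("step must be positive")
    while current_size > 0:
        current_size = max(0, current_size - step)
        yield current_size


sizes = list(truncate_steps(25, 10))
print(sizes)  # [15, 5, 0]
```

Shrinking in small steps keeps each call's filesystem metadata work bounded, which is the stall the commit is trying to avoid.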
Alex Miller
4052f3826a
Add a knob to limit the number of commits indexed per key.
...
Theoretically, we could spill 20MB of 22B mutations for one key, which
would generate a very long value being stored in SQLite, and very
inefficiently read back. This stops that from being a problem, at the
cost of some extra write calls.
2019-05-03 15:27:10 -07:00
Alex Miller
f4e48c3851
Add a knob to limit amount of data read from sqlite for one PeekRequest.
...
This prevents peeking from degrading over time if there are a very large
number of SpilledData entries for one particular tag.
2019-05-02 17:26:45 -07:00
Evan Tschannen
2d5043c665
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen
1a4c1759a4
Merge pull request #1429 from jzhou77/pprof
...
Dump heap profiler when memory usage is high
2019-04-29 16:31:44 -07:00
Evan Tschannen
cacd82758e
Reduced data distribution speeds
2019-04-26 13:54:49 -07:00
Evan Tschannen
9ff8aca1da
Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)
2019-04-26 13:53:56 -07:00
A.J. Beamon
253d2400ef
Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-04-23 14:38:52 -07:00
A.J. Beamon
4ad0496b39
Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.
2019-04-23 14:01:51 -07:00
Stephen Atherton
83db547306
Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.
2019-04-23 04:50:58 -07:00
Jingyu Zhou
6870e132b2
Merge branch 'master' into pprof
2019-04-19 14:06:44 -07:00
Andrew Noyes
d1e86779a6
Address review comments
2019-04-18 08:48:27 -07:00
mpilman
32393ec4c9
Prototype of local ratekeeper
2019-04-08 11:04:44 -07:00
Evan Tschannen
05869a8383
do not log a degraded reset message if the previous reset was more than a week ago
2019-04-07 23:00:58 -07:00
Jingyu Zhou
4b08042a88
Change memory profiling threshold to a flag
2019-04-05 16:33:51 -07:00
Jingyu Zhou
09b2c35d11
Dump heap profiler when memory usage is high
...
Set the threshold of dump to 2GB.
2019-04-05 16:12:23 -07:00
Evan Tschannen
390ab9cfed
A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy
2019-04-04 14:11:12 -07:00
A.J. Beamon
71e2fdafb8
Changes to ratekeeper camel case
2019-03-27 08:24:25 -07:00
Evan Tschannen
6254a1a8e4
fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop
2019-03-22 18:37:39 -07:00
A.J. Beamon
2d7b48dadc
Merge pull request #1311 from etschannen/feature-increase-grv-batch
...
Increased the GRV client batch size
2019-03-19 08:23:05 -07:00
Evan Tschannen
2554fed965
reduce max transaction to start
2019-03-18 16:16:03 -07:00
Evan Tschannen
87e2a1a029
The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update
2019-03-18 16:09:57 -07:00
Alex Miller
29ab7370cd
Clear versionLocation when spilling, and pop DQ separately.
...
Popping the disk queue now requires potentially recovering the location
to which we can pop from the spilled data itself, and for each tag we
must maintain the first location with relevant data.
The previous queue we had to represent the ordering, queueOrder, was
used by spilling, and popped when a TLog had been spilled. This means
that as soon as a TLog has been fully spilled, we have no idea how it
relates in order to other fully spilled TLogs.
Instead, use queueOrder to keep track of all the TLog UIDs until they're
removed, and use spillOrder to keep track of the order only for
spilling.
2019-03-18 15:09:22 -07:00
Evan Tschannen
ec6c843124
increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch
2019-03-16 16:18:58 -07:00
Evan Tschannen
e068c478b5
merge master
2019-03-12 18:31:25 -07:00
Evan Tschannen
c6e94293bf
reset a process to not be degraded after 2 days
2019-03-10 22:39:21 -07:00
Evan Tschannen
53f16b5347
when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded
2019-03-08 11:46:34 -05:00
Jingyu Zhou
3c86643822
Separate Ratekeeper from data distribution.
...
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, borrowing the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Alex Miller
94bf75cb00
Allow the disk queue to shrink if it has unneeded slack space.
2019-03-04 01:42:38 -08:00
Alex Miller
9ef283d4e7
Implement hard limiting of memory used to serve peek requests.
2019-03-04 01:42:38 -08:00
Alex Miller
e7d8520c63
Batch more when spilling data.
2019-03-04 01:42:38 -08:00
Trevor Clinkenbeard
39f612d132
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-03-02 17:07:00 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard
abfe057805
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-25 13:47:16 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Meng Xu
9445ac0b0c
Status: Use new data distributor worker to publish status
...
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).
So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu
7cca439e00
TeamRemover: Add status to show redundant team removing
...
Distinguish between the removal of unhealthy teams and redundant teams.
Change status report to include redundant team removal report.
2019-02-21 14:16:46 -08:00
Trevor Clinkenbeard
fa96b8dd33
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-20 16:56:16 -08:00
Meng Xu
d86ba0e811
TeamRemover: Change it to run periodically
...
This simplifies the problem of when we should invoke the teamRemover
2019-02-20 16:08:34 -08:00
Evan Tschannen
27e3617548
fix: remove bad teams needed to use dd_stall_check delay, because in simulation the buggified delay time could make us remove bad teams before they submit their ranges to the queue
2019-02-20 14:18:36 -08:00
Evan Tschannen
d4737fac0f
knobify force recovery recovery check delay
2019-02-19 16:05:20 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
d492395f84
fix: simulation could buggify a delay such that data distribution incorrectly thinks the queue is not processing unhealthy relocations
2019-02-18 14:57:07 -08:00
Meng Xu
6d09ac483c
Merge with master
2019-02-15 17:03:40 -08:00
Jingyu Zhou
fc3a784963
Fix another build team bug
...
The buildTeam() can create teams with undesired storage servers, which are
considered unhealthy. As a result, the data movement can become stuck.
Fix this by adding an ACTOR monitorHealthyTeams that builds teams every
second whenever there are no healthy teams.
Clean up storageServerTracker() interface.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
816f8b1ae1
Per review comments
...
Add a knob for starting distributor delay.
Move distributor failed variable to a local loop.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
e0a7162cf8
Add a failure timeout knob for data distributor.
...
Set default time to 1.0s.
2019-02-14 16:37:16 -08:00
Meng Xu
5481851e82
TeamCollection: Add knobs for team remover
...
Added three knobs to control team remover
bool TR_FLAG_DISABLE_TEAM_REMOVER:
Disable the teamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY:
Wait for the specified time before trying to remove the next machine team
double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY:
Wait before checking if all machines are healthy
2019-02-13 15:11:56 -08:00
Meng Xu
214a72fba3
TeamCollection: Resolve review comments
...
1) Reduce the frequency of checking if we need to call teamRemover
2) Improve code efficiency in finding the machine team to remove
3) Remove unused code
4) Add sanity check
2019-02-12 10:59:57 -08:00
Meng Xu
3b8ae0fe95
TeamCollection: Add into 6.1 release note
2019-02-08 13:50:27 -08:00
Meng Xu
7cfe6de27e
TeamCollection: Server team number must match machine team number
...
DESIRED_TEAMS_PER_MACHINE must be equal to DESIRED_TEAMS_PER_SERVER.
Otherwise, we may have too few machine teams to create enough server teams.
Note that BUGGIFY macro value is based on a random number generator.
When you have two BUGGIFY macros, one may be true while the other is false.
Also fix a bug in getting the number of healthy machine teams.
2019-02-07 13:53:55 -08:00