Xiaoxi Wang
7ee6ca342e
merge with master
2020-08-11 01:01:15 +00:00
Young Liu
2e41391690
Merge master branch
2020-08-06 00:11:00 -07:00
Young Liu
d6a23a4d6b
Resolve comments to make GRV proxy a separate process class
2020-08-06 00:01:57 -07:00
Meng Xu
fe5902994c
Merge pull request #3605 from apple/release-6.3
...
Merge Release 6.3 to master
2020-08-05 23:37:44 -07:00
Meng Xu
7992cef025
FR:Fix sample network pkg can be too big
2020-08-04 22:35:21 -07:00
Xiaoxi Wang
d1cc87452c
merge with master; solve conflicts; solve initialization;
2020-08-02 22:44:07 +00:00
Xiaoxi Wang
0352e8ee0b
pick busiest commit tag periodically
2020-08-02 18:38:56 +00:00
Meng Xu
2f5293fcc7
Introduce knob FASTRESTORE_USE_LOG_FILE and FASTRESTORE_USE_RANGE_FILE
2020-07-31 10:40:29 -07:00
Daniel Smith
a94c4cce85
Add an unsafe option to disable manual fsyncing rocksdb
2020-07-30 22:31:18 +00:00
Meng Xu
ad915e462e
Add knob FASTRESTORE_NOT_WRITE_DB to skip writing to DB
2020-07-30 10:17:17 -07:00
Daniel Smith
abd2e6b979
Add some knobs for tuning and lz4 compaction
2020-07-30 15:42:26 +00:00
Evan Tschannen
a49cb41de7
Merge branch 'release-6.3'
...
# Conflicts:
# CMakeLists.txt
# cmake/ConfigureCompiler.cmake
# fdbserver/Knobs.cpp
# fdbserver/StorageCache.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/ThreadHelper.actor.h
# flow/serialize.h
# tests/CMakeLists.txt
2020-07-29 00:31:55 -07:00
Evan Tschannen
937df4f839
Merge branch 'release-6.3' of github.com:apple/foundationdb into feature-lifetimetoken-fix
...
# Conflicts:
# documentation/sphinx/source/release-notes/release-notes-630.rst
2020-07-27 10:03:02 -07:00
Young Liu
06c081c714
Merge master branch and resolve conflicts
2020-07-23 22:41:10 -07:00
Evan Tschannen
be67e9cfc7
wait for the correct cluster controller interface before starting master recovery
2020-07-20 11:29:37 -07:00
Steve Atherton
e646361501
Merge branch 'release-6.3' of github.com:apple/foundationdb into feature-redwood
2020-07-20 07:25:29 -07:00
Steve Atherton
d05b7ee785
Pager remap remover now accumulates a configurable version interval of page updates behind the oldest retained version which are used to coalesce updates to the same original page ID to reduce write amplification for many workloads.
2020-07-20 04:08:33 -07:00
Meng Xu
ef8c1060a2
Merge branch 'master' into mengxu/tmp-merge-6.3
2020-07-13 10:15:56 -07:00
A.J. Beamon
b09dddc07e
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
...
# Conflicts:
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/fdbrpc.vcxproj
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
dd10dbe7c7
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-ha-fixes
2020-07-09 23:09:14 -07:00
Evan Tschannen
717242a0ee
reset WAN network connections every 5 minutes if responses take more than 500ms
2020-07-09 22:50:47 -07:00
A.J. Beamon
04d1217941
Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles.
2020-07-09 16:39:15 -07:00
Evan Tschannen
0e2f5e8bb5
Added a flow lock to prevent too many source server fetches from happening at the same time and running the data distributor out of memory
2020-07-09 10:38:19 -07:00
Young Liu
7afee53f4c
Clean up code that serves GRV through other proxies
2020-07-07 21:19:11 -07:00
Meng Xu
f3302833ce
Merge pull request #3435 from apple/release-6.3
...
Merge Release 6.3 to master
2020-06-30 10:08:28 -07:00
Meng Xu
efb61bcac0
Rename knob to FASTRESTORE_TXN_EXTRA_DELAY
2020-06-29 21:16:30 -07:00
Meng Xu
78c45c1200
Knob for txn delay and add back FlowLock to control txn concurrency
2020-06-27 10:13:34 -07:00
Young Liu
f211a54593
Merged from upstream master
2020-06-13 16:47:12 -07:00
A.J. Beamon
739767b838
Delay cluster controller candidacy for all worst fit processes, not just storage servers.
2020-06-10 09:59:56 -07:00
Steve Atherton
c3c7db6a9d
Added flowlock around reads done on the write path so that those reads cannot starve reads to support reads done by the storage server.
2020-06-09 17:00:21 -07:00
Young Liu
a038a02cdd
Serve GetReadVersion through master
2020-06-09 11:16:23 -07:00
A.J. Beamon
d128252e90
Merge release-6.3 into master
2020-05-22 09:25:32 -07:00
Evan Tschannen
7d91a3a919
Merge branch 'release-6.3' of github.com:apple/foundationdb into release-6.3
2020-05-20 16:30:14 -07:00
Meng Xu
c0c15130b8
Merge pull request #3172 from jzhou77/backup-fix
...
Limit memory usage of backup workers
2020-05-20 15:46:11 -07:00
Steve Atherton
4a827c9304
Merge branch 'release-6.3' of github.com:apple/foundationdb into feature-redwood
2020-05-19 03:08:26 -07:00
Evan Tschannen
5013bc9106
fixed client compile issues; reduced the size of storage server interface
2020-05-18 17:20:15 -07:00
Steve Atherton
e8626724f9
More refactor of Redwood metrics. Added per-level BTree stats. Added Pager total physical disk reads/writes, and uncacheable read hit/miss counts. Added logging to TraceEvents.
2020-05-16 02:51:57 -07:00
Steve Atherton
32f4639168
Refactored page remap removal. The process is now called remap cleanup, does batches of remap entries in parallel, coalesces remaps of the same page ID to skip unnecessary writes, and has some knobs for controlling it. FIFOQueue now has peek() to support remap cleanup version lag limits. Added counters for remap cleanup and lazy subtree deletion. Refactored Redwood counters, normalized and grouped their names.
2020-05-15 02:10:51 -07:00
Jingyu Zhou
17915e13b0
Limit memory usage of backup workers
2020-05-14 13:24:56 -07:00
Evan Tschannen
1e0c10e9e8
control tag encoding of keyServersValue using a knob
2020-05-13 13:29:13 -07:00
Steve Atherton
421a6581c1
Lazy delete cycles now run more often and not just when the btree is having mutations actively merged into it. Specifically, it is launched upon recovery, then stopped and relaunched after every commit. This will make better use of I/O in between calls to commit(). Also added knobs to control lazy delete cycle parallelism and work limits.
2020-05-13 02:27:03 -07:00
Alex Miller
283fd3af27
Add a knob which controls writing prefix compressed kvs mem snapshots.
...
Which will be set to on by default in 7.0
2020-05-12 17:01:52 -07:00
Evan Tschannen
a8e0f1d581
removed knob
2020-05-11 18:20:46 -07:00
Evan Tschannen
d0b414ddf2
Merge pull request #3121 from etschannen/master
...
Added a large random delay on failure detection
2020-05-10 17:59:51 -07:00
Evan Tschannen
b1bd5ef83e
Merge pull request #3120 from satherton/feature-redwood
...
Redwood read concurrency limit, some knobs, and memory-only Pager mode.
2020-05-10 17:34:41 -07:00
Evan Tschannen
07111f0e41
add a large random delay on failure detection so that not all storage servers need to attempt to become the cluster controller
2020-05-10 17:09:33 -07:00
Steve Atherton
43f9e4dfad
Implemented concurrent read limit in IKeyValueStore interface for Redwood. Added knobs for Redwood page size, concurrent read limit, and page fill factor. Changed commitSubtree() recursion back to use a vector and waitForAll() because it seems to be lower overhead than ActorCollection.
2020-05-10 16:13:22 -07:00
Evan Tschannen
f9518c3441
Merge pull request #3069 from alexmiller-apple/tls-connection-count
...
YOLO at reducing TLS connection count via doing monitorLeader on coordinators
2020-05-09 17:12:27 -07:00
Evan Tschannen
69affebe40
merge master
2020-05-09 13:29:18 -07:00
Evan Tschannen
2dfae85dc7
the delay for reads is about 15% of the total cost of the read, so start multiple reads with the same delay
2020-05-09 13:26:38 -07:00
A.J. Beamon
02307ba7b6
Merge branch 'master' into transaction-tagging
...
# Conflicts:
# fdbclient/DatabaseContext.h
2020-05-09 07:50:29 -07:00
Jingyu Zhou
a833724322
Merge pull request #3078 from xumengpanda/mengxu/fr-circus-stall-PR
...
Performant restore: Various improvements based on circus test
2020-05-07 20:07:23 -07:00
Meng Xu
a93c23d239
Resolve review comments
2020-05-07 15:06:59 -07:00
A.J. Beamon
fbf436f45f
Various cleanup and knob adjustments.
2020-05-07 09:15:33 -07:00
Alex Miller
8a6e177950
Merge remote-tracking branch 'upstream/master' into tls-connection-count
2020-05-05 16:49:36 -07:00
A.J. Beamon
b1055a8501
Merge branch 'master' into transaction-tagging
2020-05-05 16:03:39 -07:00
Evan Tschannen
f329164fb4
Merge pull request #2532 from dongxinEric/feature/hot-read-key-detection-part-2
...
Feature/hot read key detection part 2
2020-05-05 14:33:34 -07:00
Meng Xu
c49b6756fe
FastRestoreApplier:Trace clear range op when it has too many for debug
2020-05-05 09:28:50 -07:00
Meng Xu
2fec56e7e2
FastRestore:Logging for getReplyBatches
2020-05-04 20:12:59 -07:00
A.J. Beamon
36454bb3b8
Merge branch 'master' into transaction-tagging
...
# Conflicts:
# fdbclient/MasterProxyInterface.h
# fdbclient/NativeAPI.actor.cpp
2020-05-04 10:23:25 -07:00
A.J. Beamon
decf3e82b0
Fix various bugs and make sure to cleanup throttles from the database when they expire
2020-05-01 21:36:28 -07:00
Alex Miller
43a63452d8
YOLO at reducing TLS connection count via doing monitorLeader on coordinators
2020-05-01 14:40:21 -07:00
Meng Xu
a0d67cac16
Merge branch 'master' into mengxu/fr-code-improvement-PR
2020-04-29 21:07:33 -07:00
A.J. Beamon
6ada5359b8
Merge branch 'master' into transaction-tagging
2020-04-29 14:27:21 -07:00
A.J. Beamon
b80225dde0
Initial support for ramping load back up. Fix some logging. Update auto-throttles less frequently.
2020-04-28 15:50:45 -07:00
A.J. Beamon
0ed70accfa
Reorganization of throttle storage in ratekeeper to support various auto-throttling related actions
2020-04-28 14:30:37 -07:00
Evan Tschannen
b7f5f3be48
merge in master
2020-04-28 13:11:47 -07:00
A.J. Beamon
41c517a5dd
Merge branch 'master' into transaction-tagging
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
2020-04-27 13:05:24 -07:00
A.J. Beamon
239876351b
Add some initial auto-throttling. Move the definition of the priority enum to a more global place and use it for all transaction priorities (except in ClientLogEvents, because of serialization incompatibilities).
2020-04-24 11:31:16 -07:00
Evan Tschannen
c87aa33941
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/go/src/fdb/generated.go
# documentation/sphinx/source/api-common.rst.inc
# documentation/sphinx/source/api-ruby.rst
# documentation/sphinx/source/release-notes.rst
# fdbclient/FailureMonitorClient.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/vexillographer/fdb.options
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# versions.target
2020-04-23 13:47:53 -07:00
Evan Tschannen
0c84ad4bc6
Merge pull request #2917 from bnamasivayam/fail-slow-ss
...
Mark the storage servers that are continually lagging as unhealthy
2020-04-22 23:18:35 -07:00
A.J. Beamon
9bf5c06d15
Adjust and knobify cost function for ops on the storage server
2020-04-22 14:39:32 -07:00
Evan Tschannen
d0cc2a1ee4
added logging for parallel peeks on TLogs
2020-04-22 14:24:45 -07:00
A.J. Beamon
434704fbd9
Various bug fixes
2020-04-22 12:28:51 -07:00
Meng Xu
2960a2fe8a
FastRestore:Add knob to control parallelism in waiting requests
2020-04-19 21:34:11 -07:00
Evan Tschannen
ba3e2af473
Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast
2020-04-17 15:17:37 -07:00
A.J. Beamon
dfec896438
Enforce a throttle limit. Don't count transaction tags on RK if the proxy hasn't updated us in a while.
2020-04-17 11:48:02 -07:00
Meng Xu
2d9e9a0502
FastRestore:Use knob to guard the expensive way to get range versions
2020-04-17 10:02:58 -07:00
A.J. Beamon
78d48a0dad
Merge branch 'master' into transaction-tagging
...
# Conflicts:
# fdbcli/fdbcli.actor.cpp
# fdbclient/Knobs.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/fdbclient.vcxproj
# fdbserver/MasterProxyServer.actor.cpp
2020-04-17 09:23:18 -07:00
Xin Dong
7dd7406c59
Merge branch 'master' into feature/hot-read-key-detection-part-2
2020-04-16 14:54:05 -07:00
A.J. Beamon
0fba8c47be
Checkpoint: Ratekeeper sets absolute limits for tag throttles and enforces them by distributing requests to proxies, who distribute them to clients.
...
A few refactorings.
2020-04-16 14:43:22 -07:00
A.J. Beamon
9d6f2352d9
Merge commit 'cf01233f28a2c42908656a39f458a4475c1d44a3' into grv-proxy-perf-improvements
...
# Conflicts:
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/MasterProxyServer.actor.cpp
2020-04-14 13:49:09 -07:00
Meng Xu
dbc9c23193
FastRestore:Loader:Send mutations at different versions in the same message to appliers
...
This increases the bandwidth sent from loaders to appliers.
2020-04-12 10:46:58 -07:00
Evan Tschannen
8f78912483
knobified parameter
2020-04-11 20:54:17 -07:00
Evan Tschannen
07cc0a8d74
code cleanup
2020-04-10 17:02:11 -07:00
A.J. Beamon
ebeca10bce
Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients.
2020-04-09 16:55:56 -07:00
A.J. Beamon
36da61dd9c
Merge branch 'master' into transaction-tagging
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbclient/vexillographer/fdb.options
2020-04-07 21:12:14 -07:00
Balachandar Namasivayam
73272fc72e
Version difference is now the diff between TLog versions and SS version.
2020-04-03 19:04:43 -07:00
A.J. Beamon
2336f073ad
Checkpointing a bunch of work on throttles. Rudimentary implementation of auto-throttling. Support for manual throttling via fdbcli. Throttles are stored in the system keyspace.
2020-04-03 15:24:14 -07:00
tclinken
884e92bb49
Atomically update dependent knobs
2020-04-01 15:18:49 -07:00
Balachandar Namasivayam
a5af31de23
Addressed simple review comments
2020-03-31 18:34:13 -07:00
Balachandar Namasivayam
b1c3893d40
Fix some corner case bugs exposed by simulation.
...
In one case, when a SS joins the cluster and DD doesn't find any healthy server to form a team with the newly added server, then the SS does not get added to any team even when the other servers get healthy.
Another is an extreme case where a data center is down, and a SS in the active DC joins and then dies immediately but not before DD adds it to a destination team for a relocating shard which will result in DD waiting indefinitely for the dead data center to come back up for the cluster to be fully recovered.
2020-03-31 18:33:12 -07:00
Jingyu Zhou
40b17e1e9b
Remove a no longer used knob
2020-03-26 13:04:00 -07:00
A.J. Beamon
e0424a52f8
Merge branch 'master' into transaction-tagging
2020-03-25 08:23:11 -07:00
Jingyu Zhou
80d3fa1222
Add delay for master to recruit backup workers
...
This delay is to ensure old epoch's backup workers can save their progress in
the database. Otherwise, the new master could attempt to recruit backup
workers for the old epoch on version ranges that have already been popped. As
a result, the logs will lose data.
2020-03-20 20:15:08 -07:00
Jingyu Zhou
940bea102a
Add a knob to switch mutation logs for parallel restore
...
Knob FASTRESTORE_USE_PARTITIONED_LOGS, default is true to enable partitioned
mutation logs. Otherwise, old mutation logs are used.
2020-03-20 20:13:38 -07:00
A.J. Beamon
26b7e02d4c
Some initial work to support tagging transactions and passing them around.
2020-03-20 11:23:11 -07:00
Jingyu Zhou
34415f82b3
Merge pull request #2832 from xumengpanda/mengxu/backup-code-review-PR
...
Buggify upload delay when backup worker uploads data to blob
2020-03-19 21:42:28 -07:00
Meng Xu
94276076de
BackupWorker:Buggify upload delay
...
Add questions to code as well.
2020-03-18 19:04:45 -07:00
Balachandar Namasivayam
58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
...
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
ed4d02a3e4
Merge pull request #2812 from etschannen/feature-proxy-mem-limit
...
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
Evan Tschannen
d6d347f665
treat a tlog which takes a long time to create its disk queue as failed
2020-03-13 10:31:59 -07:00
Evan Tschannen
243c268d9d
Limit the amount of requests the proxy can queue up in memory
2020-03-13 10:17:49 -07:00
Xin Dong
5967ef5eab
Added back the changes that report trace log flush failures and fix the random crash
2020-03-12 14:34:19 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
e219c1671f
Merge branch 'release-6.2' into feature-dd-region-queue
...
# Conflicts:
# fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen
6d6f184e2f
added a knob which reverts the new queue behavior
2020-03-04 16:23:49 -08:00
Evan Tschannen
b7834b2995
Merge pull request #2774 from etschannen/feature-dd-repopulate-priority
...
Make the DD priority of populating a region lower than machine failures
2020-03-04 16:15:18 -08:00
Xin Dong
39610d15f8
Revert this change since it somehow introduced a random crash detected on circus
2020-03-04 16:14:38 -08:00
Evan Tschannen
6296465e07
Make the DD priority associated with populating a remote region lower than machine failures
2020-03-04 14:07:32 -08:00
Meng Xu
2c6f82e1ab
FastRestore:Add unit name to threshold knob name
2020-03-02 10:52:44 -08:00
Meng Xu
2520e8d44c
FastRestore:Use more concise code as suggested in review
2020-03-01 22:32:36 -08:00
Meng Xu
1ef4cb432b
Merge branch 'master' into mengxu/fast-restore-robust-and-visibility-PR-v2
2020-03-01 20:08:07 -08:00
Meng Xu
ad9b3fb4a8
DD:Add trace for detailed relocate shard info
2020-02-29 13:45:10 -08:00
Meng Xu
01c1a15caf
FastRestore:Applier:Limit fetch keys number in a txn in getAndComputeStagingKeys
2020-02-28 16:53:36 -08:00
A.J. Beamon
993c6e478e
Merge branch 'master' into grv-proxy-perf-improvements
2020-02-28 14:25:42 -08:00
Xin Dong
13e72f7b3b
Merge pull request #2605 from dongxinEric/fix/1977/report-inability-to-flush-trace-log
...
Report inability to flush trace logs.
2020-02-27 12:36:55 -08:00
Meng Xu
97d7eb49b5
FastRestore:Master:Report unavailable role periodically
...
Ping all restore roles and report unavailable ones.
2020-02-26 16:14:55 -08:00
Meng Xu
ca726fc68e
FastRestore:Introduce OOM protection
...
An actor is schedulable to run if the current worker has enough resources, i.e.,
the worker's memory usage is below the threshold;
Exception: If the actor is working on the current version batch, we have to schedule
the actor to run to avoid dead-lock.
Future: When we release the actors that are blocked by memory usage, we should release them
in increasing order of their version batch.
2020-02-26 14:09:18 -08:00
Evan Tschannen
924d335aa7
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Knobs.cpp
# flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen
12b5064041
a high free_space_ratio_cutoff is not needed anymore because avoiding teams with low disk space is no longer the responsibility of getLoadBytes()
2020-02-25 15:47:10 -08:00
Evan Tschannen
6e7d2ff7dd
prevent the proxy from delaying too long based on an incorrect estimate of the compute time
2020-02-25 15:46:13 -08:00
Xin Dong
090c89e90a
Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request.
2020-02-25 15:39:38 -08:00
Xin Dong
a6580dc15f
Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to be an accurate check by using 'pthread_kill(writer_thread, 0)'
2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42
Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
...
- Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controller
- A successful flush will reset the accumulated counter.
Notice that the current solution does not take the time into consideration. The assumption is that flush failures tend to only happen in a clustered manner. The intermittent, but short, periods of flush failures are not considered as a problem since the memory pressure built by them should be negligible.
2020-02-25 15:37:32 -08:00
A.J. Beamon
80c2848af6
Change the algorithm for the proxy handing out read versions to improve performance and increase responsiveness to changes in workload.
2020-02-24 09:52:31 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Meng Xu
505997ba0a
FastRestore:Switch to new sendBatchRequests that tracks performance and straggler
2020-02-21 15:45:32 -08:00
Meng Xu
05ea79f584
FastRestore:Profile performance for getBatchReplies
...
Generic approach to profile getBatchReplies performance
and detect straggler.
2020-02-21 15:20:22 -08:00
Evan Tschannen
08914a2acd
Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
Meng Xu
ab2dd36bdc
FastRestore:Generic way to detect straggler
2020-02-21 14:30:08 -08:00
Evan Tschannen
819c55556c
More aggressively attempt to find teams that do not have low disk space
2020-02-20 16:47:50 -08:00
Evan Tschannen
d7c841a28a
Merge pull request #2589 from etschannen/feature-proxy-delay
...
Improve version pipelining on the proxy
2020-02-20 15:23:30 -08:00
Evan Tschannen
fbd45963d8
The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment
2020-02-19 16:48:30 -08:00
Meng Xu
03f699f2f9
Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR
2020-02-19 15:22:33 -08:00
Meng Xu
551f1ba4d2
FastRestore:Minor revision for code review
2020-02-19 11:52:24 -08:00
Meng Xu
31a6ec34b7
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-18 16:17:59 -08:00
Meng Xu
c603b20e7e
FastRestore:Resolve review comments
2020-02-18 14:08:27 -08:00
Meng Xu
acf34319c1
FastRestore:Applier:Precompute mutations and apply in parallel
...
Precompute mutations received by an applier;
Only apply the final result to the destination DB;
Execute multiple txns in parallel to apply final results to the destination DB.
2020-02-12 22:47:48 -08:00
Meng Xu
cda8fc189e
FastRestore:AtomicOp:Intro weighted size for atomicOp
...
atomicOp has an amplified performance overhead to the cluster,
for example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu
dbce1e9974
FastRestore:Applier:Add metrics counter and proc counter
2020-02-10 16:38:26 -08:00
Meng Xu
1fc793d6a7
FastRestore:Loader:Add metrics counter
2020-02-09 22:06:14 -08:00
Meng Xu
fd5b4af05a
FastRestore:Add trace for each phase on master
2020-02-09 18:54:10 -08:00
Meng Xu
cf331b9a03
FastRestore:monitorFinishedVersion for measuring perf quickly
2020-02-05 14:26:25 -08:00
Meng Xu
7f37a90c48
FastRestore:Introduce FASTRESTORE_VB_PARALLELISM
...
for controlling the number of concurrently running version batches.
2020-01-28 10:39:57 -08:00
Meng Xu
141609e80a
FastRestore:Improve code style and fix typos
2020-01-27 18:13:14 -08:00
Evan Tschannen
231d7830a0
more accurate calculation on the amount of time that proxy should wait before getting a version from the master
2020-01-26 19:47:12 -08:00
Meng Xu
b04e98771e
FastRestore:Replace FastRestoreOpConfig with Knobs
...
And randomize value for the rest of knobs
2020-01-24 14:24:34 -08:00
Evan Tschannen
e167e63eaf
Add delays between proxy batches which roughly correspond to the amount of work the proxy needs to do. This will help avoid getting a version from the master and then waiting a long time before committing it.
2020-01-23 18:31:51 -08:00