452 Commits

Author SHA1 Message Date
Xiaoxi Wang 7ee6ca342e merge with master 2020-08-11 01:01:15 +00:00
Young Liu 2e41391690 Merge master branch 2020-08-06 00:11:00 -07:00
Young Liu d6a23a4d6b Resolve comments to make GRV proxy a separate process class 2020-08-06 00:01:57 -07:00
Meng Xu fe5902994c
Merge pull request from apple/release-6.3
Merge Release 6.3 to master
2020-08-05 23:37:44 -07:00
Meng Xu 7992cef025 FR:Fix sample network pkg can be too big 2020-08-04 22:35:21 -07:00
Xiaoxi Wang d1cc87452c merge with master; solve conflicts; solve initialization; 2020-08-02 22:44:07 +00:00
Xiaoxi Wang 0352e8ee0b pick busiest commit tag periodically 2020-08-02 18:38:56 +00:00
Meng Xu 2f5293fcc7 Introduce knob FASTRESTORE_USE_LOG_FILE and FASTRESTORE_USE_RANGE_FILE 2020-07-31 10:40:29 -07:00
Daniel Smith a94c4cce85 Add an unsafe option to disable manual fsyncing rocksdb 2020-07-30 22:31:18 +00:00
Meng Xu ad915e462e Add knob FASTRESTORE_NOT_WRITE_DB to skip writing to DB 2020-07-30 10:17:17 -07:00
Daniel Smith abd2e6b979 Add some knobs for tuning and lz4 compaction 2020-07-30 15:42:26 +00:00
Evan Tschannen a49cb41de7 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	cmake/ConfigureCompiler.cmake
#	fdbserver/Knobs.cpp
#	fdbserver/StorageCache.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/ThreadHelper.actor.h
#	flow/serialize.h
#	tests/CMakeLists.txt
2020-07-29 00:31:55 -07:00
Evan Tschannen 937df4f839 Merge branch 'release-6.3' of github.com:apple/foundationdb into feature-lifetimetoken-fix
# Conflicts:
#	documentation/sphinx/source/release-notes/release-notes-630.rst
2020-07-27 10:03:02 -07:00
Young Liu 06c081c714 Merge master branch and resolve conflicts 2020-07-23 22:41:10 -07:00
Evan Tschannen be67e9cfc7 wait for the correct cluster controller interface before starting master recovery 2020-07-20 11:29:37 -07:00
Steve Atherton e646361501 Merge branch 'release-6.3' of github.com:apple/foundationdb into feature-redwood 2020-07-20 07:25:29 -07:00
Steve Atherton d05b7ee785 Pager remap remover now accumulates a configurable version interval of page updates behind the oldest retained version which are used to coalesce updates to the same original page ID to reduce write amplification for many workloads. 2020-07-20 04:08:33 -07:00
Meng Xu ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
A.J. Beamon b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen dd10dbe7c7 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-ha-fixes 2020-07-09 23:09:14 -07:00
Evan Tschannen 717242a0ee reset WAN network connections every 5 minutes if responses take more than 500ms 2020-07-09 22:50:47 -07:00
A.J. Beamon 04d1217941 Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles. 2020-07-09 16:39:15 -07:00
Evan Tschannen 0e2f5e8bb5 Added a flow lock to prevent too many source server fetches from happening at the same time and running the data distributor out of memory 2020-07-09 10:38:19 -07:00
Young Liu 7afee53f4c Clean up code that serves GRV through other proxies 2020-07-07 21:19:11 -07:00
Meng Xu f3302833ce
Merge pull request from apple/release-6.3
Merge Release 6.3 to master
2020-06-30 10:08:28 -07:00
Meng Xu efb61bcac0 Rename knob to FASTRESTORE_TXN_EXTRA_DELAY 2020-06-29 21:16:30 -07:00
Meng Xu 78c45c1200 Knob for txn delay and add back FlowLock to control txn concurrency 2020-06-27 10:13:34 -07:00
Young Liu f211a54593 Merged from upstream master 2020-06-13 16:47:12 -07:00
A.J. Beamon 739767b838 Delay cluster controller candidacy for all worst fit processes, not just storage servers. 2020-06-10 09:59:56 -07:00
Steve Atherton c3c7db6a9d Added flowlock around reads done on the write path so that those reads cannot starve reads to support reads done by the storage server. 2020-06-09 17:00:21 -07:00
Young Liu a038a02cdd Serve GetReadVersion through master 2020-06-09 11:16:23 -07:00
A.J. Beamon d128252e90 Merge release-6.3 into master 2020-05-22 09:25:32 -07:00
Evan Tschannen 7d91a3a919 Merge branch 'release-6.3' of github.com:apple/foundationdb into release-6.3 2020-05-20 16:30:14 -07:00
Meng Xu c0c15130b8
Merge pull request from jzhou77/backup-fix
Limit memory usage of backup workers
2020-05-20 15:46:11 -07:00
Steve Atherton 4a827c9304 Merge branch 'release-6.3' of github.com:apple/foundationdb into feature-redwood 2020-05-19 03:08:26 -07:00
Evan Tschannen 5013bc9106 fixed client compile issues; reduced the size of storage server interface 2020-05-18 17:20:15 -07:00
Steve Atherton e8626724f9 More refactor of Redwood metrics. Added per-level BTree stats. Added Pager total physical disk reads/writes, and uncacheable read hit/miss counts. Added logging to TraceEvents. 2020-05-16 02:51:57 -07:00
Steve Atherton 32f4639168 Refactored page remap removal. The process is now called remap cleanup, does batches of remap entries in parallel, coalesces remaps of the same page ID to skip unnecessary writes, and has some knobs for controlling it. FIFOQueue now has peek() to support remap cleanup version lag limits. Added counters for remap cleanup and lazy subtree deletion. Refactored Redwood counters, normalized and grouped their names. 2020-05-15 02:10:51 -07:00
Jingyu Zhou 17915e13b0 Limit memory usage of backup workers 2020-05-14 13:24:56 -07:00
Evan Tschannen 1e0c10e9e8 control tag encoding of keyServersValue using a knob 2020-05-13 13:29:13 -07:00
Steve Atherton 421a6581c1 Lazy delete cycles now run more often and not just when the btree is having mutations actively merged into it. Specifically, it is launched upon recovery, then stopped and relaunched after every commit. This will make better use of I/O in between calls to commit(). Also added knobs to control lazy delete cycle parallelism and work limits. 2020-05-13 02:27:03 -07:00
Alex Miller 283fd3af27 Add a knob which controls writing prefix compressed kvs mem snapshots.
Which will be set to on by default in 7.0
2020-05-12 17:01:52 -07:00
Evan Tschannen a8e0f1d581 removed knob 2020-05-11 18:20:46 -07:00
Evan Tschannen d0b414ddf2
Merge pull request from etschannen/master
Added a large random delay on failure detection
2020-05-10 17:59:51 -07:00
Evan Tschannen b1bd5ef83e
Merge pull request from satherton/feature-redwood
Redwood read concurrency limit, some knobs, and memory-only Pager mode.
2020-05-10 17:34:41 -07:00
Evan Tschannen 07111f0e41 add a large random delay on failure detection so that not all storage servers need to attempt to become the cluster controller 2020-05-10 17:09:33 -07:00
Steve Atherton 43f9e4dfad Implemented concurrent read limit in IKeyValueStore interface for Redwood. Added knobs for Redwood page size, concurrent read limit, and page fill factor. Changed commitSubtree() recursion back to use a vector and waitForAll() because it seems to be lower overhead than ActorCollection. 2020-05-10 16:13:22 -07:00
Evan Tschannen f9518c3441
Merge pull request from alexmiller-apple/tls-connection-count
YOLO at reducing TLS connection count via doing monitorLeader on coordinators
2020-05-09 17:12:27 -07:00
Evan Tschannen 69affebe40 merge master 2020-05-09 13:29:18 -07:00
Evan Tschannen 2dfae85dc7 the delay for reads is about 15% of the total cost of the read, so start multiple reads with the same delay 2020-05-09 13:26:38 -07:00
A.J. Beamon 02307ba7b6 Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/DatabaseContext.h
2020-05-09 07:50:29 -07:00
Jingyu Zhou a833724322
Merge pull request from xumengpanda/mengxu/fr-circus-stall-PR
Performant restore: Various improvements based on circus test
2020-05-07 20:07:23 -07:00
Meng Xu a93c23d239 Resolve review comments 2020-05-07 15:06:59 -07:00
A.J. Beamon fbf436f45f Various cleanup and knob adjustments. 2020-05-07 09:15:33 -07:00
Alex Miller 8a6e177950 Merge remote-tracking branch 'upstream/master' into tls-connection-count 2020-05-05 16:49:36 -07:00
A.J. Beamon b1055a8501 Merge branch 'master' into transaction-tagging 2020-05-05 16:03:39 -07:00
Evan Tschannen f329164fb4
Merge pull request from dongxinEric/feature/hot-read-key-detection-part-2
Feature/hot read key detection part 2
2020-05-05 14:33:34 -07:00
Meng Xu c49b6756fe FastRestoreApplier:Trace clear range op when it has too many for debug 2020-05-05 09:28:50 -07:00
Meng Xu 2fec56e7e2 FastRestore:Logging for getReplyBatches 2020-05-04 20:12:59 -07:00
A.J. Beamon 36454bb3b8 Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbclient/NativeAPI.actor.cpp
2020-05-04 10:23:25 -07:00
A.J. Beamon decf3e82b0 Fix various bugs and make sure to cleanup throttles from the database when they expire 2020-05-01 21:36:28 -07:00
Alex Miller 43a63452d8 YOLO at reducing TLS connection count via doing monitorLeader on coordinators 2020-05-01 14:40:21 -07:00
Meng Xu a0d67cac16 Merge branch 'master' into mengxu/fr-code-improvement-PR 2020-04-29 21:07:33 -07:00
A.J. Beamon 6ada5359b8 Merge branch 'master' into transaction-tagging 2020-04-29 14:27:21 -07:00
A.J. Beamon b80225dde0 Initial support for ramping load back up. Fix some logging. Update auto-throttles less frequently. 2020-04-28 15:50:45 -07:00
A.J. Beamon 0ed70accfa Reorganization of throttle storage in ratekeeper to support various auto-throttling related actions 2020-04-28 14:30:37 -07:00
Evan Tschannen b7f5f3be48 merge in master 2020-04-28 13:11:47 -07:00
A.J. Beamon 41c517a5dd Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
2020-04-27 13:05:24 -07:00
A.J. Beamon 239876351b Add some initial auto-throttling. Move the definition of the priority enum to a more global place and use it for all transaction priorities (except in ClientLogEvents, because of serialization incompatibilities). 2020-04-24 11:31:16 -07:00
Evan Tschannen c87aa33941 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/go/src/fdb/generated.go
#	documentation/sphinx/source/api-common.rst.inc
#	documentation/sphinx/source/api-ruby.rst
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FailureMonitorClient.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/vexillographer/fdb.options
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	versions.target
2020-04-23 13:47:53 -07:00
Evan Tschannen 0c84ad4bc6
Merge pull request from bnamasivayam/fail-slow-ss
Mark the storage servers that are continually lagging as unhealthy
2020-04-22 23:18:35 -07:00
A.J. Beamon 9bf5c06d15 Adjust and knobify cost function for ops on the storage server 2020-04-22 14:39:32 -07:00
Evan Tschannen d0cc2a1ee4 added logging for parallel peeks on TLogs 2020-04-22 14:24:45 -07:00
A.J. Beamon 434704fbd9 Various bug fixes 2020-04-22 12:28:51 -07:00
Meng Xu 2960a2fe8a FastRestore:Add knob to control parallelism in waiting requests 2020-04-19 21:34:11 -07:00
Evan Tschannen ba3e2af473 Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast 2020-04-17 15:17:37 -07:00
A.J. Beamon dfec896438 Enforce a throttle limit. Don't count transaction tags on RK if the proxy has updated us in a while. 2020-04-17 11:48:02 -07:00
Meng Xu 2d9e9a0502 FastRestore:Use knob to guard the expensive way to get range versions 2020-04-17 10:02:58 -07:00
A.J. Beamon 78d48a0dad Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/Knobs.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/fdbclient.vcxproj
#	fdbserver/MasterProxyServer.actor.cpp
2020-04-17 09:23:18 -07:00
Xin Dong 7dd7406c59
Merge branch 'master' into feature/hot-read-key-detection-part-2 2020-04-16 14:54:05 -07:00
A.J. Beamon 0fba8c47be Checkpoint: Ratekeeper sets absolute limits for tag throttles and enforces them by distributing requests to proxies, who distribute them to clients.
A few refactorings.
2020-04-16 14:43:22 -07:00
A.J. Beamon 9d6f2352d9 Merge commit 'cf01233f28a2c42908656a39f458a4475c1d44a3' into grv-proxy-perf-improvements
# Conflicts:
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/MasterProxyServer.actor.cpp
2020-04-14 13:49:09 -07:00
Meng Xu dbc9c23193 FastRestore:Loader:Send mutations at different versions in the same message to appliers
This increases the bandwidth sent from loaders to appliers.
2020-04-12 10:46:58 -07:00
Evan Tschannen 8f78912483 knobified parameter 2020-04-11 20:54:17 -07:00
Evan Tschannen 07cc0a8d74 code cleanup 2020-04-10 17:02:11 -07:00
A.J. Beamon ebeca10bce Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients. 2020-04-09 16:55:56 -07:00
A.J. Beamon 36da61dd9c Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/vexillographer/fdb.options
2020-04-07 21:12:14 -07:00
Balachandar Namasivayam 73272fc72e Version difference is now the diff between TLog versions and SS version. 2020-04-03 19:04:43 -07:00
A.J. Beamon 2336f073ad Checkpointing a bunch of work on throttles. Rudimentary implementation of auto-throttling. Support for manual throttling via fdbcli. Throttles are stored in the system keyspace. 2020-04-03 15:24:14 -07:00
tclinken 884e92bb49 Atomically update dependent knobs 2020-04-01 15:18:49 -07:00
Balachandar Namasivayam a5af31de23 Addressed simple review comments 2020-03-31 18:34:13 -07:00
Balachandar Namasivayam b1c3893d40 Fix some corner case bugs exposed by simulation.
In one case, when a SS joins the cluster and DD doesn't find any healthy server to form a team with the newly added server, then the SS does not get added to any team even when the other servers get healthy.
Another is an extreme case where a data center is down, and a SS in the active DC joins and then dies immediately but not before DD adds it to a destination team for a relocating shard which will result in DD waiting indefinitely for the dead data center to come back up for the cluster to be fully recovered.
2020-03-31 18:33:12 -07:00
Jingyu Zhou 40b17e1e9b Remove a no longer used knob 2020-03-26 13:04:00 -07:00
A.J. Beamon e0424a52f8 Merge branch 'master' into transaction-tagging 2020-03-25 08:23:11 -07:00
Jingyu Zhou 80d3fa1222 Add delay for master to recruit backup workers
This delay is to ensure old epoch's backup workers can save their progress in
the database. Otherwise, the new master could attempt to recruit backup
workers for the old epoch on version ranges that have already been popped. As
a result, the logs will lose data.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 940bea102a Add a knob to switch mutation logs for parallel restore
Knob FASTRESTORE_USE_PARTITIONED_LOGS, default is true to enable partitioned
mutation logs. Otherwise, old mutation logs are used.
2020-03-20 20:13:38 -07:00
A.J. Beamon 26b7e02d4c Some initial work to support tagging transactions and passing them around. 2020-03-20 11:23:11 -07:00
Jingyu Zhou 34415f82b3
Merge pull request from xumengpanda/mengxu/backup-code-review-PR
Buggify upload delay when backup worker upload data to blob
2020-03-19 21:42:28 -07:00
Meng Xu 94276076de BackupWorker:Buggify upload delay
Add questions to code as well.
2020-03-18 19:04:45 -07:00
Balachandar Namasivayam 58a9bfa78b
Merge pull request from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
Evan Tschannen ed4d02a3e4
Merge pull request from etschannen/feature-proxy-mem-limit
Limit the amount of requests the proxy can queue up in memory
2020-03-16 14:56:56 -07:00
Evan Tschannen d6d347f665 treat a tlog which takes a long time to create its disk queue as failed 2020-03-13 10:31:59 -07:00
Evan Tschannen 243c268d9d Limit the amount of requests the proxy can queue up in memory 2020-03-13 10:17:49 -07:00
Xin Dong 5967ef5eab Added back the changes that report trace log flush failures and fix the random crash 2020-03-12 14:34:19 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen e219c1671f Merge branch 'release-6.2' into feature-dd-region-queue
# Conflicts:
#	fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen 6d6f184e2f added a knob which reverts the new queue behavior 2020-03-04 16:23:49 -08:00
Evan Tschannen b7834b2995
Merge pull request from etschannen/feature-dd-repopulate-priority
Make the DD priority of populating a region lower than machine failures
2020-03-04 16:15:18 -08:00
Xin Dong 39610d15f8 Revert this change since it somehow introduced a random crash detected on circus 2020-03-04 16:14:38 -08:00
Evan Tschannen 6296465e07 Make the DD priority associated with populating a remote region lower than machine failures 2020-03-04 14:07:32 -08:00
Meng Xu 2c6f82e1ab FastRestore:Add unit name to threshold knob name 2020-03-02 10:52:44 -08:00
Meng Xu 2520e8d44c FastRestore:Use more concise code as suggested in review 2020-03-01 22:32:36 -08:00
Meng Xu 1ef4cb432b Merge branch 'master' into mengxu/fast-restore-robust-and-visibility-PR-v2 2020-03-01 20:08:07 -08:00
Meng Xu ad9b3fb4a8 DD:Add trace for detailed relocate shard info 2020-02-29 13:45:10 -08:00
Meng Xu 01c1a15caf FastRestore:Applier:Limit fetch keys number in a txn in getAndComputeStagingKeys 2020-02-28 16:53:36 -08:00
A.J. Beamon 993c6e478e Merge branch 'master' into grv-proxy-perf-improvements 2020-02-28 14:25:42 -08:00
Xin Dong 13e72f7b3b
Merge pull request from dongxinEric/fix/1977/report-inability-to-flush-trace-log
Report inability to flush trace logs.
2020-02-27 12:36:55 -08:00
Meng Xu 97d7eb49b5 FastRestore:Master:Report unavailable role periodically
Ping all restore roles and report unavailable ones.
2020-02-26 16:14:55 -08:00
Meng Xu ca726fc68e FastRestore:Introduce OOM protection
An actor is schedulable to run if the current worker has enough resources, i.e.,
the worker's memory usage is below the threshold;
Exception: If the actor is working on the current version batch, we have to schedule
the actor to run to avoid dead-lock.
Future: When we release the actors that are blocked by memory usage, we should release them
in increasing order of their version batch.
2020-02-26 14:09:18 -08:00
Evan Tschannen 924d335aa7 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Knobs.cpp
#	flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen 12b5064041 a high free_space_ratio_cutoff is not needed anymore because avoiding teams with low disk space is no longer the responsibility of getLoadBytes() 2020-02-25 15:47:10 -08:00
Evan Tschannen 6e7d2ff7dd prevent the proxy from delaying too long based on an incorrect estimate of the compute time 2020-02-25 15:46:13 -08:00
Xin Dong 090c89e90a Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request. 2020-02-25 15:39:38 -08:00
Xin Dong a6580dc15f Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to an accurate check by using 'pthread_kill(writer_thread, 0)' 2020-02-25 15:37:53 -08:00
Xin Dong 034dfe5e42 Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
- Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controller
    - A successful flush will reset the accumulated counter.
    Notice that the current solution does not take the time into consideration. The assumption is that flush failures tend to only happen in a clustered manner. The intermittent, but short, periods of flush failures are not considered as a problem since the memory pressure built by them should be negligible.
2020-02-25 15:37:32 -08:00
A.J. Beamon 80c2848af6 Change the algorithm for the proxy handing out read versions to improve performance and increase responsiveness to changes in workload. 2020-02-24 09:52:31 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Meng Xu 505997ba0a FastRestore:Switch to new sendBatchRequests that tracks performance and straggler 2020-02-21 15:45:32 -08:00
Meng Xu 05ea79f584 FastRestore:Profile performance for getBatchReplies
Generic approach to profile getBatchReplies performance
and detect straggler.
2020-02-21 15:20:22 -08:00
Evan Tschannen 08914a2acd Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
Meng Xu ab2dd36bdc FastRestore:Generic way to detect straggler 2020-02-21 14:30:08 -08:00
Evan Tschannen 819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
Evan Tschannen d7c841a28a
Merge pull request from etschannen/feature-proxy-delay
Improve version pipelining on the proxy
2020-02-20 15:23:30 -08:00
Evan Tschannen fbd45963d8 The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment 2020-02-19 16:48:30 -08:00
Meng Xu 03f699f2f9 Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR 2020-02-19 15:22:33 -08:00
Meng Xu 551f1ba4d2 FastRestore:Minor revision for code review 2020-02-19 11:52:24 -08:00
Meng Xu 31a6ec34b7 Merge branch 'master' into mengxu/fast-restore-agent-PR 2020-02-18 16:17:59 -08:00
Meng Xu c603b20e7e FastRestore:Resolve review comments 2020-02-18 14:08:27 -08:00
Meng Xu acf34319c1 FastRestore:Applier:Precompute mutations and apply in parallel
Precompute mutations received by an applier;
Only apply the final result to the destination DB;
Execute multiple txns in parallel to apply final results to the destination DB.
2020-02-12 22:47:48 -08:00
Meng Xu cda8fc189e FastRestore:AtomicOp:Intro weighted size for atomicOp
atomicOp has an amplified performance overhead to the cluster,
for example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu dbce1e9974 FastRestore:Applier:Add metrics counter and proc counter 2020-02-10 16:38:26 -08:00
Meng Xu 1fc793d6a7 FastRestore:Loader:Add metrics counter 2020-02-09 22:06:14 -08:00
Meng Xu fd5b4af05a FastRestore:Add trace for each phase on master 2020-02-09 18:54:10 -08:00
Meng Xu cf331b9a03 FastRestore:monitorFinishedVersion for measuring perf quickly 2020-02-05 14:26:25 -08:00
Meng Xu 7f37a90c48 FastRestore:Introduce FASTRESTORE_VB_PARALLELISM
for controlling the number of concurrently running version batches.
2020-01-28 10:39:57 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Evan Tschannen 231d7830a0 more accurate calculation on the amount of time that proxy should wait before getting a version from the master 2020-01-26 19:47:12 -08:00
Meng Xu b04e98771e FastRestore:Replace FastRestoreOpConfig with Knobs
And randomize value for the rest of knobs
2020-01-24 14:24:34 -08:00
Evan Tschannen e167e63eaf Add delays between proxy batches which roughly correspond to the amount of work the proxy needs to do. This will help avoid getting a version from the master and then waiting a long time before committing it. 2020-01-23 18:31:51 -08:00