Commit Graph

489 Commits

Author SHA1 Message Date
Evan Tschannen f298436db7 made the style consistent 2020-05-09 15:17:22 -07:00
Evan Tschannen d497910fa2 fixed merge conflict 2020-05-09 13:34:51 -07:00
Evan Tschannen 69affebe40 merge master 2020-05-09 13:29:18 -07:00
Evan Tschannen 2dfae85dc7 the delay for reads is about 15% of the total cost of the read, so start multiple reads with the same delay 2020-05-09 13:26:38 -07:00
Evan Tschannen a4fc593a3a refactored some actors in the storage server to improve performance 2020-05-08 21:08:52 -07:00
Markus Pilman 5f9b127e56 Emit traces regularly about role assignment
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.

We do decorate each trace line with the roles assigned to that
particular process; however, this is not sufficient for tools that can
make use of the UID -> Role mapping.
2020-05-08 16:27:57 -07:00
negoyal 749fcd13b0 Merge branch 'master' into fdb_cache_wo_allocator 2020-05-08 16:23:29 -07:00
negoyal 116e186af6 Add code to fetch cached ranges when a cache server starts up. 2020-05-06 18:56:42 -07:00
A.J. Beamon b1055a8501 Merge branch 'master' into transaction-tagging 2020-05-05 16:03:39 -07:00
Evan Tschannen f329164fb4
Merge pull request #2532 from dongxinEric/feature/hot-read-key-detection-part-2
Feature/hot read key detection part 2
2020-05-05 14:33:34 -07:00
negoyal dd033736ed Merge branch 'master' into fdb_cache_subfeature2 2020-05-04 17:29:43 -07:00
Evan Tschannen b68980d686
Merge pull request #2028 from negoyal/cache_storageq_results
Cache storageq results
2020-05-04 11:27:02 -07:00
A.J. Beamon 36454bb3b8 Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbclient/NativeAPI.actor.cpp
2020-05-04 10:23:25 -07:00
A.J. Beamon bb3d4b6b89 Add a bunch of TEST macros and some other little things 2020-05-04 10:11:36 -07:00
negoyal 4fa04c5891 Fixed a corruption bug. 2020-05-02 09:22:15 -07:00
Evan Tschannen aed2d34bcb Merge branch 'master' into feature-proxy-load-balance
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	flow/Knobs.cpp
2020-05-01 09:19:39 -07:00
A.J. Beamon 41c517a5dd Merge branch 'master' into transaction-tagging
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
2020-04-27 13:05:24 -07:00
A.J. Beamon 9bf5c06d15 Adjust and knobify cost function for ops on the storage server 2020-04-22 14:39:32 -07:00
negoyal 2fa7d485f5 Merge branch 'master' into cache_storageq_results 2020-04-21 17:28:17 -07:00
A.J. Beamon 6619a1a36a Rename transaction tag map. 2020-04-17 09:06:45 -07:00
Xin Dong 7dd7406c59
Merge branch 'master' into feature/hot-read-key-detection-part-2 2020-04-16 14:54:05 -07:00
A.J. Beamon 0fba8c47be Checkpoint: Ratekeeper sets absolute limits for tag throttles and enforces them by distributing requests to proxies, who distribute them to clients.
A few refactorings.
2020-04-16 14:43:22 -07:00
negoyal 7a9bcf8222 Review comments. 2020-04-14 17:45:34 -07:00
Neelam Goyal cd4f3f84b2
Update fdbserver/storageserver.actor.cpp
Co-Authored-By: Trevor Clinkenbeard <trevorclinkenbeard@gmail.com>
2020-04-13 12:19:17 -07:00
A.J. Beamon 6508c891fc Make the TagSet sent to the storage servers optional so we can distinguish no tags from unsampled. 2020-04-10 13:29:28 -07:00
A.J. Beamon 29b2c2f3aa Add hash to StringRef. Use unordered maps for storing tags. Create some helpful typedefs. 2020-04-10 12:54:59 -07:00
A.J. Beamon ebeca10bce Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients. 2020-04-09 16:55:56 -07:00
Balachandar Namasivayam 73272fc72e Version difference is now the diff between TLog versions and SS version. 2020-04-03 19:04:43 -07:00
Balachandar Namasivayam ad1dd4fd9b Mark storage servers that are continually lagging as unhealthy, which gives the Data Distributor the chance to move data off those servers. 2020-03-31 18:25:39 -07:00
Alex Miller 72e5891058 Clean up and rework the debugMutation API.
As a relatively unknown debugging tool for simulation tests, one could
have simulation print when a particular key is handled in various stages
of the commit process.  This functionality was enabled by changing a 0
to a 1 in an #if, and changing a constant to the key in question.

As a proxy and storage server handle mutations, they call debugMutation
or debugKeyRange, which then checks the mutation against the key
in question, and logs if they match.  A mixture of printfs and
TraceEvents would then be emitted, and for this to actually be usable,
one also needs to comment out some particularly spammy debugKeyRange()
calls.

This PR reworks the API of debugMutation/debugKeyRange, pulls it out
into its own file, and trims what is logged by default into something
useful and understandable:
* debugMutation() now returns a TraceEvent, that one can add more details to before it is logged.
* Data distribution and storage server cleanup operations are no longer logged by default
2020-03-27 03:30:28 -07:00
negoyal 8abac91033 Fixed a bug in cache server while peeking at a version lower than popped version and added some logging. 2020-03-26 12:39:07 -07:00
A.J. Beamon 26b7e02d4c Some initial work to support tagging transactions and passing them around. 2020-03-20 11:23:11 -07:00
negoyal 99a5cb0572 Fix a corruption bug. 2020-03-13 18:52:34 -07:00
negoyal 3acd3ad3af Some bugfixes and cleanup. 2020-03-02 17:11:23 -08:00
negoyal cd949eca71 Merge branch 'master' into fdb_cache_subfeature2 2020-02-26 11:22:08 -08:00
negoyal 0cbd253967 Reduce the amount of tracing in cache server. 2020-02-25 17:32:24 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
A.J. Beamon 5586e6f6d8
Merge pull request #2697 from etschannen/feature-correctness-fixes
A variety of correctness fixes
2020-02-20 13:32:18 -08:00
negoyal a21f29b586 Bugfixes. Identify the cached range correctly. 2020-02-19 16:27:06 -08:00
Evan Tschannen 4326984b1d fix: wait metrics can take a really long time to detect that two shards have been merged into one if both shards are assigned to the same team. Additional information should be added to the request to improve this. 2020-02-19 15:20:38 -08:00
Meng Xu 132f5aa9ba FastRestore:Improve trace name and cosmetic change 2020-02-18 16:41:19 -08:00
A.J. Beamon 60f6b928f6 Slight reorganization of code to make it clearer. 2020-02-12 14:07:02 -08:00
A.J. Beamon 529200018a Merge branch 'release-6.2' into fix-reverse-range-read-byte-limit-bug
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-02-10 12:23:52 -08:00
mpilman 5a9d420cb7 Merge remote-tracking branch 'upstream/release-6.2' into release-merges/20200210 2020-02-10 10:02:05 -08:00
A.J. Beamon fa920a6cef Step 5 of fixing storage server range reads: update the logic of reverse range reads to match forward range reads 2020-02-07 10:02:52 -08:00
A.J. Beamon 16167b07d5 Step 4 of fixing storage server range reads: remove another unneeded iteration case in the forward direction when we don't exhaust our limits in the disk read. This also hopefully makes the code a bit clearer. 2020-02-06 13:27:04 -08:00
A.J. Beamon df2b0452b4 Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef. 2020-02-06 13:19:24 -08:00
A.J. Beamon 1c61957ca1 Step 2 of fixing storage server range reads: eliminate some unnecessary iterations in the forward case 2020-02-06 12:58:59 -08:00
A.J. Beamon 7037edc3f8 Step 1 of fixing storage server range reads: cleanup of the forward direction. This should not change any behavior. 2020-02-06 12:49:02 -08:00
negoyal 85cc35e81e Merge branch 'master' into HEAD 2020-02-05 14:59:55 -08:00
A.J. Beamon f32d515fda Reverse range reads on the storage server would not pass the specified byte limit to the storage engine but would apply it to the results returned, causing a potentially significant amount of wasted reading. 2020-02-05 11:16:40 -08:00
mpilman d09e07f1f5 Merge remote-tracking branch 'upstream/master' into features/icc 2020-02-04 10:26:18 -08:00
Alex Miller ee6490c9d1
Merge pull request #2314 from mengranwo/memory-engine
New Radix-Tree based Memory Storage Engine
2020-01-30 16:20:13 -08:00
A.J. Beamon d1b87f8b7f The storage server could fail to update its version to the latest processed if the peeked data contained a non-empty commit and ended with an empty commit. 2020-01-29 13:17:58 -08:00
Xin Dong b0a1af1288 Added the actual read hot detection algorithm and logging mechanism.
- When a shard has a read bandwidth larger than a threshold value (configurable via knob), and its read-bandwidth/byte-size ratio is also larger than a threshold (configurable via knob), the corresponding shard tracker will run the algorithm
- The algorithm will divide the shard into 10MB (configurable via knob) chunks and try to find the chunk(s) with a large value of the aforementioned ratio
- Those ranges are then logged in TraceEvents. Later work will do more with them, such as actually caching them.
2020-01-21 11:19:52 -08:00
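The detection steps listed in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the FoundationDB implementation: the names `Chunk`, `find_read_hot_ranges`, and all threshold values are hypothetical, and the sketch assumes the same ratio threshold is applied to the shard and to each chunk.

```python
from dataclasses import dataclass

# All names and threshold values here are hypothetical, chosen for illustration.
CHUNK_BYTES = 10 * 1024 * 1024        # chunk size (the commit says 10MB, knob-configurable)
MIN_READ_BANDWIDTH = 8 * 1024 * 1024  # assumed shard-level bandwidth threshold, bytes/s
MIN_DENSITY_RATIO = 2.0               # assumed read-bandwidth / byte-size threshold


@dataclass
class Chunk:
    begin_key: bytes
    end_key: bytes
    size_bytes: int
    read_bytes_per_sec: float


def density(read_bytes_per_sec, size_bytes):
    """Read bandwidth per stored byte; an empty range counts as ratio 0."""
    return read_bytes_per_sec / size_bytes if size_bytes > 0 else 0.0


def find_read_hot_ranges(chunks):
    """Return (begin_key, end_key, ratio) for every chunk above the ratio threshold.

    The per-chunk scan only runs when the shard as a whole exceeds both
    the bandwidth threshold and the ratio threshold.
    """
    shard_bytes = sum(c.size_bytes for c in chunks)
    shard_bandwidth = sum(c.read_bytes_per_sec for c in chunks)
    if shard_bandwidth <= MIN_READ_BANDWIDTH:
        return []
    if density(shard_bandwidth, shard_bytes) <= MIN_DENSITY_RATIO:
        return []
    hot = []
    for c in chunks:
        ratio = density(c.read_bytes_per_sec, c.size_bytes)
        if ratio > MIN_DENSITY_RATIO:
            hot.append((c.begin_key, c.end_key, ratio))
    return hot
```

In this sketch the returned ranges stand in for what the commit logs through TraceEvents; a caller would log or otherwise act on each tuple.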
Xin Dong 33456e7276 Did the plumbing work on the SSI side. 2020-01-21 11:15:52 -08:00
Evan Tschannen 3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
mengranwo f597aa7e18 WIP : deployable/stable version since Nov 3. Start rebase to master branch 2020-01-15 13:49:45 -08:00
Evan Tschannen 4b90487b90 occasionally throw wrong_shard_server when waitMetrics times out so that the waitMetrics request can get the correct number of shards if two shards have been merged or split and the same storage server owns all the chunks 2020-01-15 13:22:18 -08:00
negoyal 34adf9c92c Storages now return a bool flag to indicate if the key[s] might be cached. 2020-01-14 12:33:17 -08:00
Evan Tschannen 855f03a41f ratekeeper needed to check remoteDC in another location
the storage server scoped a transaction incorrectly
2020-01-10 15:58:36 -08:00
Evan Tschannen 4aab9b7bc8 fix: clients would waste time attempting to read from a remote region when it was in the process of catching up 2020-01-10 12:23:59 -08:00
Evan Tschannen 83ad9caf54 implemented a load balancing algorithm which evens out the number of requests processed by each proxy 2020-01-08 01:59:01 -08:00
Alvin Moore 0373b1af91 Added missing braces 2019-12-12 07:36:19 -08:00
Alvin Moore 3bf971ba8b Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Andrew Noyes 9ef1f4da5c
Update fdbserver/storageserver.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-04 09:21:05 -08:00
Andrew Noyes 854c94c5ad Fix another "binding reference to nullptr" 2019-12-03 17:39:17 -08:00
Andrew Noyes f320f6c174 Fix occurrence of undefined behavior
UBSAN has this to say:

flow/Arena.h:982:10: runtime error: reference binding to null pointer of type 'KeyValueRef'

After this change UBSAN no longer complains about this occurrence
2019-11-26 21:34:24 -08:00
Xin Dong c95fa062b2 For the read sampling, use a specialized notify function to avoid unnecessary stack object allocation and a lot of branch misses. 2019-11-22 15:21:09 -08:00
Xin Dong b6e1839d84 Code clean up 2019-11-21 13:39:19 -08:00
Xin Dong b282e180d5 Added a knob to disable read sampling 2019-11-20 14:03:20 -08:00
Xin Dong 25fb63e68a For performance concerns, change the read sampling when doing a range read. Now it bills the total cost of a range read to the start key of the range returned. 2019-11-20 14:03:20 -08:00
Xin Dong 3d3e186c83 Removed a place where it's essentially double logging the read size 2019-11-20 14:03:20 -08:00
negoyal a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Xin Dong 199a34b827 Defined a minimum read cost (a penalty) for empty read or read size smaller than it. Fixed several review comments. 2019-10-30 10:04:19 -07:00
Xin Dong fe54a4bde1 - Changed SHARD_MAX_BYTES_READ_PRE_KEYSEC to be equivalent to 8MiB/s, which, multiplied by the sample expire interval (120 seconds), yields 960MiB. A shard having a read rate larger than that will be marked as read-hot. The number 960MiB was chosen to be roughly twice the max allowed shard size to avoid wrongly marking a shard as read-hot when doing a table scan on it.
- Also tuned down the empty key sampling percentage to be 5%.
2019-10-23 12:00:19 -07:00
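The threshold arithmetic in the commit above can be checked with a few lines of Python. The variable and function names are illustrative only (they are not knob names), and treating the threshold as a byte count per sampling window is an assumption of this sketch.

```python
MIB = 1024 * 1024

read_rate_bytes_per_sec = 8 * MIB  # rate quoted in the commit message
sample_expire_interval_sec = 120   # sample expire interval quoted in the commit message

# Bytes read during one sampling window at exactly the threshold rate.
window_bytes = read_rate_bytes_per_sec * sample_expire_interval_sec
window_mib = window_bytes // MIB   # 8 * 120 = 960


def is_read_hot(sampled_bytes_in_window):
    """Hypothetical check: the shard read more than the threshold within one window."""
    return sampled_bytes_in_window > window_bytes
```

Under these assumptions a shard that is scanned once, at up to half of 960MiB, stays well below the threshold, which matches the stated motivation of not flagging table scans.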
Xin Dong e6f5748791 Use a large value for read sampling size threshold. Also at sampling site, don't round up small values to avoid sampling every key. 2019-10-22 13:47:58 -07:00
Xin Dong 3efeff04e6 Remove iosPerKSecond metric increment. 2019-10-09 16:42:42 -07:00
Xin Dong cd4757b06c Address review comments 2019-10-09 16:42:42 -07:00
Xin Dong 6b0f771cc0 Fixed a typo in knobs. Addressed some review comments. Added code for actual metric collecting. 2019-10-09 16:42:42 -07:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen d0e5b0d3a1 Added a buggify 2019-09-30 13:24:28 -07:00
Evan Tschannen 27db0ca530
Update fdbserver/storageserver.actor.cpp
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-09-30 13:16:31 -07:00
Evan Tschannen eee4404e4e fix: when the shard pointer is replaced with a new AddingShard, we need to restart the warningLogger because the old one will have a pointer to the deleted AddingShard 2019-09-27 19:11:34 -07:00
Evan Tschannen b495cc697b Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-09-13 09:25:08 -07:00
A.J. Beamon e6fbc602df Add metric to track empty reads. 2019-09-12 15:09:22 -07:00
A.J. Beamon 41fd3d9467 Merge branch 'master' into remove-unused-ssi-get-version
# Conflicts:
#	fdbclient/StorageServerInterface.h
#	fdbserver/storageserver.actor.cpp
2019-09-10 08:11:43 -07:00
A.J. Beamon 3d5f769ea3 Add a storage server metric for bytes cleared based on the byte sample. 2019-09-05 11:31:26 -07:00
Meng Xu c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
negoyal 042fb62771 Code cleanup 2019-09-04 16:44:19 -07:00
negoyal 468cffa5d0 More cleanup around const versions of the VectorRef operators.
This PR changes the readRange() to cache the PTree results during the first traversal.
So that we avoid the second traversal during the merge between PTree data and Storage engine data.

I microbenchmarked the VersionedMap standalone. i.e. the set and clear-range mutations were
performed solely at the in-memory storage queue. No other FDB components were involved
in this test. And hence the numbers presented here are the best case numbers.

Test setup:

- 100M mutations: about 5% clearRange and 95% set mutations
- 100M rangeReads
- Keys/Values generated using deterministicRandom()
- Numbers presented below are for two setups, one with a single version generated for all set of mutations
and another where a new version is generated for each mutation (i.e. it's an extreme version test)

                                       Single Versioned       Extreme Versioned
Time to regular readRange                  175.15                 202.365
Time to readRange with cached results      158.798                184.423
2019-09-04 15:39:21 -07:00
negoyal 04405d910f Use the same arena as the final readRange result for resultCache. 2019-09-03 17:01:24 -07:00
Evan Tschannen a7237c4302
Merge pull request #2045 from atn34/disallow-scalar-network-messages
Disallow scalar network messages
2019-08-30 13:38:54 -07:00
A.J. Beamon 1fdabe62c2
Merge pull request #2048 from etschannen/feature-fix-connections
Fixed two different ways useful connections were being closed
2019-08-30 11:05:02 -07:00
Evan Tschannen 1c0484cffc fix: do not close connections which have outstanding tryGetReplies with the peer 2019-08-29 16:49:57 -07:00
Andrew Noyes 6aa0ada7b1 Replace scalar root types with proper messages 2019-08-28 14:40:50 -07:00
A.J. Beamon 64ce0c3285 Remove the unused getVersion from StorageServerInterface. 2019-08-26 13:53:54 -07:00
negoyal 4a8c301de1 Cache the storage queue results during first pass in readRange.
Currently, we may traverse the PTree backing the storage queue more than once
during the rangeRead operation. This is an attempt to cache the results during
the first traversal and avoid multiple PTree traversals in turn.
2019-08-22 15:23:16 -07:00
negoyal 4ba1725bb5 Cache the storage queue results during first pass in readRange.
Currently, we may traverse the PTree backing the storage queue more than once
during the rangeRead operation. This is an attempt to cache the results during
the first traversal and avoid multiple PTree traversals in turn.
2019-08-22 14:08:22 -07:00
Evan Tschannen 297b65236f added additional trace events to warn when different parts of shard relocations take more than 10 minutes 2019-08-16 14:56:58 -07:00
Meng Xu 7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
Andrew Noyes 1bad0fd44e Make requestTime private 2019-07-31 17:59:35 -07:00
A.J. Beamon 14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
Evan Tschannen 3ad1d95049
Merge pull request #1894 from ajbeamon/trace-file-detail-rename
Expand undefined acronym in trace event detail
2019-07-26 13:34:45 -07:00
Evan Tschannen 8149b5b352
Merge pull request #1413 from atn34/change-connection-file
Switch cluster file feature
2019-07-26 13:27:37 -07:00
Meng Xu 1706aaf199 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflict in TlogServer.actor.cpp by accepting master changes
2019-07-26 11:46:27 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
A.J. Beamon b91795d288 Send bytes input rate to DD. 2019-07-25 16:27:32 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
sramamoorthy 8f1f0c0435 snap v2: worker and other helper related changes 2019-07-24 15:36:28 -07:00
Trevor Clinkenbeard 9ad9bd4c1f Merge branch 'master' of https://github.com/apple/foundationdb into change-connection-file 2019-07-24 15:22:26 -07:00
A.J. Beamon 639df02f20 Expand undefined acronym in trace event detail 2019-07-24 08:38:36 -07:00
Evan Tschannen 3045826e3c
Merge pull request #1819 from mpilman/flatbuffers-fixes2
Flatbuffers fixes2
2019-07-19 16:33:50 -07:00
Alex Miller c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
Alex Miller 9863ace96c Replace usages with initialization lists.
But C++ needs a bit of help to infer through the templates.
2019-07-18 22:27:36 -07:00
mpilman 1ac2d01b03 Merge remote-tracking branch 'upstream/master' into flatbuffers-fixes2 2019-07-18 09:50:08 -07:00
A.J. Beamon 2cd05e9ac9
Merge pull request #1712 from tclinken/add-local-rk-to-status
Track the local ratekeeper rate in status
2019-07-15 15:17:11 -07:00
mpilman 54416f46fd Pass type as param to VectorRef instead of bool 2019-07-15 15:08:49 -07:00
Trevor Clinkenbeard e1541778ab Added readsRejected counter to storage server 2019-07-15 10:53:19 -07:00
Jon Fu 4b0fdabae5 mark test file as IGNORE and comment out dead placeholder code 2019-07-15 09:45:16 -07:00
mpilman b68f2d925f Serialize range result to string for speed 2019-07-11 23:03:31 -07:00
Jon Fu 652cd77689 fixed merge conflicts and use new TaskPriority enum class 2019-07-11 09:56:59 -07:00
Jon Fu 1e9d31597c removed extra parameter from getRange, added knob to guard new changes, and adjusted style/formatting in several places 2019-07-11 09:56:58 -07:00
Jon Fu f707d186fe added new priority for fetchkeys requests and adjusted ddmetrics workload to run parallel with mako 2019-07-11 09:56:58 -07:00
Trevor Clinkenbeard 1582a2a24d Merge branch 'master' of https://github.com/apple/foundationdb into change-connection-file 2019-07-09 13:41:54 -07:00
Trevor Clinkenbeard 1bac04509e Track the local ratekeeper rate as a percentage
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
Evan Tschannen 310a5fe9a3 fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy 2019-07-05 17:28:22 -07:00
Evan Tschannen 15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
Evan Tschannen e7c0ecf729 fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy 2019-07-05 15:46:16 -07:00
Alex Miller ea6898144d Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-07-03 20:44:15 -07:00
Evan Tschannen 3fb0999e10 revert storage server priority changes 2019-07-02 16:54:47 -07:00
mengranwo e54eedf0e2 Address pr comments, remove wait(tr.commit()) for read-only txn 2019-07-01 16:09:51 -07:00
mengranwo 0ad151e70a style formatting 2019-07-01 16:09:51 -07:00
mengranwo 819b6e3d6d fix compiling error 2019-07-01 16:09:51 -07:00
mengranwo c7148bbb14 address cr comments: 2019-07-01 16:09:51 -07:00
mengranwo d96cdacdd5 fix format issue 2019-07-01 16:09:51 -07:00
mengranwo 11161746f8 add try catch block around tx.onerror() 2019-07-01 16:09:51 -07:00
mengranwo 6b61b0e030 fix syntax error, pass compile 2019-07-01 16:09:51 -07:00
mengranwo 0b9cd18fb4 checking cluster is healthy or not during recovery process(for storage engine), if healthy, delete data files and join as new 2019-07-01 16:09:51 -07:00
Alex Miller 8e1ab6e7db Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-28 17:32:54 -07:00
Evan Tschannen 4cef1d3937 Experimental change of storage write priority 2019-06-28 16:54:22 -07:00
Evan Tschannen 18d5fbf1e0 Avoid jumping from rejecting 0% of requests directly to 20% of requests 2019-06-28 16:54:22 -07:00
Evan Tschannen db413c37f7 restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers 2019-06-28 16:54:22 -07:00
Evan Tschannen 92b32855ca ratekeeper’s control algorithm would oscillate when limited by local ratekeeper 2019-06-28 16:54:22 -07:00
Evan Tschannen cfce1e1705 fix: buffered peek cursor would advance very slowly through large ranges of empty versions 2019-06-28 15:54:08 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Stephen Atherton f1f1081202 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
2019-06-24 20:17:49 -07:00
Trevor Clinkenbeard afb0dbcd1c Merge branch 'master' of https://github.com/apple/foundationdb into change-connection-file 2019-06-20 19:11:29 -07:00
mpilman 844dd60202 FDB compiling with intel compiler 2019-06-20 09:29:01 -07:00
mpilman 68ce9a5e75 ProtocolVersion type - second try 2019-06-18 17:55:27 -07:00
mpilman 8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec.
2019-06-18 14:49:04 -07:00
Alex Miller 455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
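The feature-test idea described in the commit above can be sketched as follows. This is a Python illustration of the pattern only: the class layout, the method names, and the version numbers are all made up for the example and are not FoundationDB's actual values or API.

```python
class ProtocolVersion:
    """Wraps a raw version number and answers feature questions by name."""

    # Hypothetical table: feature name -> newest version that did NOT support it.
    # The numbers are invented for illustration.
    _LAST_VERSION_WITHOUT = {
        "fearless": 0x05FF,
    }

    def __init__(self, version):
        self.version = version

    def has_feature(self, name):
        # True iff this version is newer than the newest version lacking the feature.
        return self.version > self._LAST_VERSION_WITHOUT[name]

    def has_fearless(self):
        return self.has_feature("fearless")
```

Callers then write `ProtocolVersion(v).has_fearless()` instead of comparing `v` against a raw number, so the meaning of each comparison lives in one table.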
Evan Tschannen 20e3edeb0a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
#	versions.target
2019-06-14 12:42:59 -07:00
Evan Tschannen 924f92e5aa Prevent the byte sample recovery from interfering with storage server recovery 2019-06-13 15:55:25 -07:00
Evan Tschannen 55f7e7d372 fix: The delay inside the disabledMap was causing the storage server updateStorage actor to run on the client process 2019-06-13 14:28:30 -07:00
Evan Tschannen dccb9bc26d fixed a number of correctness problems 2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard 1e8f7e5b82 Refactor NextFastAllocatedSize to be constexpr function 2019-06-11 15:55:23 -07:00
Andrew Noyes bc03421d05 Open Database as switchable only for client 2019-06-11 13:58:22 -07:00
Andrew Noyes 02e173b601 Add changeConnectionFile method to Transaction
Also add tests
2019-06-11 13:58:22 -07:00
Trevor Clinkenbeard cb420ea4bd Only construct waitDescription in simulator 2019-06-11 12:43:39 -07:00
Trevor Clinkenbeard 8144882d7b Merge branch 'apple-master' into features/local-rk 2019-06-10 19:40:25 -07:00
Trevor Clinkenbeard 8dbb231f33 Don't reject read requests until the storage server durability lag gets large enough 2019-06-05 15:42:58 -07:00
Trevor Clinkenbeard d1d98f298a Changed storage server getPenalty calculation.
Penalty should always be >= 1.0
2019-06-05 14:14:40 -07:00
sramamoorthy 4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy ec7834e2f7 code reorganization and address comments 2019-05-28 22:07:46 -07:00
sramamoorthy b6e037ffbc Replace fork with boost::process::child 2019-05-28 22:07:46 -07:00
sramamoorthy 61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy 9e3104c2d4 Fix: races in async exec leading to bad backup 2019-05-28 22:07:46 -07:00
sramamoorthy 090bb53034 ShardInfo::addMutation to handle exec mutation 2019-05-28 22:07:46 -07:00
sramamoorthy cfdad0c5e6 tlog to snapshot exactly at exec version 2019-05-28 22:07:46 -07:00
sramamoorthy 17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy aa79480d69 changes to make fdbfork asynchronous 2019-05-28 22:07:46 -07:00
sramamoorthy 72dd067173 Trace message changes and fix few FIXMEs 2019-05-28 22:07:46 -07:00
sramamoorthy 69edefe68b Snapshot based backup and restore implementation 2019-05-28 22:07:46 -07:00
A.J. Beamon 603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Stephen Atherton ebc96a7e0e Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
2019-05-21 23:49:27 -07:00
A.J. Beamon a8b9d8e34b
Merge pull request #1336 from tclinken/fast-allocate-ptree-nodes
Create 96-byte fast allocator for storage queue PTree nodes
2019-05-17 14:22:46 -07:00
Jingyu Zhou b8e7fc1b84 Refactor: add std:: qualifier and use emplace_back 2019-05-17 09:38:50 -10:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
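The pattern in the commit above (accessor functions returning per-thread generators, with only the deterministic one seedable and seeding required on every thread that uses it) might look like this in Python. It is a sketch of the idea, not the Flow implementation; `set_thread_local_seed` and the snake_case accessor names are hypothetical.

```python
import random
import threading

_local = threading.local()  # each thread sees its own attributes on this object


def set_thread_local_seed(seed):
    """Seed the deterministic generator for the calling thread only."""
    _local.deterministic = random.Random(seed)


def deterministic_random():
    """Return this thread's seeded generator; fail loudly if it was never seeded."""
    rng = getattr(_local, "deterministic", None)
    if rng is None:
        raise RuntimeError("deterministic_random() used before seeding on this thread")
    return rng


def nondeterministic_random():
    """Return this thread's unseeded generator, created lazily; it cannot be reseeded here."""
    rng = getattr(_local, "nondeterministic", None)
    if rng is None:
        rng = _local.nondeterministic = random.Random()
    return rng
```

Seeding the same value twice on one thread reproduces the same sequence, while a thread that never called `set_thread_local_seed` gets an error instead of silently sharing state.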
Austin Seipp bf378952cb fdbserver: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Trevor Clinkenbeard d339becd7c Fix currentRate calculation for local ratekeeper 2019-04-25 15:35:34 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
Andrew Noyes ef04471a66 Fix more unused-variable warnings 2019-04-17 16:04:10 -07:00
Trevor Clinkenbeard 3426205167 Fixed readGuard usage bug 2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard 1d921da170 readGuard sends server_overloaded error if request is rejected 2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard 0594154644 Fixed getPenalty calculation 2019-04-16 10:17:41 -07:00
Evan Tschannen cd5c9d91fa
Merge pull request #1443 from etschannen/master
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
A.J. Beamon 538b431656 Apply suggestions from code review 2019-04-08 14:55:58 -07:00
A.J. Beamon a7288e1325 Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate. 2019-04-08 14:21:24 -07:00
mpilman d2e74cb2c0 Fix stupid rounding error 2019-04-08 11:05:29 -07:00
mpilman aaa8f73bdc fixed missing refactoring code 2019-04-08 11:05:29 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman b944e0b116 generalized read guards, allow for penalty+error 2019-04-08 11:04:44 -07:00
mpilman 32393ec4c9 Prototype of local ratekeeper 2019-04-08 11:04:44 -07:00
mpilman d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Stephen Atherton d5c8b6b083 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
#	flow/flow.h
2019-03-27 13:37:15 -07:00