Commit Graph

257 Commits

Author SHA1 Message Date
Young Liu 16ef2bd3bd Pending commit 2020-08-12 10:34:07 -07:00
Xiaoxi Wang c0b7ac0b7d try to throttle write tag and read tag seperatedly 2020-08-10 23:43:58 +00:00
Xiaoxi Wang cae6f04070 sample on cost instead of bytes 2020-08-10 22:30:06 +00:00
Xiaoxi Wang ba66b1f668 trivial changes 2020-08-06 14:53:26 +00:00
Xiaoxi Wang 13307679c5 use median shard size" 2020-08-05 03:57:25 +00:00
Xiaoxi Wang 484a75dd7b compare read and write tag together 2020-08-04 00:02:41 +00:00
Xiaoxi Wang 52073738c0 fix bugs; change some KNOB values 2020-08-03 22:18:34 +00:00
Xiaoxi Wang 6db7040f17 update DDMetrics periodically 2020-08-03 05:52:38 +00:00
Xiaoxi Wang d1cc87452c merge with master; solve conflicts; solve initialization; 2020-08-02 22:44:07 +00:00
Xiaoxi Wang 0352e8ee0b pick busiest commit tag periodically 2020-08-02 18:38:56 +00:00
Xiaoxi Wang 92c1112c74 consider clear single key 2020-08-01 18:20:13 +00:00
Xiaoxi Wang 4f7dab4951 sample clear op on client 2020-08-01 06:14:52 +00:00
Xiaoxi Wang c3a629588f add client transaction tag sample 2020-07-31 19:08:42 +00:00
Young Liu 30ea639666 Remove debug traces 2020-07-29 07:55:05 -07:00
Young Liu f7b76a92af pass joshua 2020-07-29 07:26:55 -07:00
Xiaoxi Wang 41a3e6c853 add write throttling 2020-07-28 03:49:47 +00:00
Xiaoxi Wang 48a0fb5154 ask DD for shard info 2020-07-25 04:08:12 +00:00
Meng Xu b2a3b4fd83 Merge branch 'master' into mengxu/merge-6.3-PR 2020-07-20 11:34:18 -07:00
Xiaoxi Wang 0df2a8d014 better code style 2020-07-18 01:48:58 +00:00
Xiaoxi Wang 9d0d189cc8 better serialize; TransactionOption::clear patch 2020-07-15 22:39:21 +00:00
Meng Xu 27a21e23bd Add number comment to limitReason_t entries 2020-07-15 10:57:50 -07:00
Xiaoxi Wang eb44ae0e86 finish local shard estimation 2020-07-15 16:08:00 +00:00
Xiaoxi Wang a310faf9d1 solve some code reviews 2020-07-14 17:19:55 +00:00
Xiaoxi Wang c56daf3be7 merge with master 2020-07-14 01:08:56 +00:00
Xiaoxi Wang d512170cd8 add clear cost estimation 2020-07-14 00:18:52 +00:00
A.J. Beamon 11b136c745 Various fixes to tag throttling:
* Master proxy reports transaction counts to ratekeeper for throttled tags only
* The ramp up behavior at the end of an auto-throttle was broken
* Fixed some issues with computing the initial transaction rate for auto-throttles
2020-06-30 16:24:41 -07:00
Evan Tschannen ced65cd30b finished explicitly versioning everything stored in the database 2020-05-22 17:14:21 -07:00
A.J. Beamon cc4874918a Merge branch 'release-6.3' into tag-throttling-by-priority
# Conflicts:
#	fdbserver/Ratekeeper.actor.cpp
2020-05-20 14:26:35 -07:00
A.J. Beamon 14b23c146f Support throttling and unthrottling tags by priority and their auto/manual state in fdbcli. 2020-05-15 12:47:55 -07:00
A.J. Beamon d3f465fd56
Merge pull request #3102 from mpilman/features/trace-roles
Emit traces regularly about role assignment
2020-05-15 08:12:25 -07:00
A.J. Beamon 3ee4912312
Merge pull request #3152 from ajbeamon/tag-throttling-status-improvements
Add and fix tag throttling status fields
2020-05-14 16:08:05 -07:00
Markus Pilman 2cdcab5aa7 address review comments 2020-05-14 14:54:38 -07:00
Markus Pilman 5230668a76 Pass Ratekeeper ID to all RK traces 2020-05-14 14:17:43 -07:00
Markus Pilman c2bc75516f Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles 2020-05-14 10:34:53 -07:00
A.J. Beamon 634c988059 Tag throttles reacted poorly to the rate being set to max 2020-05-12 18:09:43 -07:00
A.J. Beamon acf1244317 Fix: ratekeeper was logging the auto throttle count for the manual throttle count 2020-05-12 14:10:40 -07:00
Markus Pilman 5f9b127e56 Emit traces regularly about role assignment
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.

We do decorate each trace lines with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping
2020-05-08 16:27:57 -07:00
A.J. Beamon fbf436f45f Various cleanup and knob adjustments. 2020-05-07 09:15:33 -07:00
A.J. Beamon 35d382811f Fix compilation error 2020-05-07 08:59:47 -07:00
A.J. Beamon 66b4920fc3 Fix off-by-one error in auto-throttle limit. Allow updating existing auto-throttle when the limit is reached. 2020-05-05 09:58:00 -07:00
A.J. Beamon bb3d4b6b89 Add a bunch of TEST macros and some other little things 2020-05-04 10:11:36 -07:00
A.J. Beamon 31cef6075a Do the auto-throttle ramp up in a better place. Only commit manual throttle limit once. Add some asserts. 2020-05-03 19:15:29 -07:00
A.J. Beamon decf3e82b0 Fix various bugs and make sure to cleanup throttles from the database when they expire 2020-05-01 21:36:28 -07:00
A.J. Beamon 3a7a026aae Fix a couple bugs 2020-04-30 22:24:17 -07:00
A.J. Beamon b80225dde0 Initial support for ramping load back up. Fix some logging. Update auto-throttles less frequently. 2020-04-28 15:50:45 -07:00
A.J. Beamon 0ed70accfa Reorganization of throttle storage in ratekeeper to support various auto-throttling related actions 2020-04-28 14:30:37 -07:00
A.J. Beamon a65e97209a Fix bug in flag management. Pass priority into ratekeeper's updateRate and use it when setting autothrottles. 2020-04-27 11:34:12 -07:00
A.J. Beamon 239876351b Add some initial auto-throttling. Move the definition of the priority enum to a more global place and use it for all transaction priorites (except in ClientLogEvents, because of serialization incompatibilites). 2020-04-24 11:31:16 -07:00
A.J. Beamon 7343c1b333 Logging for throttle changes was moved 2020-04-23 20:51:53 -07:00
A.J. Beamon 35c18ac60a Improvements to expiration. Encode throttles with auto/manual and priority in the key to support throttling the same tag with different values in these parameters. 2020-04-23 20:50:40 -07:00
A.J. Beamon 18f860d9d8 Minor cleanup around expiration on ratekeeper. 2020-04-23 10:42:16 -07:00
A.J. Beamon d2504c08c3 Send throttles at all priorities from RK->MP 2020-04-22 13:47:16 -07:00
A.J. Beamon 434704fbd9 Various bug fixes 2020-04-22 12:28:51 -07:00
A.J. Beamon f1dd0ee298 Protect against a ratekeeper starting up with a clock set in the past (compared to old ratekeeper) extending the duration of throttles excessively. 2020-04-21 16:35:25 -07:00
A.J. Beamon d5fb4d26fe Send tag throttle updates from ratekeeper to proxy only when they change 2020-04-21 16:33:56 -07:00
A.J. Beamon b2c14611f3 Ratekeeper maintains its throttle data a bit differently, and now it aggregates the effects of multiple priorities before sending results to the proxy 2020-04-21 11:58:59 -07:00
A.J. Beamon dfec896438 Enforce a throttle limit. Don't count transaction tags on RK if the proxy has updated us in a while. 2020-04-17 11:48:02 -07:00
A.J. Beamon 6619a1a36a Rename transaction tag map. 2020-04-17 09:06:45 -07:00
A.J. Beamon 2b66dcd24a Some more refactoring. Reduce what is sent from RK->MP->clients 2020-04-17 08:07:01 -07:00
A.J. Beamon 0fba8c47be Checkpoint: Ratekeeper sets absolute limits for tag throttles and enforces them by distributing requests to proxies, who distribute them to clients.
A few refactorings.
2020-04-16 14:43:22 -07:00
A.J. Beamon 7f3fa00897 Reorganize a bit of code. 2020-04-10 13:53:23 -07:00
A.J. Beamon 29b2c2f3aa Add hash to StringRef. Use unordered maps for storing tags. Create some helpful typedefs. 2020-04-10 12:54:59 -07:00
A.J. Beamon 55a0d00ad4 Encoding of tags in the database now supports multiple tags per throttle. Remove throttle prefix search. 2020-04-10 10:12:26 -07:00
A.J. Beamon ebeca10bce Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients. 2020-04-09 16:55:56 -07:00
A.J. Beamon a1d8623e5f Various changes to the throttling scheme, including most notably that clients now enforce the throttles with info they receive from the proxy. 2020-04-07 16:28:09 -07:00
A.J. Beamon 2336f073ad Checkpointing a bunch of work on throttles. Rudimentary implementation of auto-throttling. Support for manual throttling via fdbcli. Throttles are stored in the system keyspace. 2020-04-03 15:24:14 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen 1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen 08914a2acd Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
Evan Tschannen 819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
Evan Tschannen bf7d7e2f1e
Merge pull request #2499 from ajbeamon/ratekeeper-durable-version-smoother-fix
Fix inaccurate limiting durability lag
2020-02-04 13:04:58 -08:00
Evan Tschannen 3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
A.J. Beamon 9668c4471f Clamp infinite limit in ratekeeper 2020-01-14 15:45:24 -08:00
Evan Tschannen 855f03a41f ratekeeper needed to check remoteDC in another location
the storage server scoped a transaction incorrectly
2020-01-10 15:58:36 -08:00
Evan Tschannen 7898f4425f fix: ratekeeper could limit based on remote storage servers 2020-01-10 12:21:08 -08:00
A.J. Beamon 4109be3aca Switch durable version tracking in ratekeeper to use a faster smoother that matches the latest version's smoother. 2019-12-23 12:48:39 -08:00
Alvin Moore 3bf971ba8b Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Andrew Noyes 6bde67f2b3 Fix UBSAN report
/home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:86:8: runtime error: load of value 1231493777, which is not a valid value for type 'limitReason_t'
    #0 0x310e961 in StorageQueueInfo::StorageQueueInfo(StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:86
    #1 0x310eacd in MapPair<UID, StorageQueueInfo>::MapPair<UID, StorageQueueInfo>(UID&&, StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/flow/IndexedSet.h:242
    #2 0x310b35e in MapPair<std::decay<UID>::type, std::decay<StorageQueueInfo>::type> mapPair<UID, StorageQueueInfo>(UID&&, StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/flow/IndexedSet.h:258
    #3 0x30a8b79 in a_body1 /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:195
    #4 0x309b529 in TrackStorageServerQueueInfoActor /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:495
    #5 0x309b9be in trackStorageServerQueueInfo(RatekeeperData* const&, StorageServerInterface const&) /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:194
    #6 0x30cff63 in a_body1loopBody1when1cont1 /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:303
    #7 0x30cd9da in a_body1loopBody1when1when1 /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:1170
    #8 0x30ed4dd in a_callback_fire /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:1185
    #9 0x30e6d81 in fire /home/anoyes/workspace/foundationdb/flow/flow.h:998
    #10 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /home/anoyes/workspace/foundationdb/flow/flow.h:447
    #11 0x959891 in void Promise<Void>::send<Void>(Void&&) const /home/anoyes/workspace/foundationdb/flow/flow.h:778
    #12 0x7b4b018 in Sim2::execTask(Sim2::Task&) (/home/anoyes/build/foundationdb/bin/fdbserver+0x7b4b018)
    #13 0x7bf9168 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:979
    #14 0x7be7b68 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1when1(Void const&, int) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5391
    #15 0x7c329ff in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_callback_fire(ActorCallback<Sim2::RunLoopActor, 0, Void>*, Void) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5406
    #16 0x7c1fc73 in ActorCallback<Sim2::RunLoopActor, 0, Void>::fire(Void const&) /home/anoyes/workspace/foundationdb/flow/flow.h:998
    #17 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /home/anoyes/workspace/foundationdb/flow/flow.h:447
    #18 0x959891 in void Promise<Void>::send<Void>(Void&&) const /home/anoyes/workspace/foundationdb/flow/flow.h:778
    #19 0x7fe74a4 in N2::PromiseTask::operator()() /home/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:481
    #20 0x7fb6ff7 in N2::Net2::run() /home/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:657
    #21 0x7b71bd3 in Sim2::_runActorState<Sim2::_runActor>::a_body1(int) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:989
    #22 0x7b2ee51 in Sim2::_runActor::_runActor(Sim2* const&) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5608
    #23 0x7b2f268 in Sim2::_run(Sim2* const&) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:987
    #24 0x7b2f2c8 in Sim2::run() /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:996
    #25 0x21040a6 in main /home/anoyes/workspace/foundationdb/fdbserver/fdbserver.actor.cpp:1793
    #26 0x7f03492ba504 in __libc_start_main (/lib64/libc.so.6+0x22504)
    #27 0x464914  (/home/anoyes/build/foundationdb/bin/fdbserver+0x464914)
2019-12-03 12:49:12 -08:00
Andrew Noyes e0bf7c4d65 Fix signed integer overflow
Not sure if this is the right fix or not

fdbserver/Ratekeeper.actor.cpp:557:40: runtime error: signed integer overflow: -9223372036854775808 - 9223372036854775807 cannot be represented in type 'long long'
2019-12-02 12:51:33 -08:00
Jon Fu d96a7b2c69 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-03 09:47:45 -07:00
Evan Tschannen 3cc5d484a5 the include and exclude commands do not need to set the moveKeysLockOwnerKey, which will kill the data distribution algorithm 2019-09-27 18:33:56 -07:00
Evan Tschannen 1f2499c74f
Merge pull request #2012 from ajbeamon/rk-durability-lag-considers-mvcc-window
Ratekeeper ignores intentionally non-durable versions on the SS for durability lag computations
2019-08-19 14:24:21 -07:00
Evan Tschannen 2bd59d1055
Merge pull request #2003 from ajbeamon/add-rk-durability-lag-to-status
Add ratekeeper's durability lag statistics to status
2019-08-19 14:19:59 -07:00
A.J. Beamon ac2f310104 Ratekeeper ignores intentionally non-durable versions on the SS for durability lag computations. 2019-08-16 14:46:44 -07:00
A.J. Beamon 6581161dd3 Add ratekeeper's durability lag statistics to status 2019-08-15 11:07:04 -07:00
A.J. Beamon f6ba8509ae Remove unused local rate limit variables in ratekeeper. 2019-08-15 10:08:28 -07:00
Balachandar Namasivayam 14e54f44b3 Address review comments. 2019-07-18 12:32:35 -07:00
Balachandar Namasivayam 406bcebdc4 Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time. 2019-07-17 18:08:17 -07:00
Evan Tschannen db5b4a6331 avoid going to unlimited immediately after going below the durabilityLagTargetVersion 2019-07-12 18:50:56 -07:00
Evan Tschannen 6e34e16699 durable version needs more smoothing because it will be updated in bursts 2019-07-12 18:50:56 -07:00
Evan Tschannen b2b2e25324 the durabilityLagLimit needs to be tracked separately for batch priority and normal priority 2019-07-12 18:50:56 -07:00
Evan Tschannen fef58e13a4 adding logging for durability lag in ratekeeper 2019-07-12 18:50:56 -07:00
Evan Tschannen 1a18c859c7 knobified the durability lag rate controls 2019-07-12 18:50:56 -07:00
Evan Tschannen c5fb5494f5 a better attempt a ratekeeper control on durability lag 2019-07-12 18:50:56 -07:00
Evan Tschannen dc171b3eae fixed compiler error 2019-07-12 18:50:56 -07:00
Evan Tschannen e85c05c906 experimental slow control on durability lag 2019-07-12 18:50:56 -07:00
Jingyu Zhou 50e7593c5b
Merge pull request #1796 from ajbeamon/remove-trace-event-underscores
Remove trace event underscores
2019-07-05 21:45:55 -07:00
A.J. Beamon 9f4b6fd770 Remove additional underscores 2019-07-05 08:12:25 -07:00
Alex Miller 8e1ab6e7db Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-28 17:32:54 -07:00
Evan Tschannen 5041ff38b1 removed unneeded description 2019-06-28 16:54:22 -07:00
Evan Tschannen a124fc6e8a fixed compiler error 2019-06-28 16:54:22 -07:00
Evan Tschannen b9a6271375 local ratekeeper no longer globally limits 2019-06-28 16:54:22 -07:00
Evan Tschannen f539b5f09a fix: a large targetRateRatio means limiting more 2019-06-28 16:54:22 -07:00
Evan Tschannen db413c37f7 restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers 2019-06-28 16:54:22 -07:00
Evan Tschannen a97940a10b fixed compiler error 2019-06-28 16:54:22 -07:00
Evan Tschannen 92b32855ca ratekeeper’s control algorithm would oscillate when limited by local ratekeeper 2019-06-28 16:54:22 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen dccb9bc26d fixed a number of correctness problems 2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard 8144882d7b Merge branch 'apple-master' into features/local-rk 2019-06-10 19:40:25 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman 207049e852 fixed serialization 2019-04-08 11:04:44 -07:00
mpilman 32393ec4c9 Prototype of local ratekeeper 2019-04-08 11:04:44 -07:00
A.J. Beamon 91014d4529 Add file changes that I accidentally failed to commit; fix naming issue in worker. 2019-03-27 08:41:19 -07:00
Evan Tschannen 36ab852bb1 Merge branch 'master' into ratekeeper
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen 3ced178348 maxVersionDifference is a copy of a knob which is a double 2019-03-21 12:58:48 -07:00
Jingyu Zhou 99d521ef4f Monitor Ratekeeper and DataDistributor to use stateless processes
Since Ratekeeper and DataDistributor are no longer running with Master, they
might be running with stateful processes before a new Master becomes alive,
which is undesirable.

This PR adds a monitoring of both Ratekeeper and DataDistributor at Cluster
Controller -- if Master runs on a stateless class and RK/DD runs at a worse
class, then RK/DD will be killed. I.e., RK/DD should be running at their own
classes or on the same stateless process as Master. After restart, RK/DD should
be running at a better process class.
2019-03-14 15:00:57 -07:00
Jingyu Zhou 2b0139670e Fix review comment for PR 1176 2019-03-12 12:02:30 -07:00
Jingyu Zhou cdfe906c30 Data distributor pulls batch limited info from proxy
Add a flag in HealthMetrics to indicate that batch priority is rate limited.
Data distributor pulls this flag from proxy to know roughly when rate limiting
happens.

DD uses this information to determine when to do the rebalance in the background,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster is
currently rate limited for batch commits, then the rebalance will use longer
time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
2019-03-07 13:16:20 -08:00
Jingyu Zhou f43277e819 Format Ratekeeper.actor.cpp code 2019-03-07 13:16:20 -08:00
Jingyu Zhou dc129207a9 Minor fix after rebase. 2019-03-07 13:16:20 -08:00
Jingyu Zhou 517966fce2 Remove lastLimited from rate keeper
Refactor code to make IDE happy.
2019-03-07 13:16:20 -08:00
Jingyu Zhou b2ee41ba33 Remove lastLimited from data distribution
Fix a serialization bug in ServerDBInfo, which causes test failures.
2019-03-07 13:16:20 -08:00
Jingyu Zhou e6ac3f7fe8 Minor fix on ratekeeper work registration. 2019-03-07 13:16:20 -08:00
Jingyu Zhou 3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Trevor Clinkenbeard 39f612d132 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-03-02 17:07:00 -08:00
Trevor Clinkenbeard 2940b8d5fd Update all per-process storage server health metrics at once
Ratekeeper updates all storage servers' health metrics in updateRate
with only a single map lookup
2019-03-02 16:08:28 -08:00
A.J. Beamon 655c9d82c7 Various cleanup from review 2019-03-01 14:06:47 -08:00
A.J. Beamon 93f7849261 Fix typo 2019-02-28 12:02:47 -08:00
A.J. Beamon 3e6a6a6569 Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics. 2019-02-28 12:00:58 -08:00
A.J. Beamon eb629d87a5 Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately. 2019-02-28 09:53:16 -08:00
Trevor Clinkenbeard 3f59f82670 Calculate durabilityLag instead of NDV for health metrics 2019-02-27 16:30:01 -08:00
A.J. Beamon a051055caf Initial implementation of adding separate limits for batch priority in ratekeeper 2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard abfe057805 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard 07f800eeee Got rid of detailed field in GetRateInfoReply message 2019-02-23 17:52:11 -08:00
Trevor Clinkenbeard f3a73963b4 Got rid of detailedLeaseDuration in GetRateInfoReply message 2019-02-23 16:42:11 -08:00
Trevor Clinkenbeard a20f5482bc Created StorageStats struct to combine health metrics for storage servers 2019-02-20 11:57:41 -08:00
Trevor Clinkenbeard 7594606ee2 Use DETAILED_METRIC_UPDATE_RATE knob to determine GetRateInfoReply lease duration 2019-02-20 11:40:17 -08:00
Evan Tschannen 3a572b010f fix: a forced recovery needed to force the data distributor to restart 2019-02-19 16:04:52 -08:00
Trevor Clinkenbeard 80cf5e057f Compute worstStorageNDV for Ratekeeper health metrics 2019-02-02 21:03:02 -08:00
Trevor Clinkenbeard 5822bd65bf Track health metrics in Ratekeeper and send these metrics to proxies in GetRateInfoReply messages 2019-01-31 12:56:58 -08:00
Trevor Clinkenbeard d7930af2cb Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message. 2019-01-31 12:23:04 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
A.J. Beamon 2a97139d5d This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated. 2018-08-16 10:24:12 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen 0e699a3c23 fix: ratekeeper should only control on local logs 2018-05-29 10:51:23 -07:00