Young Liu
16ef2bd3bd
Pending commit
2020-08-12 10:34:07 -07:00
Xiaoxi Wang
c0b7ac0b7d
try to throttle write tag and read tag separately
2020-08-10 23:43:58 +00:00
Xiaoxi Wang
cae6f04070
sample on cost instead of bytes
2020-08-10 22:30:06 +00:00
Xiaoxi Wang
ba66b1f668
trivial changes
2020-08-06 14:53:26 +00:00
Xiaoxi Wang
13307679c5
use median shard size
2020-08-05 03:57:25 +00:00
Xiaoxi Wang
484a75dd7b
compare read and write tag together
2020-08-04 00:02:41 +00:00
Xiaoxi Wang
52073738c0
fix bugs; change some KNOB values
2020-08-03 22:18:34 +00:00
Xiaoxi Wang
6db7040f17
update DDMetrics periodically
2020-08-03 05:52:38 +00:00
Xiaoxi Wang
d1cc87452c
merge with master; solve conflicts; solve initialization;
2020-08-02 22:44:07 +00:00
Xiaoxi Wang
0352e8ee0b
pick busiest commit tag periodically
2020-08-02 18:38:56 +00:00
Xiaoxi Wang
92c1112c74
consider clear single key
2020-08-01 18:20:13 +00:00
Xiaoxi Wang
4f7dab4951
sample clear op on client
2020-08-01 06:14:52 +00:00
Xiaoxi Wang
c3a629588f
add client transaction tag sample
2020-07-31 19:08:42 +00:00
Young Liu
30ea639666
Remove debug traces
2020-07-29 07:55:05 -07:00
Young Liu
f7b76a92af
pass joshua
2020-07-29 07:26:55 -07:00
Xiaoxi Wang
41a3e6c853
add write throttling
2020-07-28 03:49:47 +00:00
Xiaoxi Wang
48a0fb5154
ask DD for shard info
2020-07-25 04:08:12 +00:00
Meng Xu
b2a3b4fd83
Merge branch 'master' into mengxu/merge-6.3-PR
2020-07-20 11:34:18 -07:00
Xiaoxi Wang
0df2a8d014
better code style
2020-07-18 01:48:58 +00:00
Xiaoxi Wang
9d0d189cc8
better serialize; TransactionOption::clear patch
2020-07-15 22:39:21 +00:00
Meng Xu
27a21e23bd
Add number comment to limitReason_t entries
2020-07-15 10:57:50 -07:00
Xiaoxi Wang
eb44ae0e86
finish local shard estimation
2020-07-15 16:08:00 +00:00
Xiaoxi Wang
a310faf9d1
solve some code reviews
2020-07-14 17:19:55 +00:00
Xiaoxi Wang
c56daf3be7
merge with master
2020-07-14 01:08:56 +00:00
Xiaoxi Wang
d512170cd8
add clear cost estimation
2020-07-14 00:18:52 +00:00
A.J. Beamon
11b136c745
Various fixes to tag throttling:
...
* Master proxy reports transaction counts to ratekeeper for throttled tags only
* The ramp up behavior at the end of an auto-throttle was broken
* Fixed some issues with computing the initial transaction rate for auto-throttles
2020-06-30 16:24:41 -07:00
Evan Tschannen
ced65cd30b
finished explicitly versioning everything stored in the database
2020-05-22 17:14:21 -07:00
A.J. Beamon
cc4874918a
Merge branch 'release-6.3' into tag-throttling-by-priority
...
# Conflicts:
# fdbserver/Ratekeeper.actor.cpp
2020-05-20 14:26:35 -07:00
A.J. Beamon
14b23c146f
Support throttling and unthrottling tags by priority and their auto/manual state in fdbcli.
2020-05-15 12:47:55 -07:00
A.J. Beamon
d3f465fd56
Merge pull request #3102 from mpilman/features/trace-roles
...
Emit traces regularly about role assignment
2020-05-15 08:12:25 -07:00
A.J. Beamon
3ee4912312
Merge pull request #3152 from ajbeamon/tag-throttling-status-improvements
...
Add and fix tag throttling status fields
2020-05-14 16:08:05 -07:00
Markus Pilman
2cdcab5aa7
address review comments
2020-05-14 14:54:38 -07:00
Markus Pilman
5230668a76
Pass Ratekeeper ID to all RK traces
2020-05-14 14:17:43 -07:00
Markus Pilman
c2bc75516f
Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles
2020-05-14 10:34:53 -07:00
A.J. Beamon
634c988059
Tag throttles reacted poorly to the rate being set to max
2020-05-12 18:09:43 -07:00
A.J. Beamon
acf1244317
Fix: ratekeeper was logging the auto throttle count for the manual throttle count
2020-05-12 14:10:40 -07:00
Markus Pilman
5f9b127e56
Emit traces regularly about role assignment
...
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.
We do decorate each trace line with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping.
2020-05-08 16:27:57 -07:00
A.J. Beamon
fbf436f45f
Various cleanup and knob adjustments.
2020-05-07 09:15:33 -07:00
A.J. Beamon
35d382811f
Fix compilation error
2020-05-07 08:59:47 -07:00
A.J. Beamon
66b4920fc3
Fix off-by-one error in auto-throttle limit. Allow updating existing auto-throttle when the limit is reached.
2020-05-05 09:58:00 -07:00
A.J. Beamon
bb3d4b6b89
Add a bunch of TEST macros and some other little things
2020-05-04 10:11:36 -07:00
A.J. Beamon
31cef6075a
Do the auto-throttle ramp up in a better place. Only commit manual throttle limit once. Add some asserts.
2020-05-03 19:15:29 -07:00
A.J. Beamon
decf3e82b0
Fix various bugs and make sure to cleanup throttles from the database when they expire
2020-05-01 21:36:28 -07:00
A.J. Beamon
3a7a026aae
Fix a couple bugs
2020-04-30 22:24:17 -07:00
A.J. Beamon
b80225dde0
Initial support for ramping load back up. Fix some logging. Update auto-throttles less frequently.
2020-04-28 15:50:45 -07:00
A.J. Beamon
0ed70accfa
Reorganization of throttle storage in ratekeeper to support various auto-throttling related actions
2020-04-28 14:30:37 -07:00
A.J. Beamon
a65e97209a
Fix bug in flag management. Pass priority into ratekeeper's updateRate and use it when setting autothrottles.
2020-04-27 11:34:12 -07:00
A.J. Beamon
239876351b
Add some initial auto-throttling. Move the definition of the priority enum to a more global place and use it for all transaction priorities (except in ClientLogEvents, because of serialization incompatibilities).
2020-04-24 11:31:16 -07:00
A.J. Beamon
7343c1b333
Logging for throttle changes was moved
2020-04-23 20:51:53 -07:00
A.J. Beamon
35c18ac60a
Improvements to expiration. Encode throttles with auto/manual and priority in the key to support throttling the same tag with different values in these parameters.
2020-04-23 20:50:40 -07:00
A.J. Beamon
18f860d9d8
Minor cleanup around expiration on ratekeeper.
2020-04-23 10:42:16 -07:00
A.J. Beamon
d2504c08c3
Send throttles at all priorities from RK->MP
2020-04-22 13:47:16 -07:00
A.J. Beamon
434704fbd9
Various bug fixes
2020-04-22 12:28:51 -07:00
A.J. Beamon
f1dd0ee298
Protect against a ratekeeper starting up with a clock set in the past (compared to old ratekeeper) extending the duration of throttles excessively.
2020-04-21 16:35:25 -07:00
A.J. Beamon
d5fb4d26fe
Send tag throttle updates from ratekeeper to proxy only when they change
2020-04-21 16:33:56 -07:00
A.J. Beamon
b2c14611f3
Ratekeeper maintains its throttle data a bit differently, and now it aggregates the effects of multiple priorities before sending results to the proxy
2020-04-21 11:58:59 -07:00
A.J. Beamon
dfec896438
Enforce a throttle limit. Don't count transaction tags on RK if the proxy hasn't updated us in a while.
2020-04-17 11:48:02 -07:00
A.J. Beamon
6619a1a36a
Rename transaction tag map.
2020-04-17 09:06:45 -07:00
A.J. Beamon
2b66dcd24a
Some more refactoring. Reduce what is sent from RK->MP->clients
2020-04-17 08:07:01 -07:00
A.J. Beamon
0fba8c47be
Checkpoint: Ratekeeper sets absolute limits for tag throttles and enforces them by distributing requests to proxies, who distribute them to clients.
...
A few refactorings.
2020-04-16 14:43:22 -07:00
A.J. Beamon
7f3fa00897
Reorganize a bit of code.
2020-04-10 13:53:23 -07:00
A.J. Beamon
29b2c2f3aa
Add hash to StringRef. Use unordered maps for storing tags. Create some helpful typedefs.
2020-04-10 12:54:59 -07:00
A.J. Beamon
55a0d00ad4
Encoding of tags in the database now supports multiple tags per throttle. Remove throttle prefix search.
2020-04-10 10:12:26 -07:00
A.J. Beamon
ebeca10bce
Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients.
2020-04-09 16:55:56 -07:00
A.J. Beamon
a1d8623e5f
Various changes to the throttling scheme, including most notably that clients now enforce the throttles with info they receive from the proxy.
2020-04-07 16:28:09 -07:00
A.J. Beamon
2336f073ad
Checkpointing a bunch of work on throttles. Rudimentary implementation of auto-throttling. Support for manual throttling via fdbcli. Throttles are stored in the system keyspace.
2020-04-03 15:24:14 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
08914a2acd
Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
Evan Tschannen
819c55556c
More aggressively attempt to find teams that do not have low disk space
2020-02-20 16:47:50 -08:00
Evan Tschannen
bf7d7e2f1e
Merge pull request #2499 from ajbeamon/ratekeeper-durable-version-smoother-fix
...
Fix inaccurate limiting durability lag
2020-02-04 13:04:58 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
A.J. Beamon
9668c4471f
Clamp infinite limit in ratekeeper
2020-01-14 15:45:24 -08:00
Evan Tschannen
855f03a41f
ratekeeper needed to check remoteDC in another location
...
the storage server scoped a transaction incorrectly
2020-01-10 15:58:36 -08:00
Evan Tschannen
7898f4425f
fix: ratekeeper could limit based on remote storage servers
2020-01-10 12:21:08 -08:00
A.J. Beamon
4109be3aca
Switch durable version tracking in ratekeeper to use a faster smoother that matches the latest version's smoother.
2019-12-23 12:48:39 -08:00
Alvin Moore
3bf971ba8b
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Andrew Noyes
6bde67f2b3
Fix UBSAN report
...
/home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:86:8: runtime error: load of value 1231493777, which is not a valid value for type 'limitReason_t'
#0 0x310e961 in StorageQueueInfo::StorageQueueInfo(StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:86
#1 0x310eacd in MapPair<UID, StorageQueueInfo>::MapPair<UID, StorageQueueInfo>(UID&&, StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/flow/IndexedSet.h:242
#2 0x310b35e in MapPair<std::decay<UID>::type, std::decay<StorageQueueInfo>::type> mapPair<UID, StorageQueueInfo>(UID&&, StorageQueueInfo&&) /home/anoyes/workspace/foundationdb/flow/IndexedSet.h:258
#3 0x30a8b79 in a_body1 /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:195
#4 0x309b529 in TrackStorageServerQueueInfoActor /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:495
#5 0x309b9be in trackStorageServerQueueInfo(RatekeeperData* const&, StorageServerInterface const&) /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:194
#6 0x30cff63 in a_body1loopBody1when1cont1 /home/anoyes/workspace/foundationdb/fdbserver/Ratekeeper.actor.cpp:303
#7 0x30cd9da in a_body1loopBody1when1when1 /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:1170
#8 0x30ed4dd in a_callback_fire /home/anoyes/build/foundationdb/fdbserver/Ratekeeper.actor.g.cpp:1185
#9 0x30e6d81 in fire /home/anoyes/workspace/foundationdb/flow/flow.h:998
#10 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /home/anoyes/workspace/foundationdb/flow/flow.h:447
#11 0x959891 in void Promise<Void>::send<Void>(Void&&) const /home/anoyes/workspace/foundationdb/flow/flow.h:778
#12 0x7b4b018 in Sim2::execTask(Sim2::Task&) (/home/anoyes/build/foundationdb/bin/fdbserver+0x7b4b018)
#13 0x7bf9168 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:979
#14 0x7be7b68 in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1when1(Void const&, int) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5391
#15 0x7c329ff in Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_callback_fire(ActorCallback<Sim2::RunLoopActor, 0, Void>*, Void) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5406
#16 0x7c1fc73 in ActorCallback<Sim2::RunLoopActor, 0, Void>::fire(Void const&) /home/anoyes/workspace/foundationdb/flow/flow.h:998
#17 0x4df0dc in void SAV<Void>::send<Void>(Void&&) /home/anoyes/workspace/foundationdb/flow/flow.h:447
#18 0x959891 in void Promise<Void>::send<Void>(Void&&) const /home/anoyes/workspace/foundationdb/flow/flow.h:778
#19 0x7fe74a4 in N2::PromiseTask::operator()() /home/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:481
#20 0x7fb6ff7 in N2::Net2::run() /home/anoyes/workspace/foundationdb/flow/Net2.actor.cpp:657
#21 0x7b71bd3 in Sim2::_runActorState<Sim2::_runActor>::a_body1(int) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:989
#22 0x7b2ee51 in Sim2::_runActor::_runActor(Sim2* const&) /home/anoyes/build/foundationdb/fdbrpc/sim2.actor.g.cpp:5608
#23 0x7b2f268 in Sim2::_run(Sim2* const&) /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:987
#24 0x7b2f2c8 in Sim2::run() /home/anoyes/workspace/foundationdb/fdbrpc/sim2.actor.cpp:996
#25 0x21040a6 in main /home/anoyes/workspace/foundationdb/fdbserver/fdbserver.actor.cpp:1793
#26 0x7f03492ba504 in __libc_start_main (/lib64/libc.so.6+0x22504)
#27 0x464914 (/home/anoyes/build/foundationdb/bin/fdbserver+0x464914)
2019-12-03 12:49:12 -08:00
Andrew Noyes
e0bf7c4d65
Fix signed integer overflow
...
Not sure if this is the right fix or not
fdbserver/Ratekeeper.actor.cpp:557:40: runtime error: signed integer overflow: -9223372036854775808 - 9223372036854775807 cannot be represented in type 'long long'
2019-12-02 12:51:33 -08:00
Jon Fu
d96a7b2c69
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-03 09:47:45 -07:00
Evan Tschannen
3cc5d484a5
the include and exclude commands do not need to set the moveKeysLockOwnerKey, which will kill the data distribution algorithm
2019-09-27 18:33:56 -07:00
Evan Tschannen
1f2499c74f
Merge pull request #2012 from ajbeamon/rk-durability-lag-considers-mvcc-window
...
Ratekeeper ignores intentionally non-durable versions on the SS for durability lag computations
2019-08-19 14:24:21 -07:00
Evan Tschannen
2bd59d1055
Merge pull request #2003 from ajbeamon/add-rk-durability-lag-to-status
...
Add ratekeeper's durability lag statistics to status
2019-08-19 14:19:59 -07:00
A.J. Beamon
ac2f310104
Ratekeeper ignores intentionally non-durable versions on the SS for durability lag computations.
2019-08-16 14:46:44 -07:00
A.J. Beamon
6581161dd3
Add ratekeeper's durability lag statistics to status
2019-08-15 11:07:04 -07:00
A.J. Beamon
f6ba8509ae
Remove unused local rate limit variables in ratekeeper.
2019-08-15 10:08:28 -07:00
Balachandar Namasivayam
14e54f44b3
Address review comments.
2019-07-18 12:32:35 -07:00
Balachandar Namasivayam
406bcebdc4
Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time.
2019-07-17 18:08:17 -07:00
Evan Tschannen
db5b4a6331
avoid going to unlimited immediately after going below the durabilityLagTargetVersion
2019-07-12 18:50:56 -07:00
Evan Tschannen
6e34e16699
durable version needs more smoothing because it will be updated in bursts
2019-07-12 18:50:56 -07:00
Evan Tschannen
b2b2e25324
the durabilityLagLimit needs to be tracked separately for batch priority and normal priority
2019-07-12 18:50:56 -07:00
Evan Tschannen
fef58e13a4
adding logging for durability lag in ratekeeper
2019-07-12 18:50:56 -07:00
Evan Tschannen
1a18c859c7
knobified the durability lag rate controls
2019-07-12 18:50:56 -07:00
Evan Tschannen
c5fb5494f5
a better attempt at ratekeeper control on durability lag
2019-07-12 18:50:56 -07:00
Evan Tschannen
dc171b3eae
fixed compiler error
2019-07-12 18:50:56 -07:00
Evan Tschannen
e85c05c906
experimental slow control on durability lag
2019-07-12 18:50:56 -07:00
Jingyu Zhou
50e7593c5b
Merge pull request #1796 from ajbeamon/remove-trace-event-underscores
...
Remove trace event underscores
2019-07-05 21:45:55 -07:00
A.J. Beamon
9f4b6fd770
Remove additional underscores
2019-07-05 08:12:25 -07:00
Alex Miller
8e1ab6e7db
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-06-28 17:32:54 -07:00
Evan Tschannen
5041ff38b1
removed unneeded description
2019-06-28 16:54:22 -07:00
Evan Tschannen
a124fc6e8a
fixed compiler error
2019-06-28 16:54:22 -07:00
Evan Tschannen
b9a6271375
local ratekeeper no longer globally limits
2019-06-28 16:54:22 -07:00
Evan Tschannen
f539b5f09a
fix: a large targetRateRatio means limiting more
2019-06-28 16:54:22 -07:00
Evan Tschannen
db413c37f7
restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers
2019-06-28 16:54:22 -07:00
Evan Tschannen
a97940a10b
fixed compiler error
2019-06-28 16:54:22 -07:00
Evan Tschannen
92b32855ca
ratekeeper’s control algorithm would oscillate when limited by local ratekeeper
2019-06-28 16:54:22 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
dccb9bc26d
fixed a number of correctness problems
2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard
8144882d7b
Merge branch 'apple-master' into features/local-rk
2019-06-10 19:40:25 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
mpilman
bdba8e22eb
Added test and bugfixes
2019-04-08 11:05:29 -07:00
mpilman
207049e852
fixed serialization
2019-04-08 11:04:44 -07:00
mpilman
32393ec4c9
Prototype of local ratekeeper
2019-04-08 11:04:44 -07:00
A.J. Beamon
91014d4529
Add file changes that I accidentally failed to commit; fix naming issue in worker.
2019-03-27 08:41:19 -07:00
Evan Tschannen
36ab852bb1
Merge branch 'master' into ratekeeper
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen
3ced178348
maxVersionDifference is a copy of a knob which is a double
2019-03-21 12:58:48 -07:00
Jingyu Zhou
99d521ef4f
Monitor Ratekeeper and DataDistributor to use stateless processes
...
Since Ratekeeper and DataDistributor are no longer running with Master, they
might be running with stateful processes before a new Master becomes alive,
which is undesirable.
This PR adds monitoring of both Ratekeeper and DataDistributor at Cluster
Controller -- if Master runs on a stateless class and RK/DD runs at a worse
class, then RK/DD will be killed. I.e., RK/DD should be running at their own
classes or on the same stateless process as Master. After restart, RK/DD should
be running at a better process class.
2019-03-14 15:00:57 -07:00
Jingyu Zhou
2b0139670e
Fix review comment for PR 1176
2019-03-12 12:02:30 -07:00
Jingyu Zhou
cdfe906c30
Data distributor pulls batch limited info from proxy
...
Add a flag in HealthMetrics to indicate that batch priority is rate limited.
Data distributor pulls this flag from proxy to know roughly when rate limiting
happens.
DD uses this information to determine when to do the rebalance in the background,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster is
currently rate limited for batch commits, then the rebalance will use longer
time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
f43277e819
Format Ratekeeper.actor.cpp code
2019-03-07 13:16:20 -08:00
Jingyu Zhou
dc129207a9
Minor fix after rebase.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
517966fce2
Remove lastLimited from rate keeper
...
Refactor code to make IDE happy.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
b2ee41ba33
Remove lastLimited from data distribution
...
Fix a serialization bug in ServerDBInfo, which causes test failures.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
e6ac3f7fe8
Minor fix on ratekeeper work registration.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
3c86643822
Separate Ratekeeper from data distribution.
...
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Trevor Clinkenbeard
39f612d132
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-03-02 17:07:00 -08:00
Trevor Clinkenbeard
2940b8d5fd
Update all per-process storage server health metrics at once
...
Ratekeeper updates all storage servers' health metrics in updateRate
with only a single map lookup
2019-03-02 16:08:28 -08:00
A.J. Beamon
655c9d82c7
Various cleanup from review
2019-03-01 14:06:47 -08:00
A.J. Beamon
93f7849261
Fix typo
2019-02-28 12:02:47 -08:00
A.J. Beamon
3e6a6a6569
Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics.
2019-02-28 12:00:58 -08:00
A.J. Beamon
eb629d87a5
Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately.
2019-02-28 09:53:16 -08:00
Trevor Clinkenbeard
3f59f82670
Calculate durabilityLag instead of NDV for health metrics
2019-02-27 16:30:01 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard
abfe057805
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard
07f800eeee
Got rid of detailed field in GetRateInfoReply message
2019-02-23 17:52:11 -08:00
Trevor Clinkenbeard
f3a73963b4
Got rid of detailedLeaseDuration in GetRateInfoReply message
2019-02-23 16:42:11 -08:00
Trevor Clinkenbeard
a20f5482bc
Created StorageStats struct to combine health metrics for storage servers
2019-02-20 11:57:41 -08:00
Trevor Clinkenbeard
7594606ee2
Use DETAILED_METRIC_UPDATE_RATE knob to determine GetRateInfoReply lease duration
2019-02-20 11:40:17 -08:00
Evan Tschannen
3a572b010f
fix: a forced recovery needed to force the data distributor to restart
2019-02-19 16:04:52 -08:00
Trevor Clinkenbeard
80cf5e057f
Compute worstStorageNDV for Ratekeeper health metrics
2019-02-02 21:03:02 -08:00
Trevor Clinkenbeard
5822bd65bf
Track health metrics in Ratekeeper and send these metrics to proxies in GetRateInfoReply messages
2019-01-31 12:56:58 -08:00
Trevor Clinkenbeard
d7930af2cb
Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message.
2019-01-31 12:23:04 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen
0e699a3c23
fix: ratekeeper should only control on local logs
2018-05-29 10:51:23 -07:00