Commit Graph

218 Commits

Author SHA1 Message Date
Evan Tschannen 482ac38ca6 added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs 2017-12-01 13:04:32 -08:00
Evan Tschannen c3918d892a do not use bandwidth splitting on the keyServer shard, lots of sets and clears to this shard generally means you do not want to create additional data distribution work 2017-11-30 18:28:16 -08:00
Evan Tschannen 7f72aa7de5 fix: a storage server does not ever need to rollback before a version restored from disk 2017-11-30 11:19:43 -08:00
Evan Tschannen e5a682948c Merge pull request #212 from cie/check-cluster-controller-desired-class
Check cluster controller using desired process class in consistency c…
2017-11-29 15:57:51 -08:00
Yichi Chiang 8ba0eaebff Check cluster controller using desired process class in consistency check 2017-11-29 15:09:23 -08:00
Evan Tschannen 8c51bc4ac4 fixed low latency tests in a way that gives us better test coverage 2017-11-28 18:20:29 -08:00
Evan Tschannen dc624a54dc fix: avoid flushing large queues in simulation when checking latency 2017-11-27 17:23:20 -08:00
Alex Miller f19cb3bbbd Merge pull request #208 from cie/alexmiller/grvtfix
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
Yichi Chiang d9a98aa968 Remove commented code 2017-11-16 17:25:37 -08:00
Yichi Chiang 0d5dc15ac8 Fix double recoveries 2017-11-16 16:58:55 -08:00
Alex Miller e9412bbb11 Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that.  The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap.  Unfortunately we don't call selectReplicas,
so much of this work is undesireable for us, and a large amount of CPU time is
spent doing this initialization work.

The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possibly by removing all elements from
LocalityData that don't need to be considered by the policy.

This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
Evan Tschannen ad456a939a Merge pull request #206 from cie/change-excluded-cluster-controller
Change excluded cluster controller
2017-11-15 17:28:33 -08:00
Yichi Chiang f96faf72d9 Add fullyRecoveredConfig for checking exclusions 2017-11-15 17:15:24 -08:00
Evan Tschannen 30464e943c Merge pull request #205 from cie/cleanup-spammy-traceevents
Cleanup spammy traceevents
2017-11-15 12:41:37 -08:00
Evan Tschannen e113dba0e3 added a new trace event tracking master recovery durations 2017-11-15 12:38:26 -08:00
Yichi Chiang df922bc973 Change excluded cluster controller 2017-11-14 13:57:37 -08:00
A.J. Beamon bb1297c686 Remove RkServerQueueInfo and RkTLogQueueInfo trace events, since this information is more or less already logged on the storage servers and tlogs. Update the quiet database check and magnesium to use the information from the logs and storage servers. 2017-11-14 12:59:42 -08:00
A.J. Beamon 3b952efb4e Remove events from cluster controller that get logged for roughly every worker upon recovery, master registration, etc. 2017-11-14 10:15:45 -08:00
A.J. Beamon 0fea5e9c2f Convert client_invalid_operation errors to ASSERTs. 2017-11-13 11:38:34 -08:00
A.J. Beamon cd085764f1 Do not automatically change a cluster file that does not match what you expect. 2017-11-10 14:12:45 -08:00
Alex Miller 311d1ca87d A variety of fixes that collectively fix using flow profiling in circus.
To run, use --co=flow_profiling=-1, because reasons.
2017-11-07 13:55:16 -08:00
Evan Tschannen 706bf1e018 fix: we cannot trigger better master exists before a master is fully recovered because exclusions changed by the provisional master will not be committed until the master is fully recovered 2017-11-04 12:48:04 -07:00
Evan Tschannen 57aba0b3bc fix: excluded servers were the same fitness as storage servers for the master role
fix: better master exists did not considers exclusion for master fitness
2017-11-03 17:09:14 -07:00
Yichi Chiang 42fad5efe5 Introduce cluster controller process class in circus 2017-11-03 14:22:55 -07:00
Yichi Chiang dcc9aafab7 Merge branch 'master' of github.com:apple/foundationdb 2017-11-02 10:47:59 -07:00
Yichi Chiang c033d8efd8 Fix typo message and remove extra TraceEvent which overwrites the expected one 2017-11-02 10:47:51 -07:00
Balachandar Namasivayam 3efaaec479 onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed. 2017-11-01 18:29:56 -07:00
A.J. Beamon 7cf17df821 Merge branch 'master' into log-group-for-unsupported-clients
# Conflicts:
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
A.J. Beamon 31caac67dc Rename supported_versions[x].clients to supported_versions[x].connected_clients 2017-11-01 10:41:30 -07:00
Balachandar Namasivayam 988bc0207f Reset Client Transaction profiling parameters when the config keys are cleared. 2017-10-31 15:40:57 -07:00
Alec Grieser 5a4a5985fd Merge branch 'release-5.0' 2017-10-30 08:31:23 -07:00
Alec Grieser 87321f5017 Merge branch 'release-4.6' into release-5.0 2017-10-30 08:31:01 -07:00
Evan Tschannen 54d82c0d92 Merge pull request #194 from cie/alexmiller/valgrind
Fix valgrind errors
2017-10-27 17:25:12 -07:00
Alex Miller e0d33ef8d7 Preemptively fix profiler-related valgrind errors/straight out bugs.
I forgot to initialize some fields in requests.
2017-10-27 17:20:19 -07:00
Evan Tschannen aa0c2ae317 only increase the max shard size if the shard begins in the keyServer keyspace, do not increase the minimum shard size 2017-10-27 14:22:26 -07:00
Evan Tschannen 3a4078bdda the keyservers shards are always a fixed large size 2017-10-27 11:52:11 -07:00
Balachandar Namasivayam cfefab18fb Merge branch 'master' into add-new-atomic-ops 2017-10-25 18:03:34 -07:00
Balachandar Namasivayam 3d5658940a Addressed Review Comments 2017-10-25 16:42:05 -07:00
Balachandar Namasivayam 9dd588dcce Addressed review comments.
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
Evan Tschannen d852a53ae4 Merge pull request #181 from cie/throttle-spammy-logs
Throttle spammy logs
2017-10-25 13:45:55 -07:00
Balachandar Namasivayam 2f6d55a52f Add correctness tests for all atomic ops 2017-10-25 13:36:49 -07:00
Yichi Chiang 4d54a73f5b Merge pull request #191 from cie/count-cluster-controller-role
Take cluster controller role into consideration when recruiting workers
2017-10-25 12:09:15 -07:00
Yichi Chiang f39cce9b8d Use processId instead of address for comparison 2017-10-25 11:35:29 -07:00
Yichi Chiang 5fcef911f0 Take cluster controller role into consideration when recruiting workers 2017-10-25 10:35:46 -07:00
Evan Tschannen 48901a9223 added a list of tlog IDs that are missing to status 2017-10-24 16:28:50 -07:00
Yichi Chiang c2a117fe07 Merge pull request #189 from cie/enable-check-desired-class
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 15:18:21 -07:00
Yichi Chiang defdc6550d Exclude excluded processses when getting testers 2017-10-24 15:16:34 -07:00
Yichi Chiang 3865c5ae0e Enable checkUsingDesiredClasses() in consistency check 2017-10-24 12:58:54 -07:00
Balachandar Namasivayam 8c3bdc5b3b Make atomic ops differentiate between unset and empty values. 2017-10-23 16:48:13 -07:00
Bhaskar Muppana 360b777b78 Fail with correct error code in case of abort or discontinue of
non-existing backups.
2017-10-18 23:17:48 -07:00