Commit Graph

345 Commits

Author SHA1 Message Date
A.J. Beamon fcbeea104f Update documentation and tests with new connected_clients status schema 2017-11-01 11:25:50 -07:00
A.J. Beamon 31caac67dc Rename supported_versions[x].clients to supported_versions[x].connected_clients 2017-11-01 10:41:30 -07:00
Yichi Chiang d4f75630de Support log group field in status json 2017-09-28 16:31:29 -07:00
Alvin Moore 298b54104e Merge branch 'release-5.0' 2017-09-26 11:16:14 -07:00
Alvin Moore 02525d7b14 Added TESTs to ensure that all of the different kills are performed during simulation 2017-09-26 11:15:39 -07:00
Yichi Chiang 4ce60c4276 Merge pull request #159 from cie/add-locality-to-backup
Add locality to backup agent and DR agent
2017-09-26 10:20:32 -07:00
Yichi Chiang 5e9c6d6b64 Add locality to backup agent and DR agent 2017-09-26 10:19:26 -07:00
Evan Tschannen acb7e66d01 fix: failed logs do not count even if they have returned a result 2017-09-25 18:14:40 -07:00
Evan Tschannen 2bf042a559 fix: file_corrupt was not checking for fault injection
latency threshold was too long
2017-09-25 17:22:41 -07:00
A.J. Beamon e5e7f8a081 When using setKey() on Standalone<KeySelectorRef> in RYW, make sure that the key is part of the key selector's arena. 2017-09-25 15:52:45 -07:00
Evan Tschannen cce4eeb52d fix: the master was sending the cluster controller uninitialized configurations 2017-09-22 16:59:24 -07:00
Evan Tschannen 180438d41e fix: use the number of present logServers rather than the total size of the vector 2017-09-22 16:19:16 -07:00
Evan Tschannen 7081136f74 added a fix 2017-09-22 15:08:14 -07:00
Evan Tschannen 738ae21c3a fix: an optimization in buggified locking can cause recovery to break because it would not restart if a locked process was killed when the remaining logs cannot obtain a quorum 2017-09-22 15:07:57 -07:00
Evan Tschannen fba78ce4ef refactored monitor leader again to be even safer.
fixed a problem where we would write the header to cluster files twice
added extra logging in monitor leader
2017-09-22 15:06:11 -07:00
Alex Miller 585c9bf68f Quick fix to reduce CPU usage of ensureEpochLive.
It is suspected that policy recomputations are driving proxy CPU usage up, and
thus latency and throughput down.  To quickly confirm this theory, we're
forcing ensureEpochLive to wait until it has RF responses, which means we'll
probably only validate the policy once per call.
2017-09-21 18:22:24 -07:00
Evan Tschannen 4809bd8f62 fix: We cannot inject faults after renaming the file, because we could end up with two asyncFileNonDurable open for the same file 2017-09-21 18:11:18 -07:00
A.J. Beamon 995587b12b Merge branch 'release-5.0' 2017-09-21 13:32:12 -07:00
Evan Tschannen a9e3ae40d6 refactored monitorLeader to avoid the risk of one generation of coordinators interfering with the next 2017-09-20 17:42:12 -07:00
Evan Tschannen 53a4a3280a fix: we cannot add to the trLog when cancelled 2017-09-20 14:47:57 -07:00
Evan Tschannen c3f77ebbd2 Merge branch 'master' of github.com:apple/foundationdb 2017-09-20 11:48:35 -07:00
Evan Tschannen fbd67ea547 fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
fix: better master exists did not use exclusions because the configuration was reset
2017-09-20 11:48:26 -07:00
Ben Collins 21688afeb3 Merge pull request #155 from cie/feature-jni-no-memcpy
Fix possible leaks, move to SetByteArrayRegion()
2017-09-20 11:01:29 -07:00
A.J. Beamon da9b56e1ef More use of SetByteArrayRegion and various memory management fixes. 2017-09-20 10:31:25 -07:00
Balachandar Namasivayam 24aa616a7a Merge pull request #154 from cie/additional-client-profiling
Additional client profiling
2017-09-19 18:15:02 -07:00
Evan Tschannen cb43563b2d fix: toMap properly lists the redundancy mode of the cluster 2017-09-19 16:35:42 -07:00
Ben Collins 8c13f60625 Update tuple.md 2017-09-19 22:41:55 +00:00
Evan Tschannen f75dfc3153 do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response 2017-09-18 17:39:12 -07:00
Alex Miller 567d663afd Fix SimulationConfig never generating a custom config.
A 0 was changed to a 1 when rewriting code, and `case 0:` was never being hit. :(
Thankfully, it looks like nothing was broken by this in the meantime.
2017-09-18 17:29:36 -07:00
Evan Tschannen e8b895c878 added the ability to disable connection failures for a period of time after one happens 2017-09-18 12:46:29 -07:00
Evan Tschannen 111121fd13 Merge branch 'master' of github.com:apple/foundationdb 2017-09-18 11:05:02 -07:00
Evan Tschannen 489332533c all timeouts longer than two minutes can be lowered to 60.0 with buggification
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
A.J. Beamon 2934c0d443 Merge branch 'release-5.0' 2017-09-18 09:30:13 -07:00
Evan Tschannen 34f987f56d added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds 2017-09-15 17:55:01 -07:00
Evan Tschannen d9b64899c5 fix: we need to wait for log server failures if we have not locked all of the logs 2017-09-15 13:11:21 -07:00
Evan Tschannen d67e017bcc reduced reply_byte_limit to 80k 2017-09-15 11:01:56 -07:00
Evan Tschannen 36c98f18e9 do not register a worker with the cluster controller until it has finished recovering all files from disk 2017-09-15 10:57:58 -07:00
Evan Tschannen f3b7aa615d fix: seed storage servers are recruited based on the storage policy 2017-09-14 17:06:00 -07:00
Alvin Moore 9404d226d0 Merge branch 'release-5.0' 2017-09-13 16:49:00 -07:00
Alvin Moore cb92194772 Fixed problem with master being recruited on excluded servers 2017-09-13 16:48:27 -07:00
Alex Miller 5e14f19875 Merge pull request #147 from cie/alexmiller/grvtlogs
Only verify a quorum of TLogs are unlocked for a GRV request
2017-09-13 16:07:25 -07:00
Alex Miller d6b3be98fe Fix whitespace. 2017-09-13 15:49:39 -07:00
Alex Miller 06a9c7a772 Remove unnecessary policy recomputations in confirmEpochLive.
Watching for interface changes on readied servers was done as a workaround for
a case where all futures could be ready, but the policy verification would
never succeed.  This turns out to be because stopping a tlog causes an error to
be returned.  However, if a TLog is stopped, then we know that we can't do any
more commits, so we can just immediately stop trying and never mark our future
as ready.
2017-09-13 15:45:09 -07:00
Evan Tschannen 8cb53fd608 Merge pull request #149 from cie/choose-leader-on-stateless-processes
choose leader on the preferred process class
2017-09-13 13:58:49 -07:00
Alvin Moore b1dd2ac6fe Merge branch 'release-5.0' 2017-09-12 13:34:28 -07:00
Alvin Moore 4a6fb10a42 Added TraceEvents for remaining and killed workers when killing DataCenter
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Alec Grieser 81860eeee7 fixed exclusion rule to actually not take flow tester 2017-09-11 13:04:45 -07:00
A.J. Beamon 4fa2415553 Merge branch 'release-5.0' 2017-09-08 17:28:12 -07:00
A.J. Beamon bb8a245bdb circus: throughput test scales latency error by the target latency 2017-09-08 17:27:54 -07:00
A.J. Beamon 18b2b95421 Merge branch 'release-5.0' 2017-09-08 15:47:31 -07:00