A.J. Beamon
fcbeea104f
Update documentation and tests with new connected_clients status schema
2017-11-01 11:25:50 -07:00
A.J. Beamon
31caac67dc
Rename supported_versions[x].clients to supported_versions[x].connected_clients
2017-11-01 10:41:30 -07:00
Yichi Chiang
d4f75630de
Support log group field in status json
2017-09-28 16:31:29 -07:00
Alvin Moore
298b54104e
Merge branch 'release-5.0'
2017-09-26 11:16:14 -07:00
Alvin Moore
02525d7b14
Added TESTs to ensure that all of the different kills are performed during simulation
2017-09-26 11:15:39 -07:00
Yichi Chiang
4ce60c4276
Merge pull request #159 from cie/add-locality-to-backup
...
Add locality to backup agent and DR agent
2017-09-26 10:20:32 -07:00
Yichi Chiang
5e9c6d6b64
Add locality to backup agent and DR agent
2017-09-26 10:19:26 -07:00
Evan Tschannen
acb7e66d01
fix: failed logs do not count even if they have returned a result
2017-09-25 18:14:40 -07:00
Evan Tschannen
2bf042a559
fix: file_corrupt was not checking for fault injection
...
latency threshold was too long
2017-09-25 17:22:41 -07:00
A.J. Beamon
e5e7f8a081
When using setKey() on Standalone<KeySelectorRef> in RYW, make sure that the key is part of the key selector's arena.
2017-09-25 15:52:45 -07:00
Evan Tschannen
cce4eeb52d
fix: the master was sending the cluster controller uninitialized configurations
2017-09-22 16:59:24 -07:00
Evan Tschannen
180438d41e
fix: use the number of present logServers rather than the total size of the vector
2017-09-22 16:19:16 -07:00
Evan Tschannen
7081136f74
added a fix
2017-09-22 15:08:14 -07:00
Evan Tschannen
738ae21c3a
fix: an optimization in buggified locking can cause recovery to break because it would not restart if a locked process was killed when the remaining logs cannot obtain a quorum
2017-09-22 15:07:57 -07:00
Evan Tschannen
fba78ce4ef
refactored monitor leader again to be even safer.
...
fixed a problem where we would write the header to cluster files twice
added extra logging in monitor leader
2017-09-22 15:06:11 -07:00
Alex Miller
585c9bf68f
Quick fix to reduce CPU usage of ensureEpochLive.
...
It is suspected that policy recomputations are driving proxy CPU usage up, and
thus latency and throughput down. To quickly confirm this theory, we're
forcing ensureEpochLive to wait until it has RF responses, which means we'll
probably only validate the policy once per call.
2017-09-21 18:22:24 -07:00
Evan Tschannen
4809bd8f62
fix: We cannot inject faults after renaming the file, because we could end up with two asyncFileNonDurable open for the same file
2017-09-21 18:11:18 -07:00
A.J. Beamon
995587b12b
Merge branch 'release-5.0'
2017-09-21 13:32:12 -07:00
Evan Tschannen
a9e3ae40d6
refactored monitorLeader to avoid the risk of one generation of coordinators interfering with the next
2017-09-20 17:42:12 -07:00
Evan Tschannen
53a4a3280a
fix: we cannot add to the trLog when cancelled
2017-09-20 14:47:57 -07:00
Evan Tschannen
c3f77ebbd2
Merge branch 'master' of github.com:apple/foundationdb
2017-09-20 11:48:35 -07:00
Evan Tschannen
fbd67ea547
fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
...
fix: better master exists did not use exclusions because the configuration was reset
2017-09-20 11:48:26 -07:00
Ben Collins
21688afeb3
Merge pull request #155 from cie/feature-jni-no-memcpy
...
Fix possible leaks, move to SetByteArrayRegion()
2017-09-20 11:01:29 -07:00
A.J. Beamon
da9b56e1ef
More use of SetByteArrayRegion and various memory management fixes.
2017-09-20 10:31:25 -07:00
Balachandar Namasivayam
24aa616a7a
Merge pull request #154 from cie/additional-client-profiling
...
Additional client profiling
2017-09-19 18:15:02 -07:00
Evan Tschannen
cb43563b2d
fix: toMap properly lists the redundancy mode of the cluster
2017-09-19 16:35:42 -07:00
Ben Collins
8c13f60625
Update tuple.md
2017-09-19 22:41:55 +00:00
Evan Tschannen
f75dfc3153
do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response
2017-09-18 17:39:12 -07:00
Alex Miller
567d663afd
Fix SimulationConfig never generating a custom config.
...
A 0 was changed to a 1 when rewriting code, and `case 0:` was never being hit. :(
Thankfully, it looks like nothing was broken by this in the meantime.
2017-09-18 17:29:36 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
111121fd13
Merge branch 'master' of github.com:apple/foundationdb
2017-09-18 11:05:02 -07:00
Evan Tschannen
489332533c
all timeouts longer than two minutes can be lowered to 60.0 with buggification
...
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
A.J. Beamon
2934c0d443
Merge branch 'release-5.0'
2017-09-18 09:30:13 -07:00
Evan Tschannen
34f987f56d
added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds
2017-09-15 17:55:01 -07:00
Evan Tschannen
d9b64899c5
fix: we need to wait for log server failures if we have not locked all of the logs
2017-09-15 13:11:21 -07:00
Evan Tschannen
d67e017bcc
reduced reply_byte_limit to 80k
2017-09-15 11:01:56 -07:00
Evan Tschannen
36c98f18e9
do not register a worker with the cluster controller until it has finished recovering all files from disk
2017-09-15 10:57:58 -07:00
Evan Tschannen
f3b7aa615d
fix: seed storage servers are recruited based on the storage policy
2017-09-14 17:06:00 -07:00
Alvin Moore
9404d226d0
Merge branch 'release-5.0'
2017-09-13 16:49:00 -07:00
Alvin Moore
cb92194772
Fixed problem with master being recruited on excluded servers
2017-09-13 16:48:27 -07:00
Alex Miller
5e14f19875
Merge pull request #147 from cie/alexmiller/grvtlogs
...
Only verify a quorum of TLogs are unlocked for a GRV request
2017-09-13 16:07:25 -07:00
Alex Miller
d6b3be98fe
Fix whitespace.
2017-09-13 15:49:39 -07:00
Alex Miller
06a9c7a772
Remove unnecessary policy recomputations in confirmEpochLive.
...
Watching for interface changes on readied servers was done as a workaround for
a case where all futures could be ready, but the policy verification would
never succeed. This turns out to be because stopping a TLog causes an error to
be returned. However, if a TLog is stopped, then we know that we can't do any
more commits, so we can just immediately stop trying and never mark our future
as ready.
2017-09-13 15:45:09 -07:00
Evan Tschannen
8cb53fd608
Merge pull request #149 from cie/choose-leader-on-stateless-processes
...
choose leader on the preferred process class
2017-09-13 13:58:49 -07:00
Alvin Moore
b1dd2ac6fe
Merge branch 'release-5.0'
2017-09-12 13:34:28 -07:00
Alvin Moore
4a6fb10a42
Added TraceEvents for remaining and killed workers when killing DataCenter
...
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Alec Grieser
81860eeee7
fixed exclusion rule to actually not take flow tester
2017-09-11 13:04:45 -07:00
A.J. Beamon
4fa2415553
Merge branch 'release-5.0'
2017-09-08 17:28:12 -07:00
A.J. Beamon
bb8a245bdb
circus: throughput test scales latency error by the target latency
2017-09-08 17:27:54 -07:00
A.J. Beamon
18b2b95421
Merge branch 'release-5.0'
2017-09-08 15:47:31 -07:00