Evan Tschannen
2987f85177
fix: known committed version must be updated before creating the tlogQueueEntryRef
2018-06-26 23:21:30 -07:00
Evan Tschannen
00167b0157
renamed some uses of knownCommittedVersion to durableKnownCommittedVersion
...
epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for
recoverAt and recoveredAt refer to the last committed version of the previous generation
2018-06-26 18:20:28 -07:00
Evan Tschannen
6e19622872
fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version
2018-06-26 18:02:55 -07:00
Evan Tschannen
c6313a79e3
fix: the cluster controller needs to continue to retry recruitment until after wait_for_good_remote_recruitment_delay
2018-06-25 18:20:16 -07:00
Evan Tschannen
1a8dac365d
fix: poppedAllAfter was not set to a large enough version
2018-06-25 15:57:11 -07:00
Evan Tschannen
2ec8744ab3
fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions
2018-06-25 11:15:49 -07:00
Evan Tschannen
398497f5c3
fix: wrong desired count used when checking good remote fitness
2018-06-22 12:24:01 -07:00
Evan Tschannen
0123627d67
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen
96b0a91ab2
simplified betterCount logic
2018-06-22 10:38:36 -07:00
Evan Tschannen
5fc8199abc
Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic
...
wait_for_good_recruitment now requires that you have the desired count of each role
remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered
2018-06-22 10:15:24 -07:00
Evan Tschannen
8a8914f046
re-added the ability to configure the number of log routers. Many log routers are needed to get a sufficient number of sockets involved in copying data across the WAN
2018-06-22 00:04:00 -07:00
Evan Tschannen
1dce97f28c
Merge branch 'release-5.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/SimulatedCluster.actor.cpp
# packaging/msi/FDBInstaller.wxs
# versions.target
2018-06-21 17:05:11 -07:00
Evan Tschannen
3943dc1dfc
Merge pull request #521 from ajbeamon/metrics-improvements
...
Metrics improvements
2018-06-21 16:42:40 -07:00
Evan Tschannen
9a91dad5bd
fixed compile issue
2018-06-21 16:34:36 -07:00
Evan Tschannen
678b4494f4
added logging for the datacenter version difference
2018-06-21 16:31:52 -07:00
A.J. Beamon
36f84c9cff
Fix uninitialized variable
2018-06-21 16:03:05 -07:00
A.J. Beamon
e8f66df001
Add metrics for watches and mutations on the storage server. The storage server tracks its lag with the logs, and status tries to report a more accurate measure of this lag.
2018-06-21 15:59:43 -07:00
Evan Tschannen
8bd7eaebdb
fix: broken_promise from push can be thrown into the proxy’s actor collection
2018-06-21 15:55:27 -07:00
Evan Tschannen
68ac3bdc4c
log routers now calculate a precise version to pop for their log router tag
2018-06-21 15:29:46 -07:00
Evan Tschannen
b6ad8b1092
merge master
2018-06-21 12:06:38 -07:00
Alec Grieser
e26bae6ec0
Merge pull request #509 from ajbeamon/report-unused-allocated-memory-in-status
...
Track unused allocated memory in ProcessMetrics and report it in status.
2018-06-20 21:08:39 -07:00
Evan Tschannen
4352db674f
the storage server would not always know about all options when running fetch keys
2018-06-20 17:05:03 -07:00
Evan Tschannen
f755961c42
use parallelGetMore during log recovery
2018-06-20 17:04:06 -07:00
Richard Low
361e335730
Disable cert validation in simulation
2018-06-20 16:26:11 -07:00
Alex Miller
569fd05954
Merge pull request #506 from etschannen/feature-remote-logs
...
Performance improvements when using multiple DCs
2018-06-20 12:12:05 -07:00
A.J. Beamon
5e81f4ac7e
Track unused allocated memory in ProcessMetrics and report it in status.
2018-06-20 10:10:51 -07:00
Evan Tschannen
e951df95b7
when doing data movement where one region has the data and the other doesn’t, first move a single replica to the other region to save WAN bandwidth
2018-06-19 23:15:30 -07:00
Evan Tschannen
e7999e7a3e
log routers need to use parallelGetMore when peeking because the latency to the primary datacenter makes the bandwidth of normal peeking too low.
2018-06-19 22:16:45 -07:00
Evan Tschannen
56b9fb09f0
Merge pull request #503 from etschannen/feature-remote-logs
...
fixed a bug with the tlog memory limit
2018-06-18 22:14:26 -07:00
Evan Tschannen
df4c445e25
fix: we need to return from commit when stopped
2018-06-18 22:12:46 -07:00
Evan Tschannen
403fb5a2e9
removed unnecessary suppressFor
2018-06-18 17:59:29 -07:00
Alex Miller
abf0e68364
Merge pull request #502 from etschannen/feature-remote-logs
...
fixed a few performance issues with multiple DC deployments
2018-06-18 17:46:37 -07:00
Evan Tschannen
eaca0fb2ea
fixed incorrect priorities on the log router
2018-06-18 17:36:40 -07:00
Evan Tschannen
0bdd25df23
ratekeeper does not base its control on remote storage servers
2018-06-18 17:23:55 -07:00
Evan Tschannen
b79feaddd3
added a hard memory limit to the TLog to prevent it from running out of memory. Because remote logs are not ratekeeper-controlled, this is their only protection
2018-06-18 17:22:40 -07:00
Alex Miller
aead2586f4
Move .actor.g.cpp files to .obj.
...
This means that grep over our source tree doesn't return 2x the results.
2018-06-18 16:47:27 -07:00
Evan Tschannen
127c2ad775
fix: prevent adding the same location multiple times for satellite logs
2018-06-18 15:27:28 -07:00
Evan Tschannen
50e1e03130
fix: for configurations with anti-quorums to work, the push actors need to be put in the proxy’s actor collection
2018-06-18 15:25:54 -07:00
Evan Tschannen
1ccfb3a0f4
fix: log_anti_quorum was always 0 in simulation
...
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen
e8c462882b
re-added remote_logs as a parameter, because it could be useful to have a different number of logs between when recruited as primary and remote
2018-06-18 10:22:34 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen
7aef5ec6f1
got rid of persistUnrecoveredBefore
...
added persistLocality
2018-06-17 14:44:33 -07:00
Evan Tschannen
f637c680f1
fix: populateSatelliteTagLocations was broken
...
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen
6931a00993
satellite log push locations are static per tag, which will reduce the number of tags each satellite log has to index, and reduce the proxy cpu when calculating push locations
2018-06-16 17:39:02 -07:00
Evan Tschannen
f694f7c9ca
removed hasBestPolicy
2018-06-15 12:36:19 -07:00
Evan Tschannen
0d87186821
use a specific locality for satellites
2018-06-15 11:06:38 -07:00
Evan Tschannen
09c92c887b
fix: extraTlogEligibleMachines was not calculated correctly in all cases
2018-06-15 10:23:33 -07:00
Evan Tschannen
246abd1207
added full_replication to status
2018-06-14 21:14:18 -07:00
Evan Tschannen
284233baa1
added a key in the database with the locality of the current master
2018-06-14 19:36:02 -07:00
Evan Tschannen
0103b6f5ed
added datacenter_version_difference to status
2018-06-14 19:09:25 -07:00