Evan Tschannen
|
0217aed74c
|
Merge branch 'release-6.0'
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
|
2018-10-15 18:38:51 -07:00 |
Evan Tschannen
|
0acfae1e76
|
fixed the windows linker error
|
2018-10-15 18:19:51 -07:00 |
Evan Tschannen
|
a8feecbfad
|
added a comment to explain code ordering
|
2018-10-12 16:27:13 -07:00 |
Evan Tschannen
|
8ed4ce183c
|
Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
|
2018-10-12 14:56:19 -07:00 |
Evan Tschannen
|
17a1e3ce35
|
fix: the master proxy would log an OpCommit for empty commits to the txnStateStore
|
2018-10-12 12:58:17 -07:00 |
A.J. Beamon
|
419231d798
|
Fix: status was trying to read a metric under the wrong name, leading to an error that caused the cluster to report itself unhealthy and some metrics to be missing.
|
2018-10-10 13:33:28 -07:00 |
Evan Tschannen
|
4c95a5ee0f
|
added the basic structure for parallel restore
|
2018-10-09 18:47:28 -07:00 |
Evan Tschannen
|
ecddeab2ae
|
fixed review comments; demote killRegionCycle test for now
|
2018-10-08 10:39:39 -07:00 |
Evan Tschannen
|
1314bcec9e
|
Merge branch 'release-6.0'
# Conflicts:
# documentation/sphinx/source/release-notes.rst
|
2018-10-05 12:54:00 -07:00 |
Evan Tschannen
|
47e31133aa
|
fix: only create a new version if the version has not been created before
|
2018-10-05 12:37:29 -07:00 |
Evan Tschannen
|
06be70bace
|
fix: if localEnd is smaller than begin, we cannot peek from the local dc
|
2018-10-05 12:36:34 -07:00 |
Evan Tschannen
|
daed31708b
|
fix: we can only repair dead DCs if we have a fearless configuration
|
2018-10-05 12:35:37 -07:00 |
Evan Tschannen
|
3922e477a5
|
Merge branch 'release-6.0'
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
|
2018-10-03 16:57:18 -07:00 |
Evan Tschannen
|
9de55f362b
|
Merge pull request #793 from ajbeamon/add-new-storage-status-metrics
Add new metrics for bytes queried, keys queried, mutation bytes, muta…
|
2018-10-03 16:34:26 -07:00 |
Evan Tschannen
|
598788f60b
|
Merge pull request #801 from etschannen/feature-fix-forced-recovery
Fixed a number of problems with forced recoveries
|
2018-10-03 16:32:03 -07:00 |
Evan Tschannen
|
636420abee
|
fix: if the disk queue adapter peek hangs for a while, switch to a peek from a different locality
|
2018-10-03 13:58:55 -07:00 |
Evan Tschannen
|
28545e0f8d
|
multi cursors start a get more for the first 10 cursors to hide latency
|
2018-10-03 13:57:45 -07:00 |
Evan Tschannen
|
aa51d69b2d
|
fix: set peekLocality for upgraded tags
|
2018-10-03 13:54:59 -07:00 |
Evan Tschannen
|
c9f4109539
|
fix: add some additional time in the kill region workload to detect if we recovered successfully
|
2018-10-02 17:47:15 -07:00 |
Evan Tschannen
|
cdaf5e1192
|
fix: forced recovery does not recover tags from any DC besides the surviving one
|
2018-10-02 17:46:22 -07:00 |
Evan Tschannen
|
69711a107b
|
fix: because of forced recovery, 0 log router tags does not mean we are a special tlog set
|
2018-10-02 17:45:11 -07:00 |
Evan Tschannen
|
e7e1c634e0
|
fix: we need to restart the peek cursor when the known committed version becomes available
|
2018-10-02 17:44:14 -07:00 |
Evan Tschannen
|
a92fc911ac
|
do not spin on a failed storage server recruitment
|
2018-10-02 17:31:07 -07:00 |
Evan Tschannen
|
15ce215c1b
|
fix: parallel peek requests leaked memory
|
2018-10-02 17:28:39 -07:00 |
A.J. Beamon
|
84c2e3567f
|
Fix keys queried to use the RowsQueried metric instead of BytesQueried.
|
2018-10-01 11:19:28 -07:00 |
A.J. Beamon
|
a98fcf5972
|
Rename durable_lag to durability_lag
|
2018-10-01 09:58:49 -07:00 |
Evan Tschannen
|
bd6b743a81
|
fix: the storage server must always keep MAX_READ_TRANSACTION_LIFE_VERSIONS of history in memory, because forced recovery could roll back an epoch end.
fix: rollbacks were triggered unnecessarily
|
2018-09-28 16:04:59 -07:00 |
Evan Tschannen
|
3fdf72c626
|
fix: we need to force recovery if the master is still attempting to read the txs tag
|
2018-09-28 13:33:33 -07:00 |
Evan Tschannen
|
59335aa757
|
fix: the latest generation of remote transaction logs might have less data than a previous generation, because they take over at the known committed version. Detect this case and end at the version that has the most data
|
2018-09-28 12:25:27 -07:00 |
Evan Tschannen
|
c577840020
|
fix: forced recovery should remove all references to the old primary tlogs in all generations of logs to help the peek logic avoid attempting to read from them
|
2018-09-28 12:23:09 -07:00 |
Evan Tschannen
|
05e7f08b26
|
added a peek method which will attempt to read the txsTag from the local region as much as possible
|
2018-09-28 12:21:08 -07:00 |
Evan Tschannen
|
a24eadd73a
|
fix: for remote logs, their known committed version cannot be set to 1, because they can be used when their durable version is 0, leading to a known committed version being greater than a queue committed version
|
2018-09-28 12:17:21 -07:00 |
Evan Tschannen
|
e64c55dce0
|
fix: data distribution would sometimes use the wrong priority when fixing an incomplete movement, which led to the cluster thinking the data was replicated in all regions before it actually was
|
2018-09-28 12:15:23 -07:00 |
Evan Tschannen
|
b1fe069165
|
fix: during forced recovery logs can be removed from the logSystemConfig. We need to avoid killing the removed logs as unneeded until we actually complete the recovery
|
2018-09-28 12:13:46 -07:00 |
Evan Tschannen
|
22e6afbb18
|
fix: the cluster controller did not pass in its own locality when creating its database object, therefore it was not using locality aware load balancing
|
2018-09-28 12:12:06 -07:00 |
Evan Tschannen
|
b560b94ebc
|
fix: do not force a recovery if the master was already in the other region (and therefore already recovered)
fix: reboot the remaining DC, because any storage server rejoins that were rolled back will cause that server to be unusable
|
2018-09-28 12:10:04 -07:00 |
A.J. Beamon
|
f196e2d4dc
|
Log metrics about read requests as well as completed reads.
|
2018-09-27 15:32:39 -07:00 |
A.J. Beamon
|
118e21c446
|
Add new metrics for bytes queried, keys queried, mutation bytes, mutations, and durable lag to the storage role in status.
|
2018-09-27 14:33:21 -07:00 |
Steve Atherton
|
6756188f53
|
Merge pull request #760 from ajbeamon/fix-actor-warnings
Fix warnings about ACTORs not having waits. Fix shadowing of future v…
|
2018-09-24 10:07:59 -07:00 |
A.J. Beamon
|
48e620c680
|
Change the first of two trace events named "BTreeIntegrityCheck" to have the name "BTreeIntegrityCheckResults"
|
2018-09-24 08:40:18 -07:00 |
A.J. Beamon
|
92990d6aef
|
Merge release-6.0 into master
|
2018-09-21 16:14:39 -07:00 |
Evan Tschannen
|
77e2fb787e
|
Merge branch 'release-6.0' into feature-fix-forced-recovery
|
2018-09-21 14:55:37 -07:00 |
Evan Tschannen
|
3f86905ea7
|
fix: restore did not take into account that the end version of a log file does not exist in that file. As a result, restores done at the same version at which a snapshot completes did not apply the mutations at that final version.
|
2018-09-21 11:48:28 -07:00 |
Evan Tschannen
|
6b6d7a087d
|
The cluster controller should never consider itself as failed (that will be handled by the coordinators)
Simplified the check that the cluster controller is excluded
|
2018-09-20 17:01:11 -07:00 |
Evan Tschannen
|
31d0b0315f
|
fix: tlog spill policy would spill everything when it wanted to spill nothing
use a flow lock to protect updatePersistData and initPersistentState from committing simultaneously
|
2018-09-20 15:33:38 -07:00 |
Evan Tschannen
|
03728db99b
|
do not trigger better master exists if the cluster controller is excluded, since the master will change anyway once the cluster controller is moved
|
2018-09-19 18:28:24 -07:00 |
Evan Tschannen
|
861c8aa675
|
consider server health when building subsets of emergency teams
|
2018-09-19 17:57:01 -07:00 |
Evan Tschannen
|
702d018882
|
fix: we cannot use count on an async map, because someone waiting onChange for an item will cause it to exist in the map before it is set
|
2018-09-19 16:11:57 -07:00 |
Evan Tschannen
|
6d18193b3a
|
fix: team->setHealthy was not being called correctly on initially unhealthy teams
|
2018-09-19 14:48:07 -07:00 |
Evan Tschannen
|
270b1b24a6
|
fix: we have to use durableKnownCommittedVersion, because that is the true lower bound on the recovery version of the remote logs
fixed a compiler error
|
2018-09-18 16:29:03 -07:00 |