Meng Xu
08a721b320
Merge branch 'master' into mengxu/server-team-remover-PR
2019-07-08 16:30:32 -07:00
A.J. Beamon
0a5c7608df
Remove "Number" suffix from newly added events (and variables that feed the events).
2019-07-08 15:45:28 -07:00
A.J. Beamon
f52c239ef8
Merge branch 'master' into trace-event-rename
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
2019-07-08 15:37:00 -07:00
A.J. Beamon
a5a6f8431c
Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals.
2019-07-08 14:01:04 -07:00
Evan Tschannen
c348b3da51
After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies
2019-07-08 12:53:40 -07:00
Evan Tschannen
ec11ef024b
Merge pull request #1798 from ajbeamon/merge-release-6.1-into-master
...
Merge release 6.1 into master
2019-07-08 09:02:56 -07:00
A.J. Beamon
dd85edb08c
Merge pull request #1802 from xumengpanda/mengxu/DD-ensure-redundant-team-priority-as700-PR
...
TeamTracker:Set redundant team priority as PRIORITY_TEAM_REDUNDANT
2019-07-08 08:47:28 -07:00
Vishesh Yadav
8d3a826c63
Merge pull request #1804 from alexmiller-apple/cycle-verify-only
...
Add a checkOnly parameter to Cycle workload.
2019-07-05 21:59:52 -07:00
Jingyu Zhou
50e7593c5b
Merge pull request #1796 from ajbeamon/remove-trace-event-underscores
...
Remove trace event underscores
2019-07-05 21:45:55 -07:00
Alex Miller
14e5dd74fe
Add a checkOnly parameter to Cycle workload.
...
So that it can be used in the real world for consistency checking of
backup and DR.
2019-07-05 19:09:09 -07:00
Evan Tschannen
310a5fe9a3
fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy
2019-07-05 17:28:22 -07:00
Meng Xu
e8fb7564f5
Merge branch 'master' into mengxu/DD-ensure-redundant-team-priority-as700-PR
2019-07-05 17:28:12 -07:00
Meng Xu
c7a996267c
TeamRemover: Remove unused declaration
...
Also change state variable to variable.
2019-07-05 16:54:06 -07:00
Evan Tschannen
15e894c724
Merge in master
2019-07-05 15:49:24 -07:00
Evan Tschannen
e7c0ecf729
fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy
2019-07-05 15:46:16 -07:00
Meng Xu
46d28a3b79
TeamTracker:Set redundant team priority as redundant
...
The redundant team removed by teamRemover will not exist
in the global teams data structure. So we will not find
the redundant team from shard-to-team mapping in the system key.
Before this change, teamTracker marks such team as PRIORITY_TEAM_UNHEALTHY.
With this change, it marks it as PRIORITY_TEAM_REDUNDANT
2019-07-05 15:24:00 -07:00
A.J. Beamon
4be08d9b2d
Rename datacenter_version_difference to datacenter_lag and include both seconds and versions.
2019-07-05 14:36:18 -07:00
A.J. Beamon
2a56e011ea
Merge branch 'release-6.1' into merge-release-6.1-into-master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
2019-07-05 13:52:29 -07:00
Meng Xu
7ba6cd2d9d
ServerTeamRemover:Reduce the overshot server team number to build
...
Each server has the maximum of DESIRED_TEAMS_PER_SERVER and
(DESIRED_TEAMS_PER_SERVER * storageTeamSize) / 2
2019-07-05 11:01:50 -07:00
A.J. Beamon
2a709ee5d0
Rename event details that use the suffix "Number" to indicate a count, as number could also imply an index. Rename a few other trace events and details that e.g. needed to be pluralized.
2019-07-05 08:54:21 -07:00
A.J. Beamon
9f4b6fd770
Remove additional underscores
2019-07-05 08:12:25 -07:00
A.J. Beamon
a3ac9c7eea
Remove underscores from some trace event names
2019-07-05 08:08:29 -07:00
Alex Miller
ea6898144d
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-07-03 20:44:15 -07:00
Evan Tschannen
23ecc17075
Merge pull request #1755 from senthil-ram/recoveryFix
...
sev40 if knownCommittedVersion > recoveryVersion
2019-07-03 16:39:16 -07:00
Evan Tschannen
e153571a50
Merge pull request #1775 from alexmiller-apple/crc32c-memory-storage
...
Memory storage engine to use crc32c DiskQueue by default (in 6.2).
2019-07-03 16:37:42 -07:00
Evan Tschannen
79a90d33a7
fix: the push location for txs tags needs to be based on what the tag will become after changing the number of txs tags
2019-07-03 16:06:54 -07:00
A.J. Beamon
8c10d832a1
Add coordinator role in trace events
2019-07-03 11:09:36 -07:00
Meng Xu
2782d432ac
ServerTeamRemover:Update the desired number and pick unhealthy teams first
2019-07-02 22:17:53 -07:00
Meng Xu
599fcb2e6d
Add serverTeamRemover to remove redundant server teams
2019-07-02 17:40:37 -07:00
Meng Xu
716494ed9f
ConsistencyCheck:Check serverTeamNumber larger than desired number
2019-07-02 17:40:37 -07:00
Meng Xu
7461c87ae6
AddTeamsBestOf: Build more teams than desired
...
We build more teams than we finally want so that we can use the serverTeamRemover() actor to remove the teams
whose members belong to too many teams. This allows us to get a more balanced number of teams per server.
2019-07-02 17:40:37 -07:00
Evan Tschannen
8afab93e29
Merge pull request #1782 from etschannen/master
...
revert storage server priority changes
2019-07-02 17:25:31 -07:00
Evan Tschannen
3fb0999e10
revert storage server priority changes
2019-07-02 16:54:47 -07:00
Evan Tschannen
86b0224347
Merge branch 'release-6.1' of github.com:apple/foundationdb into release-6.1
2019-07-02 16:27:31 -07:00
Evan Tschannen
64e33bb4f9
added logging for maintenance mode
2019-07-02 16:25:29 -07:00
Stephen Atherton
71ba490cf8
Removed use of the C "struct hack" as it is not valid C++. Replaced zero-length members with functions returning a pointer for arrays or a reference for single members.
2019-07-02 16:02:58 -07:00
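A minimal sketch of the replacement described in the commit above, assuming a header that is followed in memory by a variable-length payload. `Header`, `makeHeader`, and the field names are hypothetical and are not taken from the FoundationDB sources; the point is only that an accessor computes where the trailing bytes start, so no zero-length member is needed.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header that is followed in memory by `count` payload bytes.
struct Header {
    uint32_t count;

    // Replaces a zero-length member such as `uint8_t data[0];`:
    // the payload starts immediately after the header object.
    uint8_t* data() { return reinterpret_cast<uint8_t*>(this + 1); }
    const uint8_t* data() const { return reinterpret_cast<const uint8_t*>(this + 1); }
};

// Allocates one block holding the header plus its payload and copies the bytes in.
// The caller releases the block with std::free().
inline Header* makeHeader(const uint8_t* bytes, uint32_t count) {
    void* raw = std::malloc(sizeof(Header) + count);
    if (raw == nullptr) throw std::bad_alloc();
    Header* header = new (raw) Header{count};
    if (count > 0) std::memcpy(header->data(), bytes, count);
    return header;
}
```

A caller would build the object with `makeHeader(payload, size)`, read the bytes through `header->data()`, and release it with `std::free(header)`.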
dyoungworth
817fce080b
Fix minor bug in External Workload
2019-07-02 15:57:26 -07:00
Meng Xu
7afbd10a10
Change teamRemover to machineTeamRemover
2019-07-02 15:16:34 -07:00
Meng Xu
d2d6022ed4
StorageServerTracker:Do not always set doBuildTeams
...
When interface changes, we set doBuildTeams to true only when
the interface location changes.
2019-07-02 14:24:26 -07:00
Meng Xu
de5bcaf588
minTeamNumber for server and machine cannot be uint64_t
...
Because the consistency check will try to convert the value to int64_t.
If no server exists, the variable will not be updated and thus overflows
when it is converted to int64_t
2019-07-01 21:39:18 -07:00
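The overflow described in the commit above can be illustrated with a small sketch. It assumes the minimum is tracked from a "largest possible value" sentinel, which the commit message does not state; `minTeamCount` is a hypothetical helper, not a function from the repository.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: smallest team count across servers.
// Assumption: the running minimum starts at the largest value of T,
// so it keeps that sentinel when there are no servers at all.
template <typename T>
T minTeamCount(const std::vector<T>& teamsPerServer) {
    T minTeams = std::numeric_limits<T>::max();
    for (T teams : teamsPerServer) {
        if (teams < minTeams) minTeams = teams;
    }
    return minTeams;
}
```

With `T = uint64_t` and no servers, the sentinel does not fit in int64_t, so `static_cast<int64_t>(minTeamCount<uint64_t>({}))` wraps to a negative value on common two's-complement platforms. With `T = int64_t` the sentinel is already representable and the conversion is a no-op.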
Evan Tschannen
841e61ac25
fixed a broken promise in localRatekeeper
2019-07-01 16:56:35 -07:00
Meng Xu
347a7ecdff
MachineTeams:Make traceTeamCollectionInfo not an actor
2019-07-01 16:50:53 -07:00
mengranwo
e54eedf0e2
Address pr comments, remove wait(tr.commit()) for read-only txn
2019-07-01 16:09:51 -07:00
mengranwo
0ad151e70a
style formatting
2019-07-01 16:09:51 -07:00
mengranwo
819b6e3d6d
fix compiling error
2019-07-01 16:09:51 -07:00
mengranwo
c7148bbb14
address cr comments:
2019-07-01 16:09:51 -07:00
mengranwo
d96cdacdd5
fix format issue
2019-07-01 16:09:51 -07:00
mengranwo
11161746f8
add try catch block around tx.onerror()
2019-07-01 16:09:51 -07:00
mengranwo
6b61b0e030
fix syntax error, pass compile
2019-07-01 16:09:51 -07:00
mengranwo
0b9cd18fb4
check whether the cluster is healthy during the recovery process (for storage engine); if healthy, delete data files and join as new
2019-07-01 16:09:51 -07:00
Jingyu Zhou
b69d7adabc
Remove unused remoteRecovered from master server
2019-07-01 15:41:35 -07:00
Alex Miller
23de5b64ad
Memory storage engine to use crc32c DiskQueue by default (in 6.2).
2019-07-01 13:38:06 -07:00
Meng Xu
b8cb883040
AddBestMachineTeams:Fix input must be non-negative value
2019-06-28 22:46:16 -07:00
Evan Tschannen
4e45a58750
fix: forced recovery did not copy the number of txsTags properly
2019-06-28 20:51:16 -07:00
Evan Tschannen
2c40c818cf
fix: txsTags was not copied into oldLogData
2019-06-28 17:51:16 -07:00
Alex Miller
8e1ab6e7db
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-06-28 17:32:54 -07:00
Evan Tschannen
5041ff38b1
removed unneeded description
2019-06-28 16:54:22 -07:00
Evan Tschannen
a124fc6e8a
fixed compiler error
2019-06-28 16:54:22 -07:00
Evan Tschannen
b9a6271375
local ratekeeper no longer globally limits
2019-06-28 16:54:22 -07:00
Evan Tschannen
4cef1d3937
Experimental change of storage write priority
2019-06-28 16:54:22 -07:00
Evan Tschannen
f539b5f09a
fix: a large targetRateRatio means limiting more
2019-06-28 16:54:22 -07:00
Evan Tschannen
18d5fbf1e0
Avoid jumping from rejecting 0% of requests directly to 20% of requests
2019-06-28 16:54:22 -07:00
Evan Tschannen
db413c37f7
restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers
2019-06-28 16:54:22 -07:00
Evan Tschannen
ec16688db1
fixed the local ratekeeper workload to match the logic on the storage server
2019-06-28 16:54:22 -07:00
Evan Tschannen
a97940a10b
fixed compiler error
2019-06-28 16:54:22 -07:00
Evan Tschannen
92b32855ca
ratekeeper’s control algorithm would oscillate when limited by local ratekeeper
2019-06-28 16:54:22 -07:00
Evan Tschannen
1b939d5208
Merge pull request #1749 from satherton/feature-redwood
...
Update redwood storage engine to latest correctness-passing version
2019-06-28 16:22:06 -07:00
Meng Xu
63c42533eb
TraceTeamCollectionInfo:Remove delay
2019-06-28 16:19:58 -07:00
Meng Xu
875cb877ac
TeamCollection: Apply clang-format
2019-06-28 16:01:05 -07:00
Meng Xu
0baae134f6
TeamCollectionInfo: Resolve review comments
2019-06-28 15:59:47 -07:00
Evan Tschannen
cfce1e1705
fix: buffered peek cursor would advance very slowly through large ranges of empty versions
2019-06-28 15:54:08 -07:00
Evan Tschannen
7f4586ad49
the number of txsTags needs to be tracked separately from the number of transaction logs because of forced recoveries
2019-06-28 12:33:24 -07:00
Meng Xu
cb681693df
TeamCollection:Do NOT consider healthiness in counting team number
...
If a team is removed from DD, it will be marked as failed and eventually removed from the
global teams data structure.
Team healthiness is likely to be a temporary state which can change rather quickly.
2019-06-28 09:50:43 -07:00
Evan Tschannen
2113d6d01e
fix: peek all possible txsTags which could have been used by old log sets
2019-06-27 23:39:19 -07:00
Evan Tschannen
235697f688
fix: txsTags are not popped at the recovery version
2019-06-27 23:18:26 -07:00
Meng Xu
4da345f7d2
TeamCollectionTest:Remove test on minTeamOnServer
2019-06-27 19:05:10 -07:00
Meng Xu
ce7eb10cac
TeamCollectionInfo: Only count team number for healthy server and machine
2019-06-27 19:04:22 -07:00
Meng Xu
f889843332
Change traceTeamCollectionInfo to actor
...
There are cases where traceTeamCollectionInfo was called twice within the same execution block, i.e.,
with no wait between the two traceTeamCollectionInfo calls.
Because simulation uses the same time for all instructions in the same execution block,
having more than one traceTeamCollectionInfo at the same time will mess up the trackLatest semantics.
If the simulator always picks one of them, the simulation test will report a false-positive error.
Changing this function to an actor and adding a small delay inside the function solves this problem.
2019-06-27 18:24:20 -07:00
Meng Xu
4fe3c7f749
TeamCollectionInfo:Revert to original version where it is
2019-06-27 17:09:21 -07:00
Meng Xu
42620e4831
TeamCollectionTest:GetTeamCollectionValid wait until values are correct
2019-06-27 16:52:36 -07:00
Meng Xu
ee41311a54
TeamCollection:Call addTeamsBestOf when remainingTeamBudget is not 0
2019-06-27 15:29:26 -07:00
Evan Tschannen
52efcfd136
fix: properly create the right number for txsTags when changing between different numbers of logs
2019-06-27 15:15:05 -07:00
Meng Xu
8d5e848808
QuietDatabase test: Check each server has at least 1 team
2019-06-27 14:22:41 -07:00
Meng Xu
2993a96de8
TeamCollectionInfo: Remove debug trace and apply clang format
2019-06-27 14:15:51 -07:00
Meng Xu
5f5c404291
BugFix:ReplicationPolicy always fails when teamSize is 1
...
Be careful whenever you use the selectReplicas function; it may have bugs!
The bug here is that it always returns false (unable to find candidates)
when the storage team size is 1. This is wrong because, when the storage team size
is 1, selectReplicas should return an empty result.
2019-06-27 13:47:49 -07:00
A.J. Beamon
35b6277a50
Fix knob copy paste error
2019-06-27 12:55:39 -07:00
mpilman
7bfda1faaa
Fixed three more Windows issues
...
This is now compiling on my Windows machine
2019-06-27 11:39:36 -07:00
Meng Xu
90c158984c
TeamCollection:Add extra trace events
2019-06-27 11:27:29 -07:00
Meng Xu
aaf97542e9
TeamCollectionTest: Update unit test
2019-06-27 11:27:29 -07:00
Meng Xu
53324e4db7
TeamCollectionInfo: clang format
2019-06-27 11:27:29 -07:00
Meng Xu
cc6a0e9bcd
TeamCollectionTest:Do not enforce minServerTeamOnServer larger than 0
...
In ConfigureTest, one server may be left with 0 server teams, even if
we call buildTeams in the storageServerTracker.
2019-06-27 11:27:29 -07:00
Meng Xu
c23d89c98a
TeamCollection:Only count healthy teams for a server
...
When team collection adds new server teams, it picks a team with
the least number of teams. We should only consider the healthy teams
because the unhealthy ones will not be useful.
2019-06-27 11:27:29 -07:00
Meng Xu
02cdcc0b0c
TeamCollectionTest: Only ensure each server and machine have a team
2019-06-27 11:27:29 -07:00
Meng Xu
e1d459075a
TeamCollection:Count healthy machine teams only
...
Team collection should prioritize to build machine teams for a machine
that has the least number of healthy machine teams, instead of just
machine teams, because unhealthy machine team will not be able to
produce more server teams.
2019-06-27 11:27:29 -07:00
Meng Xu
ee916b337d
TeamCollection:Change the target team number to build
...
When team collection (TC) builds server teams and machine teams,
it needs to build enough teams such that each server and machine has
DESIRED_TEAMS_PER_SERVER server teams and machine teams.
This change calculates the number of teams (server teams and machine teams)
needed to reach that count for each server and machine.
2019-06-27 11:16:44 -07:00
Meng Xu
21664742a6
TeamCollection:Desired team number may be larger than the max possible team number
...
For example, suppose we have 3 servers with replica factor 3. We can have only 1 team,
but the desired team number is 3 times 5, which equals 15.
Instead of sanity checking the absolute team number per server, we check
the difference between the minServerTeamOnServer and maxServerTeamOnServer.
2019-06-27 11:15:06 -07:00
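The arithmetic in the example above can be checked with a short sketch. It assumes teams are unordered sets of distinct servers and ignores replication-policy and machine-locality constraints; `maxPossibleTeams` and `desiredTeams` are hypothetical helpers written for illustration only.

```cpp
#include <cstdint>

// Number of distinct teams of `teamSize` servers that can be drawn from
// `servers` servers, i.e. the binomial coefficient C(servers, teamSize).
inline uint64_t maxPossibleTeams(uint64_t servers, uint64_t teamSize) {
    if (teamSize > servers) return 0;
    uint64_t result = 1;
    for (uint64_t i = 1; i <= teamSize; ++i) {
        // After step i, result equals C(servers - teamSize + i, i), so the division is exact.
        result = result * (servers - teamSize + i) / i;
    }
    return result;
}

// Desired total as described in the commit message: servers times teams per server.
inline uint64_t desiredTeams(uint64_t servers, uint64_t teamsPerServer) {
    return servers * teamsPerServer;
}
```

For 3 servers and a team size of 3, `maxPossibleTeams(3, 3)` is 1 while `desiredTeams(3, 5)` is 15, which is why a check on the absolute team count per server cannot pass in that configuration.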
Meng Xu
08f28e99f9
TeamCollection:Test no server or machine has incorrect team number
...
Add a simulation test which makes sure the server team number
per server is no less than the desired_teams_per_server defined
in knobs and no larger than the max_teams_per_server.
Add a similar test for the machine team number per machine as well.
2019-06-27 11:15:06 -07:00
A.J. Beamon
7f23814841
Track run loop busyness and report it in status.
2019-06-26 14:03:02 -07:00
Alex Miller
83fae6cc15
Fix ExternalWorkload not being a part of the old build/test system.
2019-06-25 21:42:35 -07:00
Alex Miller
b5af601a8a
Fix ExternalWorkload not being a part of the old build/test system.
2019-06-25 21:41:43 -07:00
sramamoorthy
0a94f96dee
sev40 if knownCommittedVersion > recoveryVersion
2019-06-25 16:17:45 -07:00
Alex Miller
bf883d7055
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-06-25 14:26:50 -07:00
Evan Tschannen
0fe6edc254
Merge pull request #1678 from mpilman/features/external-workload
...
Features/external workload
2019-06-25 13:53:19 -07:00
Evan Tschannen
c913aafc1c
Merge pull request #1721 from bnamasivayam/address-comma-separate-list
...
Make public address and listen address a comma separated list
2019-06-25 13:52:16 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
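A minimal sketch of why a scoped enum helps here. The enumerator names and values below are made up for illustration and are not FoundationDB's actual priorities; `runsBefore` is a hypothetical API that accepts only the typed priority.

```cpp
#include <type_traits>

// Hypothetical priorities; names and values are illustrative only.
enum class TaskPriority : int { Low = 1, Default = 5, High = 9 };

// A scoped enum does not convert implicitly from int, so a stray integer
// (a byte count, a timeout) is rejected at compile time.
static_assert(!std::is_convertible<int, TaskPriority>::value,
              "int must not convert implicitly to TaskPriority");

// Hypothetical API: higher numeric value means the task runs first.
inline bool runsBefore(TaskPriority a, TaskPriority b) {
    return static_cast<int>(a) > static_cast<int>(b);
}
```

A call such as `runsBefore(TaskPriority::High, TaskPriority::Low)` compiles, whereas `runsBefore(9, 1)` does not, which is the class of mistake the commit message describes.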
Stephen Atherton
f1f1081202
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/VersionedBTree.actor.cpp
2019-06-24 20:17:49 -07:00
Evan Tschannen
76ba4e60b7
fixed a stack overflow bug
2019-06-24 13:03:35 -07:00
sramamoorthy
212136d024
SnapTest to handle retries for exec txns
2019-06-24 10:22:42 -07:00
Stephen Atherton
112b0918c9
Refactored set() speed test to produce random sets of consecutive records with random prefixes that will often share common bytes.
2019-06-24 01:05:16 -07:00
Alec Grieser
e8c75505d3
Merge pull request #1725 from jzhou77/db-option
...
Add transaction size option
2019-06-21 08:25:34 -07:00
Balachandar Namasivayam
5ce45a8a2d
Addressed review comments.
2019-06-20 23:03:49 -07:00
Balachandar Namasivayam
7489f83a7f
Disable/Re-enable consistency check through a database key.
...
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
Evan Tschannen
1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
...
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Alex Miller
26343f557a
Update getMore() contract.
...
MultiCursor already did this.
2019-06-20 17:48:24 -07:00
Evan Tschannen
37c1df2491
Merge pull request #1705 from bnamasivayam/suspend-process
...
Extend RebootRequest API to include time to suspend the process befor…
2019-06-20 17:36:25 -07:00
Evan Tschannen
460af91913
Merge pull request #1727 from alexmiller-apple/dd-failure-time
...
Increase how long FDB will wait before starting DD to repair data loss.
2019-06-20 17:33:16 -07:00
Jingyu Zhou
357c9ba0fb
Refactor code
2019-06-19 20:41:53 -07:00
Evan Tschannen
e0be631414
shard the txs tag so that more transaction logs are involved in its recovery
2019-06-19 18:15:09 -07:00
Alex Miller
df0baa0066
Merge pull request #1720 from mpilman/features/protocol-version
...
Make protocol version a type
2019-06-19 13:46:35 -07:00
Alex Miller
61901effed
Increase how long FDB will wait before starting DD to repair data loss.
...
10s is a bit short for starting data distribution, which is rather
expensive. 60s is a bit more reasonable.
2019-06-19 13:40:21 -07:00
mpilman
ab7562160c
Made JavaWorkload an external workload
2019-06-19 13:03:41 -07:00
mpilman
2eff2b7e21
First simple test is working (but very buggy)
2019-06-19 13:03:41 -07:00
mpilman
1707f068e0
started implementation of first C workload
2019-06-19 13:03:41 -07:00
mpilman
c8957d93f8
Implementation code complete
2019-06-19 13:03:41 -07:00
Alex Miller
ce24db3c53
Fully consume parallelPeekMore results before switching back.
2019-06-19 01:30:49 -07:00
Balachandar Namasivayam
4832404c85
Make public address and listen address a comma separated list
2019-06-18 18:15:15 -07:00
mpilman
68ce9a5e75
ProtocolVersion type - second try
2019-06-18 17:55:27 -07:00
Alex Miller
51fd42a4d2
Merge remote-tracking branch 'upstream/master' into spilled-only-peek
2019-06-18 17:33:52 -07:00
Alex Miller
4fa5dc0502
Merge remote-tracking branch 'upstream/master' into cloexec
2019-06-18 16:35:18 -07:00
mpilman
8576665a90
Revert "Revert "Make protocol version a type""
...
This reverts commit 455bf3b3ec.
2019-06-18 14:49:04 -07:00
Alex Miller
455bf3b3ec
Revert "Make protocol version a type"
2019-06-18 10:59:17 -07:00
A.J. Beamon
c3aa5819f2
Merge pull request #1417 from mpilman/features/client-buggify
...
Overall framework and first buggify entries
2019-06-18 09:10:11 -07:00
Stephen Atherton
d4b7f9b606
Fixed some cmake, compile, and IDE warnings.
2019-06-17 18:55:49 -07:00
Steve Atherton
ba52623637
Merge pull request #1582 from tclinken/features/sqlite-crc32c
...
Use crc32 for sqlite page checksums
2019-06-17 14:20:41 -07:00
mpilman
da53a92bec
Make protocol version a type
...
This fixes #1214
The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
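The idea in the commit above can be sketched as follows. This is an assumption-laden illustration rather than the actual FoundationDB class: the constant `kLastVersionWithoutFearless` and its value are invented, and the real type may expose its checks differently.

```cpp
#include <cstdint>

// Invented value for illustration; not a real FoundationDB protocol version.
constexpr uint64_t kLastVersionWithoutFearless = 0x0100;

// Wraps the raw version number in its own type so call sites ask about
// features instead of comparing against magic numbers.
class ProtocolVersion {
public:
    constexpr explicit ProtocolVersion(uint64_t version) : version_(version) {}

    constexpr uint64_t version() const { return version_; }

    // True iff this version is newer than the newest version without the feature.
    constexpr bool hasFearless() const { return version_ > kLastVersionWithoutFearless; }

private:
    uint64_t version_;
};
```

Because the constructor is `explicit`, a bare integer cannot be passed where a `ProtocolVersion` is expected, which is the type-safety gain the commit message mentions.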
mpilman
6ea75713cb
Overall framework and first buggify entries
2019-06-16 09:09:09 -07:00
Evan Tschannen
20e3edeb0a
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
# versions.target
2019-06-14 12:42:59 -07:00
Balachandar Namasivayam
5eb833759e
Extend RebootRequest API to include time to suspend the process before reboot. This is intended to be used for testing purposes to simulate failures.
2019-06-14 11:35:38 -07:00
Evan Tschannen
6ececa94ce
Merge pull request #1640 from vishesh/task/client-failmon
...
Clients will no longer get failure monitoring info from cluster controller
2019-06-13 17:31:17 -07:00
A.J. Beamon
fddcf3486c
Merge pull request #1697 from etschannen/increase_idle_delay
...
Increase idle delay
2019-06-13 16:34:22 -07:00
A.J. Beamon
aad79aae49
Merge pull request #1699 from senthil-ram/boostwindowsmac
...
disable boost::process code for windows and mac
2019-06-13 16:12:40 -07:00
Evan Tschannen
924f92e5aa
Prevent the byte sample recovery from interfering with storage server recovery
2019-06-13 15:55:25 -07:00
sramamoorthy
1d1d42c8af
disable boost::process code for windows and mac
2019-06-13 15:43:03 -07:00
Evan Tschannen
b2a5d4fd0d
Merge branch 'master' into increase_idle_delay
2019-06-13 15:23:18 -07:00
A.J. Beamon
e45c13358e
Merge pull request #1691 from etschannen/master
...
Fixed a number of correctness problems
2019-06-13 15:11:16 -07:00
Evan Tschannen
054d775343
increase the delay between idle commits to reduce the rate idle clusters fsync
2019-06-13 14:55:37 -07:00
Evan Tschannen
55f7e7d372
fix: The delay inside the disabledMap was causing the storage server updateStorage actor to run on the client process
2019-06-13 14:28:30 -07:00
A.J. Beamon
3dd2479193
Try avoiding use of boost in FDBExecHelper
2019-06-13 13:09:29 -07:00
Evan Tschannen
dccb9bc26d
fixed a number of correctness problems
2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard
1e8f7e5b82
Refactor NextFastAllocatedSize to be constexpr function
2019-06-11 15:55:23 -07:00
Trevor Clinkenbeard
cb420ea4bd
Only construct waitDescription in simulator
2019-06-11 12:43:39 -07:00
Trevor Clinkenbeard
8144882d7b
Merge branch 'apple-master' into features/local-rk
2019-06-10 19:40:25 -07:00
Trevor Clinkenbeard
46b77819aa
Fixed LocalRatekeeper test
2019-06-10 18:25:58 -07:00
Vishesh Yadav
a8e408e268
run clang-format on changes
2019-06-10 14:10:24 -07:00
Vishesh Yadav
6fa7081a21
net: Don't make FailureMonitoring requests from client
...
This patch removes the need for clients to continuously contact
cluster coordinator for failure monitoring information. Instead, it
uses the FlowTransport to monitor the statuses of peers and update
FailureMonitor accordingly.
2019-06-09 00:43:38 -07:00
Vishesh Yadav
6b4d30c3ae
failmon: Identify client vs server when starting failure monitoring client
2019-06-09 00:43:12 -07:00
Evan Tschannen
5bdf5aaeb6
Merge pull request #1662 from etschannen/master
...
Merge 6.1 into master
2019-06-06 13:57:34 -07:00
Stephen Atherton
100789b354
More bug fixes in handling upperBound changes in modified pages and worst-case delta size calculation. Normalized some formatting in debug statements. Fixed compile error on linux. Updated test specs.
2019-06-05 20:58:47 -07:00
Trevor Clinkenbeard
8dbb231f33
Don't reject read requests until the storage server durability lag gets large enough
2019-06-05 15:42:58 -07:00
Trevor Clinkenbeard
d1d98f298a
Changed storage server getPenalty calculation.
...
Penalty should always be >= 1.0
2019-06-05 14:14:40 -07:00
chaoguang
877a59fab9
add in fdbserver.vcxproj.filters
2019-06-04 15:58:17 -07:00
Stephen Atherton
6aad34620d
Bug fix in upper boundary selection in commitSubtree(). More debug output.
2019-06-04 04:55:09 -07:00
Stephen Atherton
653440d54c
Changes and bug fixes in how boundary keys are modified during clears in internal pages by rewriting how internal pages are modified, making edge cases much easier to handle. Several debug output improvements. Page numbers stored on disk are now big endian.
2019-06-04 04:03:52 -07:00
Evan Tschannen
29b96414e2
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/NativeAPI.actor.cpp
# fdbserver/Coordination.actor.cpp
# flow/Arena.h
# versions.target
2019-06-03 18:49:35 -07:00
chaoguang
66811b7bd2
update to latest version
2019-06-03 16:49:19 -07:00
chaoguang
3055376b45
remove static keyword to make variables not in binary
2019-06-03 16:40:34 -07:00
Parallels
773f52d0a1
Merge remote-tracking branch 'upstream/master' into cloexec
2019-06-03 15:43:32 -07:00
A.J. Beamon
bb22ee7d37
Merge pull request #1649 from etschannen/feature-coordinator-bug
...
The coordinators did not always converge on the same leader
2019-06-03 15:04:25 -07:00
A.J. Beamon
773bce9e32
Merge pull request #1643 from etschannen/feature-cc-mem-leak
...
Fixed a memory leak on the cluster controller
2019-06-03 15:02:36 -07:00
Meng Xu
dc59f63d0e
TraceEvent:First letter must be capitalized
2019-06-03 13:27:18 -07:00
chaoguang
ac2c0f38b7
remove inheritance from KVWorkload
2019-06-02 23:16:39 -07:00
chaoguang
d07c46e3f3
fix issues by comments
2019-05-31 00:44:07 -07:00
chaoguang
66d25cef21
fix issues by comments
2019-05-31 00:27:30 -07:00
Evan Tschannen
b830fa4c84
fix: A minority of coordinators could continue choosing a candidate which was not the leader
2019-05-30 17:25:20 -07:00
Stephen Atherton
9f064ad7cf
Added back minimal btree internal page boundaries using RedwoodRecordRef.
2019-05-30 02:10:07 -07:00
Stephen Atherton
098ac46af9
RedwoodRecordRef::deltaSize() now calculates actual delta size instead of a conservative estimate.
2019-05-29 18:06:11 -07:00
Stephen Atherton
3e155a2563
Bug fixes.
2019-05-29 17:38:55 -07:00
Evan Tschannen
7c333dbc16
If a process receives a message in its clusterControllerInterface before becoming the cluster controller, if the process does not become the cluster controller in the next minute it should destroy the interface to prevent a memory leak.
2019-05-29 16:57:13 -07:00
Stephen Atherton
cedcfcddd0
Bug fix in RedwoodRecordRef::Delta var int writer, new tests.
2019-05-29 16:47:53 -07:00
Stephen Atherton
1e5b9faa11
Bug fixes in RedwoodRecordRef::Delta.
2019-05-29 16:26:58 -07:00
Evan Tschannen
362c2bf1e6
improved the cpu efficiency of printable
2019-05-29 14:55:45 -07:00
Stephen Atherton
02882dbf00
Checkpointing progress, RedwoodRecordRef and DeltaTree tests pass but BTree test does not. RedwoodRecordRef::Delta rewritten to actually do prefix compression on key and integer fields. Added related unit tests and benchmarks. Some improvements to DeltaTree and requirements on its T and Delta types to avoid repeated common prefix discovery.
2019-05-29 06:23:32 -07:00
sramamoorthy
1190f2f33d
rebased related changes
2019-05-28 22:07:46 -07:00
sramamoorthy
4bcb590f12
g_random -> deterministicRandom()
2019-05-28 22:07:46 -07:00
sramamoorthy
b43c100e57
TLog bug fixes
2019-05-28 22:07:46 -07:00
sramamoorthy
42c551a996
handle isRestoring & BackupFailed not being set
...
restartInfo.in->BackupFailed and isRestoring may not be
set in all cases, handle the absence of them.
2019-05-28 22:07:46 -07:00
sramamoorthy
3877f87481
comment change in tLogCommit
2019-05-28 22:07:46 -07:00
sramamoorthy
2a68b28590
rebase related changes
2019-05-28 22:07:46 -07:00
sramamoorthy
b17ad85497
exec op not supported when log_anti_quorum > 0
2019-05-28 22:07:46 -07:00
sramamoorthy
3aa848b8af
minor bug in whitelist binary path testing
2019-05-28 22:07:46 -07:00
sramamoorthy
c906da1f62
simulator: spawnProcess to wait for long duration
...
spawnProcess was waiting for 3 seconds and then terminating
the child process for synchronous calls, but in the
simulator this can lead to non-determinism, because in
some cases the command can run in <3 or >3 seconds.
The fix is to make the wait duration long enough
that the call must wait synchronously and get
the results, or the test will time out.
2019-05-28 22:07:46 -07:00
sramamoorthy
31b6c86650
ignorePopDeadline to have high limit in simulator
...
- ignorePopDeadline to have higher limit in simulator
to accommodate the buggify delays and make snapshot succeed.
- introduce a new knob for auto resetting the disabling of tlog pop
2019-05-28 22:07:46 -07:00
sramamoorthy
40358e1dd6
limit of getRange in snapTest reduced
...
With CLIENT_KNOBS->TOO_MANY in snapTest, by the time getRange
gathers all the results, the storage server's oldest version has
gone past the req->version and hence the transaction fails with
transaction_too_old
2019-05-28 22:07:46 -07:00
sramamoorthy
b1b96946af
logData->stop check right after execOpHold wait
2019-05-28 22:07:46 -07:00
sramamoorthy
5749e220bd
use FlowLock for implementing critical section
...
Instead of using Promises and Futures to implement
the critical section, use FlowLock
2019-05-28 22:07:46 -07:00
sramamoorthy
e6c0b87a4d
remove unused variable
2019-05-28 22:07:46 -07:00
sramamoorthy
b56d8e648f
bp::child->wait_for does not give correct err code
...
boost::process::child->wait_for does not give the error code
from the process being run. Re-arrange the code to work-around
it.
2019-05-28 22:07:46 -07:00
sramamoorthy
f27a40f118
execProcessingHelper made synchronous
...
tLogCommit expects no blocking between duplicate check and
setting of the new version, that constraint was broken
when synchronous execProcessingHelper was introduced.
As a fix, execProcessingHelper was made asynchronous.
2019-05-28 22:07:46 -07:00
sramamoorthy
ceac68c990
restore - remove empty snapdir, snap loop retry fix
...
- remove partially snapped directories to avoid no cluster file assert
- snap create to retry max 3 times for not_fully_recovered and keep
retrying for the other failures
2019-05-28 22:07:46 -07:00
sramamoorthy
d3a179b6f9
Multiple bug fixes
...
- wait for snapTLogFailKeys in a loop, otherwise in some race
condition it can cause a false assert
- in single region, there does not seem to be a guarantee of
tagLocalityListKey for a given DC ID, avoiding that assert for now
- to find the workers that are coordinators, looking up by primary
address is not sufficient in some cases, hence looking by both
primary and secondary address
- test make files to reflect the location of the new test cases
2019-05-28 22:07:46 -07:00