Commit Graph

5043 Commits

Author SHA1 Message Date
Evan Tschannen b2e6b25496
Merge pull request #1764 from xumengpanda/mengxu/release-61/DD-ensure-new-machines-have-teams-PR
[Release 6.1 Patch] Ensure new added machines are used to build teams
2019-07-02 14:03:35 -07:00
Meng Xu de5bcaf588 minTeamNumber for server and machine cannot be uint64_t
The consistency check converts the value to int64_t.
If no server exists, the variable is never updated, so it overflows
when it is converted to int64_t.
2019-07-01 21:39:18 -07:00
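A minimal Python sketch of the overflow described above. It assumes, which the commit does not state, that the minimum is tracked with a sentinel equal to the largest uint64_t value and lowered once per server. With no servers the sentinel survives, and reinterpreting it as int64_t yields -1. The helper names are hypothetical, and `struct` is used only to emulate the 64-bit reinterpretation.

```python
import struct
from typing import Iterable

UINT64_MAX = 2**64 - 1  # assumed "no value yet" sentinel for an unsigned minimum
INT64_MAX = 2**63 - 1   # assumed sentinel after switching to a signed type


def to_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as int64_t (two's complement)."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def running_min(team_counts: Iterable[int], sentinel: int) -> int:
    """Hypothetical helper: lower `sentinel` to the smallest team count seen."""
    current = sentinel
    for count in team_counts:
        current = min(current, count)
    return current


# With no servers the unsigned sentinel is never updated and turns negative.
unsigned_result = to_int64(running_min([], UINT64_MAX))
# A signed sentinel stays representable when the consistency check reads it.
signed_result = to_int64(running_min([], INT64_MAX))
```

With at least one server both versions agree; the difference only shows up in the empty case the commit describes.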
Meng Xu 347a7ecdff MachineTeams:Make traceTeamCollectionInfo not an actor 2019-07-01 16:50:53 -07:00
Meng Xu b8cb883040 AddBestMachineTeams:Fix input must be non-negative value 2019-06-28 22:46:16 -07:00
Meng Xu 63c42533eb TraceTeamCollectionInfo:Remove delay 2019-06-28 16:19:58 -07:00
Meng Xu 875cb877ac TeamCollection: Apply clang-format 2019-06-28 16:01:05 -07:00
Meng Xu 0baae134f6 TeamCollectionInfo: Resolve review comments 2019-06-28 15:59:47 -07:00
Meng Xu cb681693df TeamCollection:Do NOT consider healthiness in counting team number
If a team is removed from DD, it will be marked as failed and eventually removed from the
global teams data structure.
Team healthiness is likely to be a temporary state that can change rather quickly.
2019-06-28 09:50:43 -07:00
Meng Xu 4da345f7d2 TeamCollectionTest:Remove test on minTeamOnServer 2019-06-27 19:05:10 -07:00
Meng Xu ce7eb10cac TeamCollectionInfo: Only count team number for healthy server and machine 2019-06-27 19:04:22 -07:00
Meng Xu f889843332 Change traceTeamCollectionInfo to actor
There are cases where traceTeamCollectionInfo was called more than once within the same execution block, i.e.,
with no wait between the two traceTeamCollectionInfo calls.
Because simulation uses the same time for all instructions in the same execution block,
having more than one traceTeamCollectionInfo at the same time messes up the trackLatest semantics.
If the simulator always picks one of them, the simulation test reports a false positive error.

Changing this function to an actor and adding a small delay inside it solves this problem.
2019-06-27 18:24:20 -07:00
Meng Xu 4fe3c7f749 TeamCollectionInfo:Revert to original version where it is 2019-06-27 17:09:21 -07:00
Meng Xu bc3e833634 TeamCollection: Add release note 2019-06-27 16:53:01 -07:00
Meng Xu 42620e4831 TeamCollectionTest:GetTeamCollectionValid wait until values are correct 2019-06-27 16:52:36 -07:00
Meng Xu ee41311a54 TeamCollection:Call addTeamsBestOf when remainingTeamBudget is not 0 2019-06-27 15:29:26 -07:00
Meng Xu 8d5e848808 QuitDatabase test: Check each server has at least 1 team 2019-06-27 14:22:41 -07:00
Meng Xu 2993a96de8 TeamCollectionInfo: Remove debug trace and apply clang format 2019-06-27 14:15:51 -07:00
Meng Xu 5f5c404291 BugFix:ReplicationPolicy always fails when teamSize is 1
Be careful whenever you use the selectReplicas function: it may have bugs!
This bug is that it always returns false (not able to find candidates)
when the storage team size is 1. This is wrong because when the storage team size
is 1, selectReplicas should return an empty result.
2019-06-27 13:47:49 -07:00
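The failure mode can be illustrated with a toy "complete this team" function. This is a hypothetical Python sketch, not FoundationDB's selectReplicas signature: `chosen` holds servers already on the team, and the function returns the extra servers to add, or None on failure. When the team size is 1, the seed server already completes the team, so the correct answer is an empty list rather than a failure.

```python
from typing import List, Optional


def select_replicas_buggy(chosen: List[str], candidates: List[str],
                          team_size: int) -> Optional[List[str]]:
    """Hypothetical buggy version: an empty pick list is treated as failure."""
    needed = team_size - len(chosen)
    picks = [c for c in candidates if c not in chosen][:max(needed, 0)]
    if not picks or len(picks) < needed:
        return None  # wrongly fails when no extra replicas are needed
    return picks


def select_replicas(chosen: List[str], candidates: List[str],
                    team_size: int) -> Optional[List[str]]:
    """Hypothetical fixed version: succeed with [] when the team is complete."""
    needed = team_size - len(chosen)
    if needed <= 0:
        # e.g. team size 1 with one seed server: nothing left to select.
        return []
    picks = [c for c in candidates if c not in chosen][:needed]
    return picks if len(picks) == needed else None
```

Only the `needed <= 0` branch differs in behavior for the size-1 case; with larger teams both versions return the same picks.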
Meng Xu 90c158984c TeamCollection:Add extra trace events 2019-06-27 11:27:29 -07:00
Meng Xu aaf97542e9 TeamCollectionTest: Update unit test 2019-06-27 11:27:29 -07:00
Meng Xu 53324e4db7 TeamCollectionInfo: clang format 2019-06-27 11:27:29 -07:00
Meng Xu cc6a0e9bcd TeamCollectionTest:Do not enforce minServerTeamOnServer larger than 0
In ConfigureTest, one server may be left with 0 server teams, even if
we call buildTeams in the storageServerTracker.
2019-06-27 11:27:29 -07:00
Meng Xu c23d89c98a TeamCollection:Only count healthy teams for a server
When team collection adds new server teams, it picks the server with
the least number of teams. It should only count healthy teams,
because unhealthy ones are not useful.
2019-06-27 11:27:29 -07:00
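A minimal sketch of "pick the server with the fewest healthy teams". The `Team` class, the tie-break by server id, and the data are illustrative assumptions; the point is that skipping unhealthy teams while counting can change which server is chosen.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Team:
    servers: List[str]
    healthy: bool


def healthy_team_counts(servers: List[str], teams: List[Team]) -> Dict[str, int]:
    """Count, per server, only the healthy teams it belongs to."""
    counts = {s: 0 for s in servers}
    for team in teams:
        if not team.healthy:
            continue  # unhealthy teams are not counted
        for s in team.servers:
            if s in counts:
                counts[s] += 1
    return counts


def least_used_server(servers: List[str], teams: List[Team]) -> str:
    """Server with the fewest healthy teams; ties broken by server id."""
    counts = healthy_team_counts(servers, teams)
    return min(servers, key=lambda s: (counts[s], s))
```

With teams `[a,b]` and `[a,c]` healthy and two unhealthy `[b,c]` teams, counting every team would pick `a` (2 teams versus 3 for `b` and `c`), while counting only healthy teams picks `b`. The same idea applies to machines and machine teams. A later commit in this log (cb681693df) stops considering healthiness when counting team numbers.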
Meng Xu 02cdcc0b0c TeamCollectionTest: Only ensure each server and machine have a team 2019-06-27 11:27:29 -07:00
Meng Xu e1d459075a TeamCollection:Count healthy machine teams only
Team collection should prioritize building machine teams for the machine
that has the fewest healthy machine teams, rather than the fewest
machine teams overall, because unhealthy machine teams cannot
produce more server teams.
2019-06-27 11:27:29 -07:00
Meng Xu ee916b337d TeamCollection:Change the target team number to build
When team collection (TC) builds server teams and machine teams,
it needs to build enough teams that each server and machine has
DESIRED_TEAMS_PER_SERVER server teams and machine teams.

This change calculates the number of teams (server teams and machine teams)
needed so that each server and machine gets its desired number of teams.
2019-06-27 11:16:44 -07:00
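One way to reason about the target: every team of `team_size` members gives one team membership to each of its members, so giving each of N members D teams needs at least ceil(N * D / team_size) teams. The sketch below computes only this counting lower bound; it is an illustration, not necessarily the formula the commit implements, and the example values are arbitrary.

```python
def min_teams_needed(num_members: int, desired_per_member: int, team_size: int) -> int:
    """Lower bound on teams so every member can reach desired_per_member teams."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    total_memberships = num_members * desired_per_member
    # Integer ceiling division: each team covers team_size memberships.
    return (total_memberships + team_size - 1) // team_size
```

The same bound applies to machine teams by treating machines as the members.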
Meng Xu 21664742a6 TeamCollection:Desired team number may be larger than the max possible team number
For example, with 3 servers and a replication factor of 3, only 1 team is possible,
but the desired team number is 3 * 5 = 15.

Instead of sanity-checking the absolute team number per server, we check
the difference between minServerTeamOnServer and maxServerTeamOnServer.
2019-06-27 11:15:06 -07:00
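The arithmetic behind the example: with n servers and team size k there are at most C(n, k) distinct teams, so C(3, 3) = 1 while the desired count is 3 * 5 = 15. The sketch below pairs that cap with a spread check in the spirit of the change; `max_spread` is a hypothetical tolerance, since the commit does not say what difference is acceptable.

```python
from math import comb
from typing import Dict


def max_possible_teams(num_servers: int, team_size: int) -> int:
    """Number of distinct teams of team_size that num_servers servers can form."""
    return comb(num_servers, team_size)


def team_counts_balanced(counts: Dict[str, int], max_spread: int) -> bool:
    """Hypothetical check: compare the least- and most-loaded servers."""
    if not counts:
        return True
    return max(counts.values()) - min(counts.values()) <= max_spread
```

Checking the spread keeps the test meaningful on small clusters where the absolute desired number can never be reached.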
Meng Xu 08f28e99f9 TeamCollection:Test no server or machine has incorrect team number
Add a simulation test check that makes sure the number of server teams
per server is no less than the desired_teams_per_server defined
in knobs and no larger than max_teams_per_server.

Add a similar test for the number of machine teams per machine.
2019-06-27 11:15:06 -07:00
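The bound check described above can be sketched as a filter over per-server team counts. The function name and the values 5 and 10 used in the usage line are illustrative stand-ins for desired_teams_per_server and max_teams_per_server, not the knobs' actual defaults.

```python
from typing import Dict, List


def servers_out_of_bounds(counts: Dict[str, int], desired: int, maximum: int) -> List[str]:
    """Return servers whose team count is below `desired` or above `maximum`."""
    return sorted(s for s, n in counts.items() if n < desired or n > maximum)
```

For example, `servers_out_of_bounds({"a": 5, "b": 3, "c": 11}, 5, 10)` flags `b` (too few teams) and `c` (too many).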
John Brownlee 9ff1b06484
Merge pull request #1722 from alecgrieser/01690-docker-hard-codes-4500
Server docker image to use same port for listen and public IP
2019-06-20 10:04:22 -07:00
Alec Grieser c92324b894
python sample docker app uses default coordinator port 2019-06-20 08:54:12 -07:00
Alec Grieser 7b12374a87
Fixes #1690: Server docker image hard-codes 4500 in a few places
This makes the default public port for starting FDB processes the same as the FDB_PORT. This is probably necessary given #1714, especially for coordinators, though it might not be necessary for other processes in the cluster. This can *almost* be used to start up multiple FDB processes locally and then access them from the same machine, but that (unfortunately) requires both the other processes in the docker compose network and the host machine to agree on what IP to use for the coordinator. But as that machine has different IPs in those networks, they cannot be made to agree.
2019-06-18 18:36:12 -07:00
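A sketch of the idea of using one port for both addresses. fdbserver does accept `--listen_address` and `--public_address`; everything else here is an assumption: the helper name, reading FDB_PORT from a dict-like environment, falling back to 4500, and listening on 0.0.0.0. The real image configures this in its own startup scripts; this Python version is only illustrative.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_FDB_PORT = 4500  # FoundationDB's default port


def fdbserver_address_args(public_ip: str,
                           env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build hypothetical fdbserver address flags that share a single port."""
    env = os.environ if env is None else env
    port = int(env.get("FDB_PORT", DEFAULT_FDB_PORT))
    return [
        "--listen_address", f"0.0.0.0:{port}",
        "--public_address", f"{public_ip}:{port}",
    ]
```

Deriving both flags from one variable means changing FDB_PORT cannot leave the advertised port pointing at 4500.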
Alec Grieser a4fc1a6b57
Merge pull request #1708 from ajbeamon/fix-binding-tester-tuple-comparison
Use fdb.tuple.compare to do tuple comparison in binding tester
2019-06-18 16:03:01 -07:00
A.J. Beamon 88e765b9e6 Fix: the binding tester was taking the min() of a list of tuples, but that could fail if the tuples contained incomparable types. Instead, use fdb.tuple.compare() to do the comparison. 2019-06-17 11:43:58 -07:00
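Why min() breaks: Python 3 raises TypeError when asked to order unrelated types such as str and int, so comparing `("b",)` with `(1, "a")` fails. The fix uses fdb.tuple.compare, presumably along the lines of `min(items, key=functools.cmp_to_key(fdb.tuple.compare))`. Because that needs the fdb package, the sketch below uses a hypothetical stand-in key that ranks types loosely following the tuple layer's encoding order (None, then bytes, then str, then int) and handles only those four types.

```python
from typing import Any, Tuple

# Hypothetical rank per type, loosely following the FoundationDB tuple
# encoding order. Only these four types are supported by this stand-in.
_TYPE_RANK = {type(None): 0, bytes: 1, str: 2, int: 3}


def _element_key(value: Any) -> Tuple[int, Any]:
    # Compare the type rank first, so values of different types never meet.
    # type(True) is bool, which is deliberately absent and raises KeyError.
    return (_TYPE_RANK[type(value)], value)


def tuple_key(t: Tuple[Any, ...]) -> Tuple[Tuple[int, Any], ...]:
    """Sort key for a tuple; a shorter prefix sorts before its extensions."""
    return tuple(_element_key(v) for v in t)
```

For example, `min([(1, "a"), ("b",), (None,), (b"x", 2)], key=tuple_key)` picks `(None,)`, whereas a plain `min()` over the same list raises TypeError.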
A.J. Beamon 4cb9d5eae6
Merge pull request #1710 from etschannen/release-6.1
fixed documentation
2019-06-14 15:20:42 -07:00
Evan Tschannen 6d7041bc51 fixed documentation 2019-06-14 15:18:35 -07:00
Evan Tschannen 2684f601be
Merge pull request #1706 from etschannen/post-release-cleanup-6.1.10
Post release cleanup 6.1.10
2019-06-14 12:36:13 -07:00
Evan Tschannen 3b49e5c911 update installer WIX GUID following release 2019-06-14 12:35:28 -07:00
Evan Tschannen 4d9159fddd update versions target to 6.1.11 2019-06-14 12:35:28 -07:00
Alvin Moore 7686fdd8e3
Merge pull request #1572 from mpilman/productbuild
Use productbuild instead of PackageMaker on MacOS
2019-06-14 05:56:20 -07:00
Evan Tschannen 141d6f4ba1
Merge pull request #1703 from etschannen/release-6.1
updated documentation for 6.1.10
2019-06-13 17:59:28 -07:00
Evan Tschannen a11b65236d updated documentation for 6.1.10 2019-06-13 17:58:34 -07:00
Evan Tschannen f86c207b57
Merge pull request #1702 from etschannen/prepare-release-6.1.10
update installer WIX GUID following release
2019-06-13 17:57:35 -07:00
Evan Tschannen e1868064b1 update installer WIX GUID following release 2019-06-13 17:56:59 -07:00
Evan Tschannen 2344c47f1c
Merge pull request #1701 from satherton/fix-fdbrestore-clusterfile
Bug fix:  fdbrestore abort, wait, and status were not using the --des…
2019-06-13 17:53:15 -07:00
Steve Atherton 0110b672c7
Update documentation/sphinx/source/release-notes.rst 2019-06-13 17:51:48 -07:00
Steve Atherton 716d95eaf1
Update documentation/sphinx/source/release-notes.rst
Co-Authored-By: Evan Tschannen <36455792+etschannen@users.noreply.github.com>
2019-06-13 17:51:04 -07:00
Stephen Atherton 9c0238d262 Added release note for fdbrestore cluster file argument fix. 2019-06-13 17:46:04 -07:00
Stephen Atherton 235d24ee26 Bug fix: fdbrestore abort, wait, and status were not using the --dest_cluster_file argument. 2019-06-13 17:42:11 -07:00
A.J. Beamon ada4c9ed23
Merge pull request #1700 from etschannen/release-6.1
Prevent the byte sample recovery from interfering with storage server recovery
2019-06-13 16:32:39 -07:00
Evan Tschannen 91b52e4a45 added a release note 2019-06-13 16:03:31 -07:00