foundationdb

Commit Graph

Author	SHA1	Message	Date
Meng Xu	08a721b320	Merge branch 'master' into mengxu/server-team-remover-PR	2019-07-08 16:30:32 -07:00
A.J. Beamon	0a5c7608df	Remove "Number" suffix from newly added events (and variables that feed the events).	2019-07-08 15:45:28 -07:00
A.J. Beamon	f52c239ef8	Merge branch 'master' into trace-event-rename # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/QuietDatabase.actor.cpp	2019-07-08 15:37:00 -07:00
A.J. Beamon	a5a6f8431c	Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals.	2019-07-08 14:01:04 -07:00
Evan Tschannen	c348b3da51	After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies	2019-07-08 12:53:40 -07:00
Evan Tschannen	ec11ef024b	Merge pull request #1798 from ajbeamon/merge-release-6.1-into-master Merge release 6.1 into master	2019-07-08 09:02:56 -07:00
A.J. Beamon	dd85edb08c	Merge pull request #1802 from xumengpanda/mengxu/DD-ensure-redundant-team-priority-as700-PR TeamTracker:Set redundant team priority as PRIORITY_TEAM_REDUNDANT	2019-07-08 08:47:28 -07:00
Vishesh Yadav	8d3a826c63	Merge pull request #1804 from alexmiller-apple/cycle-verify-only Add a checkOnly parameter to Cycle workload.	2019-07-05 21:59:52 -07:00
Jingyu Zhou	50e7593c5b	Merge pull request #1796 from ajbeamon/remove-trace-event-underscores Remove trace event underscores	2019-07-05 21:45:55 -07:00
Alex Miller	14e5dd74fe	Add a checkOnly parameter to Cycle workload. So that it can be used in the real world for consistency checking of backup and DR.	2019-07-05 19:09:09 -07:00
Evan Tschannen	310a5fe9a3	fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy	2019-07-05 17:28:22 -07:00
Meng Xu	e8fb7564f5	Merge branch 'master' into mengxu/DD-ensure-redundant-team-priority-as700-PR	2019-07-05 17:28:12 -07:00
Meng Xu	c7a996267c	TeamRemover: Remove unused declaration Also change state variable to variable.	2019-07-05 16:54:06 -07:00
Evan Tschannen	15e894c724	Merge in master	2019-07-05 15:49:24 -07:00
Evan Tschannen	e7c0ecf729	fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy	2019-07-05 15:46:16 -07:00
Meng Xu	46d28a3b79	TeamTracker:Set redundant team priority as redundant The redundant team removed by teamRemover will not exist in the global teams data structure. So we will not find the redundant team from shard-to-team mapping in the system key. Before this change, teamTracker marks such team as PRIORITY_TEAM_UNHEALTHY. With this change, it marks it as PRIORITY_TEAM_REDUNDANT	2019-07-05 15:24:00 -07:00
A.J. Beamon	4be08d9b2d	Rename datacenter_version_difference to datacenter_lag and include both seconds and versions.	2019-07-05 14:36:18 -07:00
A.J. Beamon	2a56e011ea	Merge branch 'release-6.1' into merge-release-6.1-into-master # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbserver/DataDistribution.actor.cpp	2019-07-05 13:52:29 -07:00
Meng Xu	7ba6cd2d9d	ServerTeamRemover:Reduce the overshot server team number to build Each server has the maximum of DESIRED_TEAMS_PER_SERVER and (DESIRED_TEAMS_PER_SERVER * storageTeamSize) / 2)	2019-07-05 11:01:50 -07:00
A.J. Beamon	2a709ee5d0	Rename event details that use the suffix "Number" to indicate a count, as number could also imply an index. Rename a few other trace events and details that e.g. needed to be pluralized.	2019-07-05 08:54:21 -07:00
A.J. Beamon	9f4b6fd770	Remove additional underscores	2019-07-05 08:12:25 -07:00
A.J. Beamon	a3ac9c7eea	Remove underscores from some trace event names	2019-07-05 08:08:29 -07:00
Alex Miller	ea6898144d	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-07-03 20:44:15 -07:00
Evan Tschannen	23ecc17075	Merge pull request #1755 from senthil-ram/recoveryFix sev40 if knownCommittedVersion > recoveryVersion	2019-07-03 16:39:16 -07:00
Evan Tschannen	e153571a50	Merge pull request #1775 from alexmiller-apple/crc32c-memory-storage Memory storage engine to use crc32c DiskQueue by default (in 6.2).	2019-07-03 16:37:42 -07:00
Evan Tschannen	79a90d33a7	fix: the push location for txs tags needs to be based on what the tag will become after changing the number of txs tags	2019-07-03 16:06:54 -07:00
A.J. Beamon	8c10d832a1	Add coordinator role in trace events	2019-07-03 11:09:36 -07:00
Meng Xu	2782d432ac	ServerTeamRemover:Update the desired number and pick unhealthy teams first	2019-07-02 22:17:53 -07:00
Meng Xu	599fcb2e6d	Add serverTeamRemover to remove redundant server teams	2019-07-02 17:40:37 -07:00
Meng Xu	716494ed9f	ConsistencyCheck:Check serverTeamNumber larger than desired number	2019-07-02 17:40:37 -07:00
Meng Xu	7461c87ae6	AddTeamsBestOf: Build more teams than desired We build more teams than we finally want so that we can use serverTeamRemover() actor to remove the teams whose member belong to too many teams. This allows us to get a more balanced number of teams per server.	2019-07-02 17:40:37 -07:00
Evan Tschannen	8afab93e29	Merge pull request #1782 from etschannen/master revert storage server priority changes	2019-07-02 17:25:31 -07:00
Evan Tschannen	3fb0999e10	revert storage server priority changes	2019-07-02 16:54:47 -07:00
Evan Tschannen	86b0224347	Merge branch 'release-6.1' of github.com:apple/foundationdb into release-6.1	2019-07-02 16:27:31 -07:00
Evan Tschannen	64e33bb4f9	added logging for maintenance mode	2019-07-02 16:25:29 -07:00
Stephen Atherton	71ba490cf8	Removed use of the C "struct hack" as it is not valid C++. Replaced zero-length members with functions returning a pointer for arrays or a reference for single members.	2019-07-02 16:02:58 -07:00
dyoungworth	817fce080b	Fix minor bug in External Workload	2019-07-02 15:57:26 -07:00
Meng Xu	7afbd10a10	Change teamRemover to machineTeamRemover	2019-07-02 15:16:34 -07:00
Meng Xu	d2d6022ed4	StorageServerTracker:Do not always set doBuildTeams When interface changes, we set doBuildTeams to true only when the interface location changes.	2019-07-02 14:24:26 -07:00
Meng Xu	de5bcaf588	minTeamNumber for server and machine cannot be uint64_t Because the consistency check will try to conver the value to int64_t. If no server exists, the variable will not be updated and thus get overflowed when it is converted to int64_t	2019-07-01 21:39:18 -07:00
Evan Tschannen	841e61ac25	fixed a broken promise in localRatekeeper	2019-07-01 16:56:35 -07:00
Meng Xu	347a7ecdff	MachineTeams:Make traceTeamCollectionInfo not an actor	2019-07-01 16:50:53 -07:00
mengranwo	e54eedf0e2	Address pr comments, remove wait(tr.commit()) for read-only txn	2019-07-01 16:09:51 -07:00
mengranwo	0ad151e70a	style formatting	2019-07-01 16:09:51 -07:00
mengranwo	819b6e3d6d	fix compiling error	2019-07-01 16:09:51 -07:00
mengranwo	c7148bbb14	address cr comments:	2019-07-01 16:09:51 -07:00
mengranwo	d96cdacdd5	fix format issue	2019-07-01 16:09:51 -07:00
mengranwo	11161746f8	add try catch block around tx.onerror()	2019-07-01 16:09:51 -07:00
mengranwo	6b61b0e030	fix syntax error, pass compile	2019-07-01 16:09:51 -07:00
mengranwo	0b9cd18fb4	checking cluster is healthy or not during recovery process(for storage engine), if healthy, delete data files and join as new	2019-07-01 16:09:51 -07:00
Jingyu Zhou	b69d7adabc	Remove unused remoteRecovered from master server	2019-07-01 15:41:35 -07:00
Alex Miller	23de5b64ad	Memory storage engine to use crc32c DiskQueue by default (in 6.2).	2019-07-01 13:38:06 -07:00
Meng Xu	b8cb883040	AddBestMachineTeams:Fix input must be non-negative value	2019-06-28 22:46:16 -07:00
Evan Tschannen	4e45a58750	fix: forced recovery did not copy the number of txsTags properly	2019-06-28 20:51:16 -07:00
Evan Tschannen	2c40c818cf	fix: txsTags was not copied into oldLogData	2019-06-28 17:51:16 -07:00
Alex Miller	8e1ab6e7db	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-06-28 17:32:54 -07:00
Evan Tschannen	5041ff38b1	removed unneeded description	2019-06-28 16:54:22 -07:00
Evan Tschannen	a124fc6e8a	fixed compiler error	2019-06-28 16:54:22 -07:00
Evan Tschannen	b9a6271375	local ratekeeper no longer globally limits	2019-06-28 16:54:22 -07:00
Evan Tschannen	4cef1d3937	Experimental change of storage write priority	2019-06-28 16:54:22 -07:00
Evan Tschannen	f539b5f09a	fix: a large targetRateRatio means limiting more	2019-06-28 16:54:22 -07:00
Evan Tschannen	18d5fbf1e0	Avoid jumping from rejecting 0% of requests directly to 20% of requests	2019-06-28 16:54:22 -07:00
Evan Tschannen	db413c37f7	restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers	2019-06-28 16:54:22 -07:00
Evan Tschannen	ec16688db1	fixed the local ratekeeper workload to match the logic on the storage server	2019-06-28 16:54:22 -07:00
Evan Tschannen	a97940a10b	fixed compiler error	2019-06-28 16:54:22 -07:00
Evan Tschannen	92b32855ca	ratekeeper’s control algorithm would oscillate when limited by local ratekeeper	2019-06-28 16:54:22 -07:00
Evan Tschannen	1b939d5208	Merge pull request #1749 from satherton/feature-redwood Update redwood storage engine to latest correctness-passing version	2019-06-28 16:22:06 -07:00
Meng Xu	63c42533eb	TaceTeamCollectionInfo:Remove delay	2019-06-28 16:19:58 -07:00
Meng Xu	875cb877ac	TeamCollection: Apply clang-format	2019-06-28 16:01:05 -07:00
Meng Xu	0baae134f6	TeamCollectionInfo: Resolve review comments	2019-06-28 15:59:47 -07:00
Evan Tschannen	cfce1e1705	fix: buffered peek cursor would advance very slowly through large ranges of empty versions	2019-06-28 15:54:08 -07:00
Evan Tschannen	7f4586ad49	the number of txsTags needs to be tracked separately from the number of transaction logs because of forced recoveries	2019-06-28 12:33:24 -07:00
Meng Xu	cb681693df	TeamCollection:Do NOT consider healthyness in counting team number If a team is removed from DD, it will be marked as failed and eventually removed from the global teams data structure. Team healthyness is likely to be a temporary state which can be changed rather quickly.	2019-06-28 09:50:43 -07:00
Evan Tschannen	2113d6d01e	fix: peek all possible txsTags which could have been used by old log sets	2019-06-27 23:39:19 -07:00
Evan Tschannen	235697f688	fix: txsTags are not popped at the recovery version	2019-06-27 23:18:26 -07:00
Meng Xu	4da345f7d2	TeamCollectionTest:Remove test on minTeamOnServer	2019-06-27 19:05:10 -07:00
Meng Xu	ce7eb10cac	TeamCollectionInfo: Only count team number for healthy server and machine	2019-06-27 19:04:22 -07:00
Meng Xu	f889843332	Change traceTeamCollectionInfo to actor There are cases where traceTeamCollectionInfo was called within the same execution block, i.e., no wait between the two traceTeamCollectionInfo calls. Because simulation uses the same time for all execution instructions in the same execution block, having more than one traceTeamCollectionInfo at the same time will mess up the trackLatest semantics. When one of them is always chosen by simulator, simulation test will report false positive error. Changing this function to actor and adding a small delay inside the function can solve this problem.	2019-06-27 18:24:20 -07:00
Meng Xu	4fe3c7f749	TeamCollectionInfo:Revert to original version where it is	2019-06-27 17:09:21 -07:00
Meng Xu	42620e4831	TeamCollectionTest:GetTeamCollectionValid wait until values are correct	2019-06-27 16:52:36 -07:00
Meng Xu	ee41311a54	TeamCollection:Call addTeamsBestOf when remainingTeamBudget is not 0	2019-06-27 15:29:26 -07:00
Evan Tschannen	52efcfd136	fix: properly create the right number for txsTags when changing between different numbers of logs	2019-06-27 15:15:05 -07:00
Meng Xu	8d5e848808	QuitDatabase test: Check each server has at least 1 team	2019-06-27 14:22:41 -07:00
Meng Xu	2993a96de8	TeamCollectionInfo: Remove debug trace and apply clang format	2019-06-27 14:15:51 -07:00
Meng Xu	5f5c404291	BugFix:ReplicationPolicy always fails when teamSize is 1 Whenever use selectReplicas function, be careful that it may have bugs! This bug is that it always return false (not able to find candidates) when the storage team size is 1. This is wrong because when storage team size is 1, the selectReplicas should return an empty result.	2019-06-27 13:47:49 -07:00
A.J. Beamon	35b6277a50	Fix knob copy paste error	2019-06-27 12:55:39 -07:00
mpilman	7bfda1faaa	Fixed three more Windows issues This is now compiling on my Windows machine	2019-06-27 11:39:36 -07:00
Meng Xu	90c158984c	TeamCollection:Add extra trace events	2019-06-27 11:27:29 -07:00
Meng Xu	aaf97542e9	TeamCollectionTest: Update unit test	2019-06-27 11:27:29 -07:00
Meng Xu	53324e4db7	TeamCollectionInfo: clang format	2019-06-27 11:27:29 -07:00
Meng Xu	cc6a0e9bcd	TeamCollectionTest:Do not enforce minServerTeamOnServer larger than 0 In ConfigureTest, one server may be left with 0 server teams, even if we call buildTeams in the storageServerTracker.	2019-06-27 11:27:29 -07:00
Meng Xu	c23d89c98a	TeamCollection:Only count healthy teams for a server When team collection add new server teams, it picks a team with the least number of teams. We should only consider the healthy teams because the unhealthy ones will not be useful.	2019-06-27 11:27:29 -07:00
Meng Xu	02cdcc0b0c	TeamCollectionTest: Only ensure each server and machine have a team	2019-06-27 11:27:29 -07:00
Meng Xu	e1d459075a	TeamCollection:Count healthy machine teams only Team collection should prioritize to build machine teams for a machine that has the least number of healthy machine teams, instead of just machine teams, because unhealthy machine team will not be able to produce more server teams.	2019-06-27 11:27:29 -07:00
Meng Xu	ee916b337d	TeamCollection:Change the target team number to build When team collection (TC) build server teams and machine teams, it needs to build enough teams such that each server and machine has the DESIRED_TEAMS_PER_SERVER server teams and machine teams. This change calculate the number of teams (server team and machine teams) needed to get each teams for each server and machine.	2019-06-27 11:16:44 -07:00
Meng Xu	21664742a6	TeamCollection:Desired team number may be larger than the max possible team number For example, we have 3 servers for replica factor 3. We can have only 1 team but the desired team number is 3 times 5 equal to 15. Instead of sanity checking the absolute team number per server, we check the difference between the minServerTeamOnServer and maxServerTeamOnServer.	2019-06-27 11:15:06 -07:00
Meng Xu	08f28e99f9	TeamCollection:Test no server or machine has incorrect team number Add test for simulation test which make sure the server team number per server will be no less than the desired_teams_per_server defined in knobs and no larger than the max_teams_per_server. Add similar test for machine teams number per machine as well.	2019-06-27 11:15:06 -07:00
A.J. Beamon	7f23814841	Track run loop busyness and report it in status.	2019-06-26 14:03:02 -07:00
Alex Miller	83fae6cc15	Fix ExternalWorkload not being a part of the old build/test system.	2019-06-25 21:42:35 -07:00
Alex Miller	b5af601a8a	Fix ExternalWorkload not being a part of the old build/test system.	2019-06-25 21:41:43 -07:00
sramamoorthy	0a94f96dee	sev40 if knownCommittedVersion > recoveryVersion	2019-06-25 16:17:45 -07:00
Alex Miller	bf883d7055	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-06-25 14:26:50 -07:00
Evan Tschannen	0fe6edc254	Merge pull request #1678 from mpilman/features/external-workload Features/external workload	2019-06-25 13:53:19 -07:00
Evan Tschannen	c913aafc1c	Merge pull request #1721 from bnamasivayam/address-comma-separate-list Make public address and listen address a comma separated list	2019-06-25 13:52:16 -07:00
Alex Miller	7a500cd37f	A giant translation of TaskFooPriority -> TaskPriority::Foo This is so that APIs that take priorities don't take ints, which are common and easy to accidentally pass the wrong thing.	2019-06-25 02:47:35 -07:00
Stephen Atherton	f1f1081202	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood # Conflicts: # fdbserver/VersionedBTree.actor.cpp	2019-06-24 20:17:49 -07:00
Evan Tschannen	76ba4e60b7	fixed a stack overflow bug	2019-06-24 13:03:35 -07:00
sramamoorthy	212136d024	SnapTest to handle retries for exec txns	2019-06-24 10:22:42 -07:00
Stephen Atherton	112b0918c9	Refactored set() speed test to produce random sets of consecutive records with random prefixes that will often share common bytes.	2019-06-24 01:05:16 -07:00
Alec Grieser	e8c75505d3	Merge pull request #1725 from jzhou77/db-option Add transaction size option	2019-06-21 08:25:34 -07:00
Balachandar Namasivayam	5ce45a8a2d	Addressed review comments.	2019-06-20 23:03:49 -07:00
Balachandar Namasivayam	7489f83a7f	Disable/Re-enable consistency check through a database key. fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check. cluster_healthy metric in status becomes false if consistencycheck is disabled.	2019-06-20 21:38:45 -07:00
Evan Tschannen	1c005d5878	Merge pull request #1584 from alexmiller-apple/spilled-only-peek Save TLog resources by letting peek request only spilled data.	2019-06-20 18:22:31 -07:00
Alex Miller	26343f557a	Update getMore() contract. MultiCursor already did this.	2019-06-20 17:48:24 -07:00
Evan Tschannen	37c1df2491	Merge pull request #1705 from bnamasivayam/suspend-process Extend RebootRequest API to include time to suspend the process befor…	2019-06-20 17:36:25 -07:00
Evan Tschannen	460af91913	Merge pull request #1727 from alexmiller-apple/dd-failure-time Increase how long FDB will wait before starting DD to repair data loss.	2019-06-20 17:33:16 -07:00
Jingyu Zhou	357c9ba0fb	Refactor code	2019-06-19 20:41:53 -07:00
Evan Tschannen	e0be631414	shard the txs tag so that more transaction logs are involved in its recovery	2019-06-19 18:15:09 -07:00
Alex Miller	df0baa0066	Merge pull request #1720 from mpilman/features/protocol-version Make protocol version a type	2019-06-19 13:46:35 -07:00
Alex Miller	61901effed	Increase how long FDB will wait before starting DD to repair data loss. 10s is a bit short for starting data distribution, which is rather expensive. 60s is a bit more reasonable.	2019-06-19 13:40:21 -07:00
mpilman	ab7562160c	Made JavaWorkload an external workload	2019-06-19 13:03:41 -07:00
mpilman	2eff2b7e21	First simple test is working (but very buggy)	2019-06-19 13:03:41 -07:00
mpilman	1707f068e0	started implementation first c workload	2019-06-19 13:03:41 -07:00
mpilman	c8957d93f8	Implementation code complete	2019-06-19 13:03:41 -07:00
Alex Miller	ce24db3c53	Fully consume parallelPeekMore results before switching back.	2019-06-19 01:30:49 -07:00
Balachandar Namasivayam	4832404c85	Make public address and listen address a comma separated list	2019-06-18 18:15:15 -07:00
mpilman	68ce9a5e75	ProtocolVersion type - second try	2019-06-18 17:55:27 -07:00
Alex Miller	51fd42a4d2	Merge remote-tracking branch 'upstream/master' into spilled-only-peek	2019-06-18 17:33:52 -07:00
Alex Miller	4fa5dc0502	Merge remote-tracking branch 'upstream/master' into cloexec	2019-06-18 16:35:18 -07:00
mpilman	8576665a90	Revert "Revert "Make protocol version a type"" This reverts commit `455bf3b3ec`.	2019-06-18 14:49:04 -07:00
Alex Miller	455bf3b3ec	Revert "Make protocol version a type"	2019-06-18 10:59:17 -07:00
A.J. Beamon	c3aa5819f2	Merge pull request #1417 from mpilman/features/client-buggify Overall framework and first buggify entries	2019-06-18 09:10:11 -07:00
Stephen Atherton	d4b7f9b606	Fixed some cmake, compile, and IDE warnings.	2019-06-17 18:55:49 -07:00
Steve Atherton	ba52623637	Merge pull request #1582 from tclinken/features/sqlite-crc32c Use crc32 for sqlite page checksums	2019-06-17 14:20:41 -07:00
mpilman	da53a92bec	Make protocol version a type This fixes #1214 The basic idea is that ProtocolVersion is now its own type. This alone is an improvement as it makes many things more typesafe. For each version, we can now add breaking features (for example Fearless). After that, there's no need to test against actual (confusing) version numbers. Instead a developer can simply test `protocolVersion->hasFearless()` and this will return true iff the protocolVersion is newer than the newest version that didn't support fearless.	2019-06-16 09:59:15 -07:00
mpilman	6ea75713cb	Overall framework and first buggify entries	2019-06-16 09:09:09 -07:00
Evan Tschannen	20e3edeb0a	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbserver/storageserver.actor.cpp # versions.target	2019-06-14 12:42:59 -07:00
Balachandar Namasivayam	5eb833759e	Extend RebootRequest API to include time to suspend the process before reboot. This is intended to be used for testing purposes to simulate failures.	2019-06-14 11:35:38 -07:00
Evan Tschannen	6ececa94ce	Merge pull request #1640 from vishesh/task/client-failmon Clients will no longer get failure monitoring info from cluster controller	2019-06-13 17:31:17 -07:00
A.J. Beamon	fddcf3486c	Merge pull request #1697 from etschannen/increase_idle_delay Increase idle delay	2019-06-13 16:34:22 -07:00
A.J. Beamon	aad79aae49	Merge pull request #1699 from senthil-ram/boostwindowsmac disable boost::process code for windows and mac	2019-06-13 16:12:40 -07:00
Evan Tschannen	924f92e5aa	Prevent the byte sample recovery from interfering with storage server recovery	2019-06-13 15:55:25 -07:00
sramamoorthy	1d1d42c8af	disable boost::process code for windows and mac	2019-06-13 15:43:03 -07:00
Evan Tschannen	b2a5d4fd0d	Merge branch 'master' into increase_idle_delay	2019-06-13 15:23:18 -07:00
A.J. Beamon	e45c13358e	Merge pull request #1691 from etschannen/master Fixed a number of correctness problems	2019-06-13 15:11:16 -07:00
Evan Tschannen	054d775343	increase the delay between idle commits to reduce the rate idle clusters fsync	2019-06-13 14:55:37 -07:00
Evan Tschannen	55f7e7d372	fix: The delay inside the disabledMap was causing the storage server updateStorage actor to run on the client process	2019-06-13 14:28:30 -07:00
A.J. Beamon	3dd2479193	Try avoiding use of boost in FDBExecHelper	2019-06-13 13:09:29 -07:00
Evan Tschannen	dccb9bc26d	fixed a number of correctness problems	2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard	1e8f7e5b82	Refactor NextFastAllocatedSize to be constexpr function	2019-06-11 15:55:23 -07:00
Trevor Clinkenbeard	cb420ea4bd	Only construct waitDescription in simulator	2019-06-11 12:43:39 -07:00
Trevor Clinkenbeard	8144882d7b	Merge branch 'apple-master' into features/local-rk	2019-06-10 19:40:25 -07:00
Trevor Clinkenbeard	46b77819aa	Fixed LocalRatekeeper test	2019-06-10 18:25:58 -07:00
Vishesh Yadav	a8e408e268	run clang-format on changes	2019-06-10 14:10:24 -07:00
Vishesh Yadav	6fa7081a21	net: Don't make FailureMonitoring requests from client This patch removes the need for clients to continuously contact cluster coordinator for failure monitoring information. Instead, it uses the FlowTransport to monitor the statuses of peers and update FailureMonitor accordingly.	2019-06-09 00:43:38 -07:00
Vishesh Yadav	6b4d30c3ae	failmon: Identify client vs server when starting failure monitoring client	2019-06-09 00:43:12 -07:00
Evan Tschannen	5bdf5aaeb6	Merge pull request #1662 from etschannen/master Merge 6.1 into master	2019-06-06 13:57:34 -07:00
Stephen Atherton	100789b354	More bug fixes in handling upperBound changes in modified pages and worst-case delta size calculation. Normalized some formatting in debug statements. Fixed compile error on linux. Updated test specs.	2019-06-05 20:58:47 -07:00
Trevor Clinkenbeard	8dbb231f33	Don't reject read requests until the storage server durability lag gets large enough	2019-06-05 15:42:58 -07:00
Trevor Clinkenbeard	d1d98f298a	Changed storage server getPenalty calculation. Penalty should always be >= 1.0	2019-06-05 14:14:40 -07:00
chaoguang	877a59fab9	add in fdbserver.vcxproj.filters	2019-06-04 15:58:17 -07:00
Stephen Atherton	6aad34620d	Bug fix in upper boundary selection in commitSubtree(). More debug output.	2019-06-04 04:55:09 -07:00
Stephen Atherton	653440d54c	Changes and bug fixes in how boundary keys are modified during clears in internal pages by rewriting how internal pages are modified, making edge cases much easier to handle. Several debug output improvements. Page numbers stored on disk are now big endian.	2019-06-04 04:03:52 -07:00
Evan Tschannen	29b96414e2	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/NativeAPI.actor.cpp # fdbserver/Coordination.actor.cpp # flow/Arena.h # versions.target	2019-06-03 18:49:35 -07:00
chaoguang	66811b7bd2	update to latest version	2019-06-03 16:49:19 -07:00
chaoguang	3055376b45	remove static keyword to make variables not in binary	2019-06-03 16:40:34 -07:00
Parallels	773f52d0a1	Merge remote-tracking branch 'upstream/master' into cloexec	2019-06-03 15:43:32 -07:00
A.J. Beamon	bb22ee7d37	Merge pull request #1649 from etschannen/feature-coordinator-bug The coordinators did not always converge on the same leader	2019-06-03 15:04:25 -07:00
A.J. Beamon	773bce9e32	Merge pull request #1643 from etschannen/feature-cc-mem-leak Fixed a memory leak on the cluster controller	2019-06-03 15:02:36 -07:00
Meng Xu	dc59f63d0e	TraceEvent:First letter must be capitalized	2019-06-03 13:27:18 -07:00
chaoguang	ac2c0f38b7	remove inheritance from KVWorkload	2019-06-02 23:16:39 -07:00
chaoguang	d07c46e3f3	fix issues by comments	2019-05-31 00:44:07 -07:00
chaoguang	66d25cef21	fix issues by comments	2019-05-31 00:27:30 -07:00
Evan Tschannen	b830fa4c84	fix: A minority of coordinators could continue choosing a candidate which was not the leader	2019-05-30 17:25:20 -07:00
Stephen Atherton	9f064ad7cf	Added back minimal btree internal page boundaries using RedwoodRecordRef.	2019-05-30 02:10:07 -07:00
Stephen Atherton	098ac46af9	RedwoodRecordRef::deltaSize() now calculates actual delta size instead of a conservative estimate.	2019-05-29 18:06:11 -07:00
Stephen Atherton	3e155a2563	Bug fixes.	2019-05-29 17:38:55 -07:00
Evan Tschannen	7c333dbc16	If a process receives a message in its clusterControllerInterface before becoming the cluster controller, if the process does not become the cluster controller in the next minute it should destroy the interface to prevent a memory leak.	2019-05-29 16:57:13 -07:00
Stephen Atherton	cedcfcddd0	Bug fix in RedwoodRecordRef::Delta var int writer, new tests.	2019-05-29 16:47:53 -07:00
Stephen Atherton	1e5b9faa11	Bug fixes in RedwoodRecordRef::Delta.	2019-05-29 16:26:58 -07:00
Evan Tschannen	362c2bf1e6	improved the cpu efficiency of printable	2019-05-29 14:55:45 -07:00
Stephen Atherton	02882dbf00	Checkpointing progress, RedwoodRecordRef and DeltaTree tests pass but BTree test does not. RedwoodRecordRef::Delta rewritten to actually do prefix compression on key and integer fields. Added related unit tests and benchmarks. Some improvements to DeltaTree and requirements on its T and Delta types to avoid repeated common prefix discovery.	2019-05-29 06:23:32 -07:00
sramamoorthy	1190f2f33d	rebased related changes	2019-05-28 22:07:46 -07:00
sramamoorthy	4bcb590f12	g_random -> deterministicRandom()	2019-05-28 22:07:46 -07:00
sramamoorthy	b43c100e57	TLog bug fixes	2019-05-28 22:07:46 -07:00
sramamoorthy	42c551a996	handle isRestoring & BackupFailed not being set restartInfo.in->BackupFailed and isRestoring may not be set in all cases, handle the absence of them.	2019-05-28 22:07:46 -07:00
sramamoorthy	3877f87481	comment change in tLogCommit	2019-05-28 22:07:46 -07:00
sramamoorthy	2a68b28590	rebase related changes	2019-05-28 22:07:46 -07:00
sramamoorthy	b17ad85497	exec op not supported when log_anti_quorum > 0	2019-05-28 22:07:46 -07:00
sramamoorthy	3aa848b8af	minor bug in whitelist binary path testing	2019-05-28 22:07:46 -07:00
sramamoorthy	c906da1f62	simulator: spawnProcess to wait for long duration spawnProcess was waiting for 3 seconds and terminating the child process for synchronous calls, but in the simulator, this can lead to non-determinism, because some cases the command can run in <3 or >3 seconds. The fix is to increase the wait for duration to be very long that it has to synchronously wait and get the results or the test will timeout.	2019-05-28 22:07:46 -07:00
sramamoorthy	31b6c86650	ignorePopDeadline to have high limit in simulator - ignorePopDeadline to have highier limit in simulator to accommdate for the buggify delays and make snapshot succeed. - introduce a new knob for auto resetting the disabling of tlog pop	2019-05-28 22:07:46 -07:00
sramamoorthy	40358e1dd6	limit of getRange in snapTest reduced With CLIENT_KNOBS->TOO_MANY in snapTest, by the time getRange gathers all the results, the storage server's oldest version has gone past the req->version and hence the transaction fails with transaction_too_old	2019-05-28 22:07:46 -07:00
sramamoorthy	b1b96946af	logData->stop check right after execOpHold wait	2019-05-28 22:07:46 -07:00
sramamoorthy	5749e220bd	use FlowLock for implementing critical section Instead of using Promises and future to implement critcal section use FlowLock	2019-05-28 22:07:46 -07:00
sramamoorthy	e6c0b87a4d	remove unused variable	2019-05-28 22:07:46 -07:00
sramamoorthy	b56d8e648f	bp::child->wait_for does not give correct err code boost::process::child->wait_for does not give the error code from the process being run. Re-arrange the code to work-around it.	2019-05-28 22:07:46 -07:00
sramamoorthy	f27a40f118	execProcessingHelper made synchronous tLogCommit exects no blocking between duplicate check and setting of the new version, that constraint was broken when synchronous execProcessingHelper was introduced. As a fix, execProcessingHelper was made asynchronous.	2019-05-28 22:07:46 -07:00
sramamoorthy	ceac68c990	restore - remove emtpy snapdir,snap loop retry fix - remove partially snapped directories to avoid no cluster file assert - snap create to retry max 3 times for not_fully_recovered and keep retrying for the other failures	2019-05-28 22:07:46 -07:00
sramamoorthy	d3a179b6f9	Multiple bug fixes - wait for snapTLogFailKeys in a loop, otherwise in some race condition it can cause a false assert - in single region, there does not seem to be a guarantee of tagLocalityListKey for a given DC ID, avoiding that assert for now - to find the workers that are coordinators, looking up by primary address is not sufficient in some cases, hence looking by both primary and secondary address - test make files to reflect the location of the new test cases	2019-05-28 22:07:46 -07:00

2270 Commits