foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	1022e0a5c6	added yields to the log router and tlogs after processing a version	2018-09-04 17:16:44 -07:00
Evan Tschannen	21f5cf9ce9	suppress spammy trace events	2018-09-04 17:12:26 -07:00
Evan Tschannen	65eabedb6c	fix: addSubsetOfEmergencyTeams could add unhealthy teams; optimized teamTracker to check if it satisfies the policy more efficiently; added yields to initialization to avoid slow tasks when adding lots of teams	2018-08-31 17:54:55 -07:00
Evan Tschannen	72c86e909e	fix: tracking of the number of unhealthy servers was incorrect; fix: locality equality was only checking zoneId	2018-08-31 17:40:27 -07:00
Evan Tschannen	90bf277206	require key value store memory to recover cleanly when recovering the txnStateStore, since all of the data it is recovering has been fsync’ed	2018-08-31 13:07:48 -07:00
Evan Tschannen	1e2ce75ce4	fix: if usable_regions=1 extraTlogEligibleMachines was calculated incorrectly	2018-08-31 13:04:00 -07:00
Evan Tschannen	b0d94597d4	Add additional metrics to track fetch key duration on the storage servers	2018-08-31 13:01:36 -07:00
Evan Tschannen	d8659a5822	fix: bytesWritten would overflow and go negative	2018-08-31 12:46:57 -07:00
Evan Tschannen	6496a6d9c8	fix: start move keys will only move destination servers to become source servers if less than destination servers are healthy and the total number of sources is less than 2x the number of destinations	2018-08-31 12:43:14 -07:00
Evan Tschannen	e60c668853	The cluster controller will increase its failure monitoring delay after there have been many unfinishedRecoveries	2018-08-31 10:51:55 -07:00
Evan Tschannen	84e1f7b2b5	added overhead bytes durable to complement overhead bytes input	2018-08-21 22:35:04 -07:00
Evan Tschannen	74f7412975	added separate logging for overhead bytes	2018-08-21 22:18:38 -07:00
Evan Tschannen	ffde1a0e28	renamed onlySystem to mustContainSystemMutations, to accurately represent what setting the key does	2018-08-21 22:15:45 -07:00
Evan Tschannen	d7c01f0419	added a separate knob for tlog’s recoverMemoryLimit	2018-08-21 21:11:23 -07:00
Evan Tschannen	cb60002944	Added the ability to disable all commits which do not modify the system keys by setting \xff/onlySystem = 1 in the database	2018-08-21 21:09:50 -07:00
Evan Tschannen	a694364a39	fix: teams larger than the storageTeamSize can never become healthy, so we do not need to track them in our data structures. After configuring from usable_regions=2 to usable_regions=1 we will have a lot of these types of teams, leading to performance issues	2018-08-21 21:08:15 -07:00
Evan Tschannen	e770629229	fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task.	2018-08-15 19:39:06 -07:00
Evan Tschannen	883050d12f	moved the creation of the yieldPromiseStream to properly yield moves from initialDataDistribution	2018-08-13 22:29:55 -07:00
Evan Tschannen	f52d841e8a	we need to send notifications when the leader fitness becomes worse so that we repopulate availableCandidates to compare with the new lower fitness	2018-08-13 20:56:02 -07:00
Evan Tschannen	2341e5d8ad	fix: we must yield when updating shardsAffectedByTeamFailure with the initial shards. A test with 1 million shards caused a 22 second slow task	2018-08-13 19:46:47 -07:00
Evan Tschannen	8fc8aa0493	fix: we must notify every time nextNominee is not present to continue to repopulate availableCandidates	2018-08-13 17:59:47 -07:00
Evan Tschannen	aaa90de7d9	merge 5.2 into 6.0	2018-08-13 10:13:03 -07:00
Evan Tschannen	4f9dd10644	fix: as long as some leader was sending heartbeats we would keep the currentNominee as leader, even if that currentNominee was not the one sending the heartbeats	2018-08-10 17:11:24 -07:00
Evan Tschannen	9c918a28f6	fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2	2018-08-09 13:16:09 -07:00
Evan Tschannen	7c5d414f7b	fix: during destruction logData could attempt to dereference tLogData after it has been deleted	2018-08-09 12:38:35 -07:00
Evan Tschannen	6f02ea843a	prevented a slow task when too many shards were sent to the data distribution queue after switching to a fearless deployment	2018-08-09 12:37:46 -07:00
Evan Tschannen	7f7755165c	slowly send notifications to clients to clear the list of dead clients	2018-08-08 17:29:32 -07:00
Evan Tschannen	0ca11aabe6	Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0	2018-08-07 17:23:52 -07:00
Evan Tschannen	3bb8dad431	TooManyNotifications is only sevWarnAlways if it happens more than once a day. Status continuously adds to notifications currently, so we expect this to trigger every 4-5 days.	2018-08-07 17:00:43 -07:00
A.J. Beamon	9b1f7408d5	Merge pull request #678 from ajbeamon/use-new-data-lag-fields; Fix: use new data lag fields when making storage server message indicating high lag.	2018-08-07 15:42:23 -07:00
A.J. Beamon	7d831ef9c3	Revert change that prints lag with 2 decimal points of precision.	2018-08-07 15:41:51 -07:00
A.J. Beamon	e0cf525951	Fix: use new data lag fields when making storage server message indicating high lag.	2018-08-07 11:02:09 -07:00
Evan Tschannen	6f328d41ac	suppressed spammy trace events	2018-08-06 12:12:55 -07:00
Evan Tschannen	c757c68bfa	fix: nextVersion needs to be set to logData->version if version_sizes is empty	2018-08-04 23:53:37 -07:00
Evan Tschannen	9d0a07a400	fix: trackLatest for master recovery state was wrong, causing status to report incorrect recovery states	2018-08-04 12:50:56 -07:00
Evan Tschannen	fec285146c	significant cpu optimization in update storage	2018-08-04 12:36:48 -07:00
Evan Tschannen	be1a4d74c7	tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time; increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget	2018-08-04 10:31:30 -07:00
Evan Tschannen	71f89f372f	changed a trace event name to avoid scope type mismatch on the tag field	2018-08-03 15:53:38 -07:00
Evan Tschannen	2619234477	Merge branch 'release-5.2' into release-6.0 (Conflicts: documentation/sphinx/source/release-notes.rst)	2018-08-03 11:40:24 -07:00
Evan Tschannen	501033c5af	fix: tlog spilling on a stopped log was only making one version durable at a time	2018-08-03 11:38:12 -07:00
Evan Tschannen	1c29275672	call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.	2018-08-01 14:30:57 -07:00
Evan Tschannen	57f121481c	reverted killing processes because of io_error, we should fix the problem in a better way in the future	2018-07-16 15:09:07 -07:00
Evan Tschannen	f72a9f60c0	only disable fearless if a datacenter has actually been killed; fix: we must prevent recovery into the dead datacenter while reducing usable_regions	2018-07-16 10:06:57 -07:00
Evan Tschannen	30b2f85020	fix: it is not safe to drop logs supporting the current primary datacenter, because configuring usable_regions down will drop the storage servers in the remote region, leaving you with no remaining logs	2018-07-14 16:26:45 -07:00
Evan Tschannen	0f59dc4086	fix: do not write to the persistent queue when we are terminated, which could happen if shutdown was caused by setting a promise in the asyncPullData loop	2018-07-13 17:01:31 -07:00
Evan Tschannen	10ae883a68	changed the location of a yield	2018-07-12 17:59:12 -07:00
Evan Tschannen	4fedd05506	added more yields to avoid slow tasks	2018-07-12 17:47:35 -07:00
Evan Tschannen	d47aae27f3	added a yield to getMore()	2018-07-12 16:27:27 -07:00
Evan Tschannen	392c73affb	fixed a few slow tasks	2018-07-12 14:06:59 -07:00
Evan Tschannen	d12dac60ec	fix: the same team was being added multiple times to primaryTeams	2018-07-12 12:10:18 -07:00

826 Commits