foundationdb

Commit Graph

Author	SHA1	Message	Date
A.J. Beamon	41605735f5	Merge pull request #1916 from ajbeamon/merge-onto-new-servers Add knob to control whether merges request new servers or not.	2019-07-30 15:04:37 -07:00
A.J. Beamon	bc536757df	Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space.	2019-07-29 15:47:34 -07:00
Evan Tschannen	d8b14fe372	we cannot buggify replace content bytes because it takes too long to recover when the txnStateStore is too large	2019-07-28 19:34:17 -07:00
Evan Tschannen	5c98dcce6d	revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators	2019-07-27 16:46:22 -07:00
Evan Tschannen	b509a441e7	Merge branch 'master' into feature-skip-confirm # Conflicts: # bindings/flow/tester/Tester.actor.cpp # bindings/go/src/_stacktester/stacktester.go # bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java # bindings/java/src/test/com/apple/foundationdb/test/StackTester.java # bindings/python/tests/tester.py # bindings/ruby/tests/tester.rb # documentation/sphinx/source/api-c.rst # documentation/sphinx/source/api-python.rst # documentation/sphinx/source/api-ruby.rst # documentation/sphinx/source/data-modeling.rst # documentation/sphinx/source/developer-guide.rst # fdbclient/vexillographer/fdb.options # fdbserver/MasterProxyServer.actor.cpp	2019-07-27 15:08:13 -07:00
Evan Tschannen	ee94e8a062	removed a trace event which was causing valgrind errors	2019-07-27 13:51:59 -07:00
Evan Tschannen	90e3b50213	Merge branch 'master' into feature-coordinator-connection # Conflicts: # fdbclient/DatabaseContext.h # fdbclient/NativeAPI.actor.cpp # fdbclient/NativeAPI.actor.h # fdbserver/workloads/KillRegion.actor.cpp	2019-07-26 15:05:02 -07:00
Evan Tschannen	ee92f0574f	fix: lastRequestTime was not updated fix: COORDINATOR_REGISTER_INTERVAL was not set fixed review comments	2019-07-26 13:23:56 -07:00
sramamoorthy	a65c9f92ed	get rid of all timeouts and other changes	2019-07-24 15:36:28 -07:00
sramamoorthy	7e04e3c8be	snap v2: knobs for max snap create timeout	2019-07-24 15:36:28 -07:00
Evan Tschannen	c70e762f0e	Merge pull request #1785 from xumengpanda/mengxu/server-team-remover-PR Remove redundant server teams	2019-07-19 17:44:16 -07:00
Meng Xu	b001a9ebe8	ServerTeamRemover runs after machineTeamRemover finishes If serverTeamRemover removes a team before machineTeamRemover brings the machine team number down to the desired number, DD may create a new team (due to teams removed by serverTeamRemover), which may be removed later by machineTeamRemover. This causes unnecessary extra data movement.	2019-07-19 16:48:52 -07:00
Evan Tschannen	846038b0e6	Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle Ratekeeper throttling aggressively when unable to fetch storage server list	2019-07-19 16:41:58 -07:00
Alex Miller	c3a8ae4752	Merge pull request #1791 from fzhjon/fetch-keys-requests-priority Introduce priority to fetchKeys requests from data distribution	2019-07-19 14:54:51 -07:00
Balachandar Namasivayam	ecb3de3b49	Fixed space issue.	2019-07-17 18:10:05 -07:00
Balachandar Namasivayam	406bcebdc4	Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time.	2019-07-17 18:08:17 -07:00
Meng Xu	20f067e794	Merge with master:Resolve conflict with PR#1797	2019-07-16 10:52:28 -07:00
Meng Xu	415622f465	MachineTeamRemover:Change to remove MT with most teams Change to remove machine team with most machine teams, using the same logic as the serverTeamRemover. The feature is guarded by TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS knob.	2019-07-15 14:29:49 -07:00
Evan Tschannen	db5b4a6331	avoid going to unlimited immediately after going below the durabilityLagTargetVersion	2019-07-12 18:50:56 -07:00
Evan Tschannen	1a18c859c7	knobified the durability lag rate controls	2019-07-12 18:50:56 -07:00
Evan Tschannen	02de53160d	only skip confirm epoch live if CAUSAL_READ_RISKY is enabled time checked on the proxy should be less than the time waited by the master to account for clock speed differences setting REQUIRED_MIN_RECOVERY_DURATION and ENFORCED_MIN_RECOVERY_DURATION to 0 will go back to the old behavior	2019-07-12 17:58:16 -07:00
Evan Tschannen	a63969afb3	enforce a minimum recovery duration, which allows proxies to avoid checking if the epoch is alive as long as its last commit has been less than MINIMUM_RECOVERY_DURATION ago	2019-07-12 13:10:21 -07:00
Jon Fu	f12a3909f3	renamed workloads and made code style adjustments	2019-07-11 09:56:58 -07:00
Jon Fu	1e9d31597c	removed extra parameter from getRange, added knob to guard new changes, and adjusted style/formatting in several places	2019-07-11 09:56:58 -07:00
Evan Tschannen	7e919e361c	Merge pull request #1817 from etschannen/feature-proxy-forward Proxies will forward clients to the next generation	2019-07-10 13:53:12 -07:00
Evan Tschannen	49121172ea	Merge pull request #1795 from alexmiller-apple/peek-from-satellites Log Routers will prefer to peek from satellite logs.	2019-07-09 17:38:57 -07:00
Evan Tschannen	001abec29d	fixed a compiler error, buggified a new knob	2019-07-09 16:50:59 -07:00
Evan Tschannen	64aee73c4f	we only need to hold the ReplyPromise for messages that we are going to forward to new proxies	2019-07-09 16:47:56 -07:00
Alex Miller	44f11702a8	Log Routers will prefer to peek from satellite logs. Formerly, they would prefer to peek from the primary's logs. Testing of a failed region rejoining the cluster revealed that this becomes quite a strain on the primary logs when extremely large volumes of peek requests are coming from the Log Routers. It happens that we have satellites that contain the same mutations with Log Router tags, that have no other peeking load, so we can prefer to use the satellite to peek rather than the primary to distribute load across TLogs better. Unfortunately, this revealed a latent bug in how tagged mutations in the KnownCommittedVersion->RecoveryVersion gap were copied across generations when the number of log router tags was decreased. Satellite TLogs would be assigned log router tags using the team-building based logic in getPushLocations(), whereas TLogs would internally re-index tags according to tag.id%logRouterTags. This mismatch would mean that we could have: Log0 -2:0 ----- -2:0 Log 0 Log1 -2:1 \ >--- -2:1,-2:0 (-2:2 mod 2 becomes -2:0) Log 1 Log2 -2:2 / And now we have data that's tagged as -2:0 on a TLog that's not the preferred location for -2:0, and therefore a BestLocationOnly cursor would miss the mutations. This was never noticed before, as we never used a satellite as a preferred location to peek from. Merge cursors always peek from all locations, and thus a peek for -2:0 that needed data from the satellites would have gone to both TLogs and merged the results. We now take this mod-based re-indexing into account when assigning which TLogs need to recover which tags from the previous generation, to make sure that tag.id%logRouterTags always results in the assigned TLog being the preferred location. Unfortunately, previously existing will potentially have existing satellites with log router tags indexed incorrectly, so this transition needs to be gated on a `log_version` transition. Old LogSets will have an old LogVersion, and we won't prefer the satellite for peeking. Log Sets post-6.2 (opt-in) or post-6.3 (default) will be indexed correctly, and therefore we can safely offload peeking onto the satellites.	2019-07-08 22:25:01 -07:00
Meng Xu	3b9618fe11	ServerTeamRemover:Speedup removing teams in simulation Otherwise, simulation may time out when team remover needs to remove hundreds of teams.	2019-07-08 18:17:21 -07:00
Meng Xu	08a721b320	Merge branch 'master' into mengxu/server-team-remover-PR	2019-07-08 16:30:32 -07:00
Evan Tschannen	c348b3da51	After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies	2019-07-08 12:53:40 -07:00
Evan Tschannen	310a5fe9a3	fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy	2019-07-05 17:28:22 -07:00
Evan Tschannen	e7c0ecf729	fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy	2019-07-05 15:46:16 -07:00
Meng Xu	599fcb2e6d	Add serverTeamRemover to remove redundant server teams	2019-07-02 17:40:37 -07:00
Evan Tschannen	b9a6271375	local ratekeeper no longer globally limits	2019-06-28 16:54:22 -07:00
Evan Tschannen	18d5fbf1e0	Avoid jumping from rejecting 0% of requests directly to 20% of requests	2019-06-28 16:54:22 -07:00
Evan Tschannen	db413c37f7	restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers	2019-06-28 16:54:22 -07:00
Evan Tschannen	92b32855ca	ratekeeper’s control algorithm would oscillate when limited by local ratekeeper	2019-06-28 16:54:22 -07:00
A.J. Beamon	35b6277a50	Fix knob copy paste error	2019-06-27 12:55:39 -07:00
Alex Miller	61901effed	Increase how long FDB will wait before starting DD to repair data loss. 10s is a bit short for starting data distribution, which is rather expensive. 60s is a bit more reasonable.	2019-06-19 13:40:21 -07:00
Evan Tschannen	20e3edeb0a	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbserver/storageserver.actor.cpp # versions.target	2019-06-14 12:42:59 -07:00
Evan Tschannen	924f92e5aa	Prevent the byte sample recovery from interfering with storage server recovery	2019-06-13 15:55:25 -07:00
Evan Tschannen	054d775343	increase the delay between idle commits to reduce the rate idle clusters fsync	2019-06-13 14:55:37 -07:00
Trevor Clinkenbeard	8144882d7b	Merge branch 'apple-master' into features/local-rk	2019-06-10 19:40:25 -07:00
Evan Tschannen	29b96414e2	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/NativeAPI.actor.cpp # fdbserver/Coordination.actor.cpp # flow/Arena.h # versions.target	2019-06-03 18:49:35 -07:00
Evan Tschannen	7c333dbc16	If a process receives a message in its clusterControllerInterface before becoming the cluster controller, if the process does not become the cluster controller in the next minute it should destroy the interface to prevent a memory leak.	2019-05-29 16:57:13 -07:00
sramamoorthy	31b6c86650	ignorePopDeadline to have high limit in simulator - ignorePopDeadline to have higher limit in simulator to accommodate for the buggify delays and make snapshot succeed. - introduce a new knob for auto resetting the disabling of tlog pop	2019-05-28 22:07:46 -07:00
A.J. Beamon	603721e125	Merge branch 'master' into thread-safe-random-number-generation # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/AsyncFileCached.actor.h # fdbrpc/genericactors.actor.cpp # fdbrpc/sim2.actor.cpp # fdbserver/DiskQueue.actor.cpp # fdbserver/workloads/BulkSetup.actor.h # flow/ActorCollection.actor.cpp # flow/Net2.actor.cpp # flow/Trace.cpp # flow/flow.cpp	2019-05-23 08:35:47 -07:00
Evan Tschannen	8c3516951a	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-05-12 20:13:49 -07:00


189 Commits