foundationdb

Commit Graph

Author	SHA1	Message	Date
A.J. Beamon	603721e125	Merge branch 'master' into thread-safe-random-number-generation # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/AsyncFileCached.actor.h # fdbrpc/genericactors.actor.cpp # fdbrpc/sim2.actor.cpp # fdbserver/DiskQueue.actor.cpp # fdbserver/workloads/BulkSetup.actor.h # flow/ActorCollection.actor.cpp # flow/Net2.actor.cpp # flow/Trace.cpp # flow/flow.cpp	2019-05-23 08:35:47 -07:00
Evan Tschannen	8c3516951a	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-05-12 20:13:49 -07:00
Alex Miller	ea12a54946	Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES So as to not make filesystem assumptions. This knob did technically appear in (only the) 6.1.5 release, but this feature was broken in 6.1.5, and thus impossible to use anyway.	2019-05-10 18:26:22 -10:00
A.J. Beamon	5f55f3f613	Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.	2019-05-10 14:01:52 -07:00
Evan Tschannen	22499666d0	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbserver/LogRouter.actor.cpp # flow/Trace.cpp # versions.target	2019-05-08 18:19:35 -07:00
Alex Miller	0685e6c1c7	Avoid large truncates in the DiskQueue. And instead create a new file while incrementally truncating the old one down. This avoids queueing up a massive number of filesystem metadata operations in one call, thus flooding the disk with requests and stalling out all other filesystem operations. This sets the knobs so that a truncate of >10GB causes us to create a new file rather than trying to truncate the old one.	2019-05-08 12:33:31 -10:00
Alex Miller	4052f3826a	Add a knob to limit the number of commits indexed per key. Theoretically, we could spill 20MB of 22B mutations for one key, which would generate a very long value being stored in SQLite, and very inefficiently read back. This stops that from being a problem, at the cost of some extra write calls.	2019-05-03 15:27:10 -07:00
Alex Miller	f4e48c3851	Add a knob to limit amount of data read from sqlite for one PeekRequest. This prevents peeking from degrading over time if there are a very large number of SpilledData entries for one particular tag.	2019-05-02 17:26:45 -07:00
Evan Tschannen	2d5043c665	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-04-30 18:27:04 -07:00
Evan Tschannen	1a4c1759a4	Merge pull request #1429 from jzhou77/pprof Dump heap profiler when memory usage is high	2019-04-29 16:31:44 -07:00
Evan Tschannen	cacd82758e	Reduced data distribution speeds	2019-04-26 13:54:49 -07:00
Evan Tschannen	9ff8aca1da	Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)	2019-04-26 13:53:56 -07:00
A.J. Beamon	253d2400ef	Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning # Conflicts: # documentation/sphinx/source/release-notes.rst	2019-04-23 14:38:52 -07:00
A.J. Beamon	4ad0496b39	Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.	2019-04-23 14:01:51 -07:00
Stephen Atherton	83db547306	Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.	2019-04-23 04:50:58 -07:00
Jingyu Zhou	6870e132b2	Merge branch 'master' into pprof	2019-04-19 14:06:44 -07:00
Andrew Noyes	d1e86779a6	Address review comments	2019-04-18 08:48:27 -07:00
mpilman	32393ec4c9	Prototype of local ratekeeper	2019-04-08 11:04:44 -07:00
Evan Tschannen	05869a8383	do not log a degraded reset message if the previous reset was more than a week ago	2019-04-07 23:00:58 -07:00
Jingyu Zhou	4b08042a88	Change memory profiling threshold to a flag	2019-04-05 16:33:51 -07:00
Jingyu Zhou	09b2c35d11	Dump heap profiler when memory usage is high Set the threshold of dump to 2GB.	2019-04-05 16:12:23 -07:00
Evan Tschannen	390ab9cfed	A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy	2019-04-04 14:11:12 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
Evan Tschannen	6254a1a8e4	fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop	2019-03-22 18:37:39 -07:00
A.J. Beamon	2d7b48dadc	Merge pull request #1311 from etschannen/feature-increase-grv-batch Increased the GRV client batch size	2019-03-19 08:23:05 -07:00
Evan Tschannen	2554fed965	reduce max transaction to start	2019-03-18 16:16:03 -07:00
Evan Tschannen	87e2a1a029	The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update	2019-03-18 16:09:57 -07:00
Alex Miller	29ab7370cd	Clear versionLocation when spilling, and pop DQ separately. Popping the disk queue now requires potentially recovering the location to which we can pop from the spilled data itself, and for each tag we must maintain the first location with relevant data. The previous queue we had to represent the ordering, queueOrder, was used by spilling, and popped when a TLog had been spilled. This means that as soon as a TLog has been fully spilled, we have no idea how it relates in order to other fully spilled TLogs. Instead, use queueOrder to keep track of all the TLog UIDs until they're removed, and use spillOrder to keep track of the order only for spilling.	2019-03-18 15:09:22 -07:00
Evan Tschannen	ec6c843124	increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch	2019-03-16 16:18:58 -07:00
Evan Tschannen	e068c478b5	merge master	2019-03-12 18:31:25 -07:00
Evan Tschannen	c6e94293bf	reset a process to not be degraded after 2 days	2019-03-10 22:39:21 -07:00
Evan Tschannen	53f16b5347	when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded	2019-03-08 11:46:34 -05:00
Jingyu Zhou	3c86643822	Separate Ratekeeper from data distribution. Add a new role for ratekeeper. Remove StorageServerChanges from data distribution. Ratekeeper monitors storage servers, which borrows the idea from DataDistribution.	2019-03-07 13:16:20 -08:00
Alex Miller	94bf75cb00	Allow the disk queue to shrink if it has unneeded slack space.	2019-03-04 01:42:38 -08:00
Alex Miller	9ef283d4e7	Implement hard limiting of memory used to serve peek requests.	2019-03-04 01:42:38 -08:00
Alex Miller	e7d8520c63	Batch more when spilling data.	2019-03-04 01:42:38 -08:00
Trevor Clinkenbeard	39f612d132	Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics	2019-03-02 17:07:00 -08:00
A.J. Beamon	a051055caf	Initial implementation of adding separate limits for batch priority in ratekeeper	2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard	abfe057805	Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics	2019-02-25 13:47:16 -08:00
Evan Tschannen	b8910ba7cd	Merge branch 'master' into feature-fix-force-recovery # Conflicts: # fdbclient/ManagementAPI.actor.h # fdbserver/DataDistribution.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KillRegion.actor.cpp	2019-02-22 14:38:13 -08:00
Meng Xu	9445ac0b0c	Status: Use new data distributor worker to publish status After we add a new data distributor role, we publish the data related to data distributor and rate keeper through the new role (and new worker). So the status needs to contact the data distributor, instead of master, to get the status information.	2019-02-21 18:05:50 -08:00
Meng Xu	7cca439e00	TeamRemover: Add status to show redundant team removing Distinguish the removal of unhealthy team and redundant team. Change status report to include redundant team removal report.	2019-02-21 14:16:46 -08:00
Trevor Clinkenbeard	fa96b8dd33	Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics	2019-02-20 16:56:16 -08:00
Meng Xu	d86ba0e811	TeamRemover: Change it to run periodically This simplifies the problem of when we should invoke the teamRemover	2019-02-20 16:08:34 -08:00
Evan Tschannen	27e3617548	fix: remove bad teams needed to use dd_stall_check delay, because in simulation the buggified delay time could make us remove bad teams before they submit their ranges to the queue	2019-02-20 14:18:36 -08:00
Evan Tschannen	d4737fac0f	knobify force recovery recovery check delay	2019-02-19 16:05:20 -08:00
Evan Tschannen	065a45e05f	Merge branch 'master' into feature-fix-force-recovery # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/workloads/KillRegion.actor.cpp	2019-02-18 17:09:06 -08:00
Evan Tschannen	d492395f84	fix: simulation could buggify a delay such that data distribution incorrectly thinks the queue is not processing unhealthy relocations	2019-02-18 14:57:07 -08:00
Meng Xu	6d09ac483c	Merge with master	2019-02-15 17:03:40 -08:00
Jingyu Zhou	fc3a784963	Fix another build team bug The buildTeam() can create teams with undesired storage servers, which are considered unhealthy. As a result, the data movement can become stuck. Fix this by adding an ACTOR monitorHealthyTeams that builds teams every second whenever there are no healthy teams. Clean up storageServerTracker() interface.	2019-02-14 16:37:16 -08:00
Jingyu Zhou	816f8b1ae1	Per review comments Add a knob for starting distributor delay. Move distributor failed variable to a local loop.	2019-02-14 16:37:16 -08:00
Jingyu Zhou	e0a7162cf8	Add a failure timeout knob for data distributor. Set default time to 1.0s.	2019-02-14 16:37:16 -08:00
Meng Xu	5481851e82	TeamCollection: Add knobs for team remover Added three knobs to control team remover bool TR_FLAG_DISABLE_TEAM_REMOVER: Disable the teamRemover actor double TR_REMOVE_MACHINE_TEAM_DELAY: Wait for the specified time before trying to remove the next machine team double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY: Wait before checking if all machines are healthy	2019-02-13 15:11:56 -08:00
Meng Xu	214a72fba3	TeamCollection: Resolve review comments 1) Reduce the frequency of checking if we need to call teamRemover 2) Improve code efficiency in finding the machine team to remove 3) Remove unused code 4) Add sanity check	2019-02-12 10:59:57 -08:00
Meng Xu	3b8ae0fe95	TeamCollection: Add into 6.1 release note	2019-02-08 13:50:27 -08:00
Meng Xu	7cfe6de27e	TeamCollection: Server team number must match machine team number DESIRED_TEAMS_PER_MACHINE must equal DESIRED_TEAMS_PER_SERVER. Otherwise, we may have too few machine teams to create enough server teams. Note that BUGGIFY macro value is based on a random number generator. When you have two BUGGIFY, one may be true and the other false. Also fix a bug in getting the number of healthy machine teams.	2019-02-07 13:53:55 -08:00
Meng Xu	455024b3fe	SimulationTest: Test the number of teams Magnify the possibility that the number of created machine teams is larger than the number of desired machine teams if we do NOT try to remove the surplus machine teams. This helps test the upgrade to machine team in FDB 6.1	2019-02-06 11:04:41 -08:00
Trevor Clinkenbeard	5822bd65bf	Track health metrics in Ratekeeper and send these metrics to proxies in GetRateInfoReply messages	2019-01-31 12:56:58 -08:00
Trevor Clinkenbeard	d7930af2cb	Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message.	2019-01-31 12:23:04 -08:00
Trevor Clinkenbeard	5b89db811a	Throttle status requests with MAX_STATUS_REQUESTS_PER_SECOND knob, whenever status batching is used.	2019-01-28 15:37:30 -08:00
Evan Tschannen	684a22a52b	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbbackup/backup.actor.cpp # fdbclient/BackupContainer.actor.cpp # fdbclient/HTTP.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/BackupCorrectness.actor.cpp # versions.target	2019-01-09 16:14:46 -08:00
Evan Tschannen	57293a2db0	byte sample recovery did not use limits for its range reads, leading to slow tasks	2019-01-04 10:32:31 -08:00
Markus Pilman	dbe9baff1f	Several small compilation fixes for new versions of gcc There are several missing includes for cmath in the code, I added those. Next, Coro returns a reference to a stack variable and this causes a warning. As this is probably ok for Coro, I disabled the warning in that file for GCC. I want to have this warning in the build system as it is generally a very useful warning to have. Another change is that major and minor have been deprecated for a while now. I replaced those with gnu_dev_major and gnu_dev_minor. ErrorOr currently implements operators ==, !=, and <. These do not compile because Error does not implement ==. This compiles on older versions of gcc and clang because ErrorOr<T>::operator== is not used anywhere. It is still wrong though and newer gcc versions complain. I simply removed these methods. The most interesting fix is that TraceEvent::~TraceEvent is currently throwing exceptions. This is illegal behavior in C++11 and a bad idea in older versions of C++. For now I simply removed the throw, but this might need some more thought.	2019-01-03 12:44:19 -08:00
Evan Tschannen	4e54690005	Merge branch 'release-6.0' # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/MoveKeys.actor.cpp	2018-11-12 20:26:58 -08:00
Evan Tschannen	3f3a562f75	updated resolution balancing knobs to be a little more aggressive	2018-11-12 19:11:28 -08:00
Evan Tschannen	3f39024640	buggify resolution balancing so that it still happens in simulation	2018-11-12 00:03:07 -08:00
Evan Tschannen	536ee826da	tuned resolver balancing to keep the resolvers within 5MB per second of each other	2018-11-11 23:42:45 -08:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	db71b60d72	Merge pull request #819 from satherton/feature-redwood Redwood storage engine, initial/experimental version	2018-10-18 18:38:11 -07:00
Evan Tschannen	0613a34845	The storage server would block the main thread when processing a single version with a large amount of data	2018-10-18 13:37:31 -07:00
Stephen Atherton	7c1dc305cb	Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood	2018-10-05 10:15:10 -07:00
Evan Tschannen	636420abee	fix: if the disk queue adapter peek hangs for a while, switch to a peek from a different locality	2018-10-03 13:58:55 -07:00
Evan Tschannen	28545e0f8d	multi cursors start a get more for the first 10 cursors to hide latency	2018-10-03 13:57:45 -07:00
Evan Tschannen	a92fc911ac	do not spin on a failed storage server recruitment	2018-10-02 17:31:07 -07:00
Stephen Atherton	2fc86c5ff3	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood # Conflicts: # fdbrpc/AsyncFileCached.actor.h # fdbserver/IKeyValueStore.h # fdbserver/KeyValueStoreMemory.actor.cpp # fdbserver/workloads/StatusWorkload.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-09-20 03:39:55 -07:00
Evan Tschannen	200e65fe61	added a workload which tests killing an entire region, and recovering from the failure with data loss. fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore fix: we have to modify all of history, we cannot stop after finding a local remote	2018-09-17 18:32:39 -07:00
Evan Tschannen	6496a6d9c8	fix: start move keys will only move destination servers to become source servers if less than destination servers are healthy and the total number of sources is less than 2x the number of destinations	2018-08-31 12:43:14 -07:00
Evan Tschannen	d7c01f0419	added a separate knob for tlog’s recoverMemoryLimit	2018-08-21 21:11:23 -07:00
Evan Tschannen	7f7755165c	slowly send notifications to clients to clear the list of dead clients	2018-08-08 17:29:32 -07:00
Evan Tschannen	be1a4d74c7	tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget	2018-08-04 10:31:30 -07:00
Stephen Atherton	40762d9f9b	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood	2018-07-25 17:58:52 -07:00
Evan Tschannen	4fedd05506	added more yields to avoid slow tasks	2018-07-12 17:47:35 -07:00
Evan Tschannen	392c73affb	fixed a few slow tasks	2018-07-12 14:06:59 -07:00
Evan Tschannen	cd63c7a7cc	added a buffered cursor, which efficiently merges lots of peek cursors	2018-07-12 12:09:48 -07:00
Stephen Atherton	96389c74cd	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-07-10 16:42:34 -07:00
Stephen Atherton	1bc95862b7	Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-07-10 04:16:02 -07:00
Evan Tschannen	ed7c744a8e	fix: we cannot lower max_read_transaction_life_versions too low when configured with remote logs, because it causes the log routers to fall too far behind	2018-07-09 16:55:33 -07:00
Evan Tschannen	5a2cb3037b	merge 5.2 into 6.0	2018-07-08 20:14:06 -07:00
Evan Tschannen	4dd18afb84	fix: we cannot make MAX_RECOVERY_VERSIONS lower than MAX_VERSIONS_IN_FLIGHT because we can mark a recovery as stalled before finishing the recovery, leading to an infinite loop of recoveries	2018-07-07 17:41:20 -07:00
Evan Tschannen	d6c6e7d306	fix: do not attempt data movement to an unhealthy destination team allow building more teams than desired if all teams are unhealthy bestTeamStuck is an error in simulation again	2018-07-07 16:51:16 -07:00
Evan Tschannen	cdafd542ee	fix: fixed a memory leak where leaderInfo notifications are not cleared out	2018-07-06 17:40:29 -07:00
Stephen Atherton	2925b9b984	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood	2018-07-03 23:03:56 -07:00
Evan Tschannen	7a12d3e130	added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive.	2018-07-01 09:39:04 -04:00
Stephen Atherton	b95a2bd6c1	Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood # Conflicts: # flow/Trace.cpp	2018-06-30 23:18:29 -07:00
Evan Tschannen	1f02bdee0a	do not buggify future version delay, because remote storage servers will be delayed getting data so they need additional time	2018-06-29 11:29:22 -07:00
Evan Tschannen	7e68bee692	update better machine classes first to give them a higher chance of becoming the next cluster controller	2018-06-29 01:11:59 -07:00
Evan Tschannen	58c2f67ff6	checking outstanding requests can be CPU intensive, so rate limit checking requests	2018-06-27 23:02:08 -07:00
Evan Tschannen	5fc8199abc	Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic wait_for_good_recruitment now requires that you have the desired count of each role remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered	2018-06-22 10:15:24 -07:00
Evan Tschannen	678b4494f4	added logging for the datacenter version difference	2018-06-21 16:31:52 -07:00
Stephen Atherton	e5c48d453a	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood	2018-06-18 22:45:27 -07:00
Evan Tschannen	b79feaddd3	added a hard memory limit to the TLog to prevent it from running out of memory. Because remote logs are not ratekeeper controlled this is their only protection	2018-06-18 17:22:40 -07:00
Stephen Atherton	90c8288c68	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood	2018-06-17 14:55:05 -07:00
Evan Tschannen	889889323e	The master will tell the cluster controller if it is going to take a long time to recruit new logs in its DC; the cluster controller can determine if the other DC would be better and recruit there. The cluster controller will not switch to the other data center if remote logs are too far behind. We will not recruit in DCs with negative priority.	2018-06-13 18:14:14 -07:00
Stephen Atherton	2878f30f29	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood # Conflicts: # fdbserver/IKeyValueStore.h # fdbserver/KeyValueStoreMemory.actor.cpp # fdbserver/storageserver.actor.cpp	2018-06-13 15:56:06 -07:00
Evan Tschannen	372ed67497	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp	2018-06-11 11:34:10 -07:00
Evan Tschannen	4903df5ce9	fix: give time to detect failed servers before building teams	2018-06-10 20:21:39 -07:00
Stephen Atherton	69b713918b	VersionedBTree now uses PrefixTree based pages (with bugs). This required significant changes to both classes because the interface and semantics for building, seeking in, and iterating through pages is very different from the previous trivial approach which was based on serialized vectors. PrefixTree node format rewritten to support optional values without increasing overhead for common node scenarios. PrefixTree::Cursor rewritten to reuse path prefix memory instead of allocating new memory on each movement which is then 'leaked' until destruction. PrefixTree::Cursor movement modified to work better with VersionedBTree::InternalCursor, which was also heavily modified. Added knobs related to key arrangement in PrefixTree nodes. Added StringRef::toHexString() as an alternative to printable() to make reading raw PrefixTree data easier. PrefixTree performance is temporarily worse with this update and VersionedBtree fails its unit test.	2018-06-08 03:32:34 -07:00
Balachandar Namasivayam	529d0497f1	Proxy going OOM when applying high volumes of writes to a proxy, particularly in a sudden fashion before ratekeeper can control the workload. Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.	2018-06-01 15:21:40 -07:00
Evan Tschannen	e8f6ad88f0	fix: tripled the smallStorageTarget to prevent simulations which do a lot of work from timing out	2018-05-07 17:26:44 -07:00
Evan Tschannen	8cb8198250	fix: the e-brake should be buggified with ratekeeper storage limits to prevent simulation from running full blast into the e-brake resulting in simulation taking forever to complete (joshua timeouts)	2018-05-06 12:33:25 -07:00
Evan Tschannen	12ef63b698	knobify replace contents bytes	2018-05-01 19:43:35 -07:00
Stephen Atherton	af61d3596d	Merge branch 'public-master' into feature-redwood # Conflicts: # fdbserver/DatabaseConfiguration.cpp # fdbserver/OldTLogServer.actor.cpp # fdbserver/fdbserver.vcxproj # fdbserver/fdbserver.vcxproj.filters	2018-04-24 17:22:21 -07:00
Stephen Atherton	2752a28611	Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood	2018-04-06 16:29:37 -07:00
Evan Tschannen	9c8cb445d6	optimized the tlog to use a vector for tags instead of a map	2018-03-17 10:36:19 -07:00
Evan Tschannen	ccd70fd005	The tlog uses the tags embedded in the message instead of a separate vector of locations optimized remote tlog committing to avoid re-serializing the message	2018-03-16 16:47:05 -07:00
A.J. Beamon	f2c804e14f	Reverting changes from merge of master into release-5.2 (`b25810711c`). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.	2018-03-06 10:15:04 -08:00
Evan Tschannen	37a6a81634	Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs # Conflicts: # fdbserver/workloads/RestartRecovery.actor.cpp	2018-02-23 12:33:28 -08:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Stephen Atherton	0a35f167e4	Merge branch 'master' into feature-redwood # Conflicts: # fdbserver/DiskQueue.actor.cpp # fdbserver/IDiskQueue.h # fdbserver/Knobs.cpp # fdbserver/Knobs.h # fdbserver/fdbserver.vcxproj # fdbserver/fdbserver.vcxproj.filters # fdbserver/worker.actor.cpp	2018-02-12 01:30:02 -08:00
Evan Tschannen	42405c78a5	Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs # Conflicts: # fdbserver/ClusterController.actor.cpp # fdbserver/Knobs.cpp	2018-02-10 12:08:52 -08:00
Evan Tschannen	c7b3be5b19	re-enabled better master exists the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited	2018-02-09 16:48:55 -08:00
Evan Tschannen	d0caffd339	fix: knob was set to incorrect value	2018-02-06 18:11:45 -08:00
Evan Tschannen	b7dde88029	fix: the cluster controller did not consider the master sharing the same process as the cluster controller as bad in all needed locations waited too long for good recruitment locations, which would add too much time to recoveries of clusters that do not use machine classes	2018-02-06 11:30:05 -08:00
Evan Tschannen	21482a45e1	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DBCoreState.h # fdbserver/LogSystem.h # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/TLogServer.actor.cpp	2018-01-14 13:40:24 -08:00
Evan Tschannen	022df3b91b	backup and restore sometimes took too long in simulation	2018-01-09 17:26:42 -08:00
Evan Tschannen	5ac4f73978	Merge branch 'release-5.1' into feature-remote-logs # Conflicts: # fdbclient/NativeAPI.actor.cpp # fdbrpc/Locality.h # fdbrpc/simulator.h # fdbserver/ApplyMetadataMutation.h # fdbserver/ClusterController.actor.cpp # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/masterserver.actor.cpp # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-05 11:33:42 -08:00
A.J. Beamon	bb1297c686	Remove RkServerQueueInfo and RkTLogQueueInfo trace events, since this information is more or less already logged on the storage servers and tlogs. Update the quiet database check and magnesium to use the information from the logs and storage servers.	2017-11-14 12:59:42 -08:00
Evan Tschannen	3a4078bdda	the keyservers shards are always a fixed large size	2017-10-27 11:52:11 -07:00
Alex Miller	f997cb9038	Add a string knob to hold the Log directory, and write profiles to it. This is the combination of two small changes. 1. Add support for a string knob type. 2. Change profiles to be written to the log directory instead of the working directory. We have three options of where to write files: the working directory, the data directory, and the log directory. The working directory may be set to a non-writable location, and likely contains the fdb binaries. Allowing these files to be overwritten would likely not be a wise idea. The data directory hosts our sqlite b-trees. It would also be very unfortunate if these were ever overwritten by an unfortunate profile name. The log directory contains logs. Out of the three, these matter the least if they disappear or become corrupted. Thus, we write to the log directory.	2017-10-16 16:05:02 -07:00
Evan Tschannen	15962cf079	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbrpc/Locality.cpp # fdbrpc/Locality.h # fdbserver/ClusterController.actor.cpp # fdbserver/ClusterRecruitmentInterface.h # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/fdbserver.vcxproj.filters # fdbserver/masterserver.actor.cpp # fdbserver/worker.actor.cpp # flow/error_definitions.h	2017-10-05 17:09:44 -07:00
Bhaskar Muppana	0f8ff26029	Merge pull request #158 from bmuppana/master <rdar://problem/34557380> Need a way to map real time to version	2017-09-27 17:56:42 -07:00
Bhaskar Muppana	6a0b1d6808	Fixing PR comments <rdar://problem/34557380> Need a way to map real time to version	2017-09-27 17:56:01 -07:00
Bhaskar Muppana	0bf5bdb23a	<rdar://problem/34557380> Need a way to map real time to version	2017-09-25 12:51:37 -07:00
Evan Tschannen	489332533c	all timeouts longer than two minutes can be lowered to 60.0 with buggification added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures	2017-09-18 11:04:51 -07:00
Stephen Atherton	5d49f2c710	Merge branch 'master' into feature-redwood # Conflicts: # fdbserver/fdbserver.vcxproj	2017-08-28 17:45:50 -07:00
Evan Tschannen	c22708b6d6	added tag localities fix: remote logs need to stop the master when they are stopped	2017-08-03 16:16:36 -07:00
Evan Tschannen	0906250e78	merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem re-included tags in messages to the tlog previously never committed the LogRouter	2017-06-29 15:50:19 -07:00
Evan Tschannen	69c862ed6e	updated release notes for 5.0.1	2017-06-28 16:52:45 -07:00
Evan Tschannen	4bdcd8fc12	Merge branch 'release-4.6' into release-5.0 # Conflicts: # bindings/bindingtester/run_binding_tester.sh # fdbrpc/AsyncFileKAIO.actor.h	2017-06-14 16:43:53 -07:00
Stephen Atherton	b65ad3563c	Merge branch 'master' into feature-redwood # Conflicts: # fdbserver/fdbserver.vcxproj # fdbserver/fdbserver.vcxproj.filters	2017-06-09 14:56:41 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00

291 Commits