foundationdb

Commit Graph

Author	SHA1	Message	Date
Trevor Clinkenbeard	8144882d7b	Merge branch 'apple-master' into features/local-rk	2019-06-10 19:40:25 -07:00
Evan Tschannen	29b96414e2	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/NativeAPI.actor.cpp # fdbserver/Coordination.actor.cpp # flow/Arena.h # versions.target	2019-06-03 18:49:35 -07:00
Evan Tschannen	7c333dbc16	If a process receives a message in its clusterControllerInterface before becoming the cluster controller, if the process does not become the cluster controller in the next minute it should destroy the interface to prevent a memory leak.	2019-05-29 16:57:13 -07:00
sramamoorthy	31b6c86650	ignorePopDeadline to have high limit in simulator - ignorePopDeadline to have highier limit in simulator to accommdate for the buggify delays and make snapshot succeed. - introduce a new knob for auto resetting the disabling of tlog pop	2019-05-28 22:07:46 -07:00
A.J. Beamon	603721e125	Merge branch 'master' into thread-safe-random-number-generation # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/AsyncFileCached.actor.h # fdbrpc/genericactors.actor.cpp # fdbrpc/sim2.actor.cpp # fdbserver/DiskQueue.actor.cpp # fdbserver/workloads/BulkSetup.actor.h # flow/ActorCollection.actor.cpp # flow/Net2.actor.cpp # flow/Trace.cpp # flow/flow.cpp	2019-05-23 08:35:47 -07:00
Evan Tschannen	8c3516951a	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-05-12 20:13:49 -07:00
Alex Miller	ea12a54946	Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES So as to not make filesystem assumptions. This knob did technically appear in (only the) 6.1.5 release, but this feature was broken 6.1.5, so thus impossible to use anyway.	2019-05-10 18:26:22 -10:00
A.J. Beamon	5f55f3f613	Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.	2019-05-10 14:01:52 -07:00
Evan Tschannen	22499666d0	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbserver/LogRouter.actor.cpp # flow/Trace.cpp # versions.target	2019-05-08 18:19:35 -07:00
Alex Miller	0685e6c1c7	Avoid large truncates in the DiskQueue. And instead create a new file while incrementally truncating the old one down. This avoids queueing up a massive number of filesystem metadata operations in one call, thus flooding the disk with requests and stalling out all other filesystem operations. This sets the knobs so that a truncate of >10GB causes us to create a new file rather than trying to truncate the old one.	2019-05-08 12:33:31 -10:00
Alex Miller	4052f3826a	Add a knob to limit the number of commits indexed per key. Theoretically, we could spill 20MB of 22B mutations for one key, which would generate a very long value being stored in SQLite, and very inefficiently read back. This stops that from being a problem, at the cost of some extra write calls.	2019-05-03 15:27:10 -07:00
Alex Miller	f4e48c3851	Add a knob to limit amount of data read from sqlite for one PeekRequest. This prevents peeking from degrading over time if there are a very large number of SpilledData entries for one particular tag.	2019-05-02 17:26:45 -07:00
Evan Tschannen	2d5043c665	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-04-30 18:27:04 -07:00
Evan Tschannen	1a4c1759a4	Merge pull request #1429 from jzhou77/pprof Dump heap profiler when memory usage is high	2019-04-29 16:31:44 -07:00
Evan Tschannen	cacd82758e	Reduced data distribution speeds	2019-04-26 13:54:49 -07:00
Evan Tschannen	9ff8aca1da	Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)	2019-04-26 13:53:56 -07:00
A.J. Beamon	253d2400ef	Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning # Conflicts: # documentation/sphinx/source/release-notes.rst	2019-04-23 14:38:52 -07:00
A.J. Beamon	4ad0496b39	Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.	2019-04-23 14:01:51 -07:00
Stephen Atherton	83db547306	Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.	2019-04-23 04:50:58 -07:00
Jingyu Zhou	6870e132b2	Merge branch 'master' into pprof	2019-04-19 14:06:44 -07:00
Andrew Noyes	d1e86779a6	Address review comments	2019-04-18 08:48:27 -07:00
mpilman	32393ec4c9	Prototype of local ratekeeper	2019-04-08 11:04:44 -07:00
Evan Tschannen	05869a8383	do not log a degraded reset message if the previous reset was more than a week ago	2019-04-07 23:00:58 -07:00
Jingyu Zhou	4b08042a88	Change memory profiling threshold to a flag	2019-04-05 16:33:51 -07:00
Jingyu Zhou	09b2c35d11	Dump heap profiler when memory usage is high Set the threshold of dump to 2GB.	2019-04-05 16:12:23 -07:00
Evan Tschannen	390ab9cfed	A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy	2019-04-04 14:11:12 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
Evan Tschannen	6254a1a8e4	fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop	2019-03-22 18:37:39 -07:00
A.J. Beamon	2d7b48dadc	Merge pull request #1311 from etschannen/feature-increase-grv-batch Increased the GRV client batch size	2019-03-19 08:23:05 -07:00
Evan Tschannen	2554fed965	reduce max transaction to start	2019-03-18 16:16:03 -07:00
Evan Tschannen	87e2a1a029	The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update	2019-03-18 16:09:57 -07:00
Alex Miller	29ab7370cd	Clear versionLocation when spilling, and pop DQ separately. Popping the disk queue now requires potentially recovering the location to which we can pop from the spilled data itself, and for each tag we must maintain the first location with relevant data. The previous queue we had to represent the ordering, queueOrder, was used by spilling, and popped when a TLog had been spilled. This means that as soon as a TLog has been fully spilled, we have no idea how it relates in order to other fully spilled TLogs. Instead, use queueOrder to keep track of all the TLog UIDs until they're removed, and use spillOrder to keep track of the order only for spilling.	2019-03-18 15:09:22 -07:00
Evan Tschannen	ec6c843124	increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch	2019-03-16 16:18:58 -07:00
Evan Tschannen	e068c478b5	merge master	2019-03-12 18:31:25 -07:00
Evan Tschannen	c6e94293bf	reset a process to not be degraded after 2 days	2019-03-10 22:39:21 -07:00
Evan Tschannen	53f16b5347	when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded	2019-03-08 11:46:34 -05:00
Jingyu Zhou	3c86643822	Separate Ratekeeper from data distribution. Add a new role for ratekeeper. Remove StorageServerChanges from data distribution. Ratekeeper monitors storage servers, which borrows the idea from DataDistribution.	2019-03-07 13:16:20 -08:00
Alex Miller	94bf75cb00	Allow the disk queue to shrink if it has unneeded slack space.	2019-03-04 01:42:38 -08:00
Alex Miller	9ef283d4e7	Implement hard limiting of memory used to serve peek requests.	2019-03-04 01:42:38 -08:00
Alex Miller	e7d8520c63	Batch more when spilling data.	2019-03-04 01:42:38 -08:00
Trevor Clinkenbeard	39f612d132	Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics	2019-03-02 17:07:00 -08:00
A.J. Beamon	a051055caf	Initial implementation of adding separate limits for batch priority in ratekeeper	2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard	abfe057805	Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics	2019-02-25 13:47:16 -08:00
Evan Tschannen	b8910ba7cd	Merge branch 'master' into feature-fix-force-recovery # Conflicts: # fdbclient/ManagementAPI.actor.h # fdbserver/DataDistribution.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KillRegion.actor.cpp	2019-02-22 14:38:13 -08:00
Meng Xu	9445ac0b0c	Status: Use new data distributor worker to publish status After we add a new data distributor role, we publish the data related to data distributor and rate keeper through the new role (and new worker). So the status needs to contact the data distributor, instead of master, to get the status information.	2019-02-21 18:05:50 -08:00
Meng Xu	7cca439e00	TeamRemover: Add status to show redundant team removing Distinguish the removal of unhealthy team and redundant team. Change status report to include redundant team removal report.	2019-02-21 14:16:46 -08:00
Trevor Clinkenbeard	fa96b8dd33	Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics	2019-02-20 16:56:16 -08:00
Meng Xu	d86ba0e811	TeamRemover: Change it to run periodically This simplifies the problem of when we should invoke the teamRemover	2019-02-20 16:08:34 -08:00
Evan Tschannen	27e3617548	fix: remove bad teams needed to use dd_stall_check delay, because in simulation the buggified delay time could make us remove bad teams before they submit their ranges to the queue	2019-02-20 14:18:36 -08:00
Evan Tschannen	d4737fac0f	knobify force recovery recovery check delay	2019-02-19 16:05:20 -08:00

1 2 3

145 Commits