foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	cacd82758e	Reduced data distribution speeds	2019-04-26 13:54:49 -07:00
Evan Tschannen	9ff8aca1da	Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)	2019-04-26 13:53:56 -07:00
Evan Tschannen	1f37f82b87	invalid knob overrides do not prevent fdbserver from starting	2019-04-25 17:08:13 -07:00
Evan Tschannen	6c77864731	separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests	2019-04-25 17:07:35 -07:00
Alex Miller	797d431934	Add an \xff keyrange that is backed by the txnStateStore.	2019-04-25 17:04:20 -07:00
Trevor Clinkenbeard	d339becd7c	Fix currentRate calculation for local ratekeeper	2019-04-25 15:35:34 -07:00
Jingyu Zhou	5462f560e7	Add pseudo locality for log routers and tlogs This changes the logic of pop operations from log routers (LG): - LG pops tagLocalityLogRouterMapped from TLogs; - TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before popping. Later when we add more pseudo localities, the same pattern can be used.	2019-04-23 21:35:56 -07:00
A.J. Beamon	253d2400ef	Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning # Conflicts: # documentation/sphinx/source/release-notes.rst	2019-04-23 14:38:52 -07:00
A.J. Beamon	ea7abff9df	Clean up from review	2019-04-23 14:16:52 -07:00
A.J. Beamon	4ad0496b39	Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.	2019-04-23 14:01:51 -07:00
Stephen Atherton	df0548503d	Merge branch 'release-6.1' of https://github.com/apple/foundationdb into sqlite-grow-bigger	2019-04-23 13:43:58 -07:00
A.J. Beamon	e0f76edf77	Merge pull request #1471 from AlvinMooreSr/release-6.1-merge Merge Release 6.1 Into Master	2019-04-23 11:08:21 -07:00
Stephen Atherton	83db547306	Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.	2019-04-23 04:50:58 -07:00
Evan Tschannen	e0f7ec96aa	Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers	2019-04-22 17:29:46 -07:00
Jingyu Zhou	439d5a3843	Use emplace_back instead of push_back in Proxy	2019-04-22 14:03:48 -07:00
Jingyu Zhou	d2b215b926	Refactor tag population of ServerCacheInfo	2019-04-22 11:55:04 -07:00
Jingyu Zhou	7cb61c766b	Fix tLogLocalities for current LogSet In toCoreState(), the serialization of current LogSet is different from old TLog sets. The locality data should be generated, not copied over. Found by: -r simulation --crash -f tests/fast/KillRegionCycle.txt -s 254666356 -b on	2019-04-21 10:41:07 -07:00
Jingyu Zhou	8b67da57bb	Fix upgrade test failure Serialize pseudoLocalities if protocol version is larger than 0x0FDB00B061060001LL. Note this version may need to be changed to "currentProtocolVersion" when merging into the master, and "currentProtocolVersion" should be incremented.	2019-04-21 10:41:07 -07:00
Jingyu Zhou	9e8ffd2ff7	Refactor OldLogData ctor	2019-04-21 10:41:07 -07:00
Jingyu Zhou	97986a28b7	Replace push_back with emplace_back for efficiency And better code readability.	2019-04-21 10:41:07 -07:00
Jingyu Zhou	010f825aff	Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet	2019-04-21 10:41:07 -07:00
Jingyu Zhou	7befce6bf1	More pseudoLocalities and refactors.	2019-04-21 10:41:07 -07:00
Jingyu Zhou	66000a07a5	Use emplace_back instead of push_back	2019-04-21 10:41:07 -07:00
Jingyu Zhou	966ec30fcc	Add pseudoLocalities for special tag consumers	2019-04-21 10:41:07 -07:00
Jingyu Zhou	b4e7e7a85b	Refactor StorageCache updates	2019-04-21 10:41:07 -07:00
Jingyu Zhou	82ec80c42f	Refactor TLogSet ctor	2019-04-21 10:41:07 -07:00
Jingyu Zhou	d19b0cf1c1	Refactor LogSet with two new constructors	2019-04-21 10:41:07 -07:00
Jingyu Zhou	0b1984978a	Small code refactoring.	2019-04-21 10:41:07 -07:00
Jingyu Zhou	ec1bc5cfca	Add LogSystemType enum	2019-04-21 10:41:07 -07:00
Jingyu Zhou	6870e132b2	Merge branch 'master' into pprof	2019-04-19 14:06:44 -07:00
Andrew Noyes	d1e86779a6	Address review comments	2019-04-18 08:48:27 -07:00
Andrew Noyes	5af8208c62	Fix JavaWorkload unused variable	2019-04-17 16:29:22 -07:00
Andrew Noyes	ef04471a66	Fix more unused-variable warnings	2019-04-17 16:04:10 -07:00
Alvin Moore	2bea99591e	Merge branch 'release-6.1' of copy of master # Conflicts: # documentation/sphinx/source/release-notes.rst	2019-04-17 15:51:48 -07:00
Andrew Noyes	13ba915a19	Fix more unused variable warnings	2019-04-17 15:38:08 -07:00
A.J. Beamon	43533b3d72	Don't validate the shard size estimate unless enough keys are sampled with a less than 100% probability.	2019-04-17 11:01:23 -07:00
Trevor Clinkenbeard	3426205167	Fixed readGuard usage bug	2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard	1d921da170	readGuard sends server_overloaded error if request is rejected	2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard	8a7d9afbe9	Fixed name of LocalRatekeeperWorkloadFactory	2019-04-16 10:36:09 -07:00
Trevor Clinkenbeard	0594154644	Fixed getPenalty calculation	2019-04-16 10:17:41 -07:00
Andrew Noyes	baa3e806ef	Address review comments from #1446	2019-04-16 09:48:15 -07:00
Andrew Noyes	6207d724f8	Fix all -Wunused-variable warnings	2019-04-15 18:13:00 -07:00
Evan Tschannen	cd5c9d91fa	Merge pull request #1443 from etschannen/master Merge 6.1 into master	2019-04-10 17:43:07 -07:00
Balachandar Namasivayam	04e9aa6afd	For small clusters that are growing quickly, it could happen that the rateLimit is set to a low value and it would take very long to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time.	2019-04-10 17:13:37 -07:00
Jingyu Zhou	ab834c4f7e	Move profiling option help message to devhelp	2019-04-09 13:26:12 -07:00
Evan Tschannen	6220a5ce0f	Merge pull request #1370 from jzhou77/fix-unreferenced Remove unused functions	2019-04-09 11:49:45 -07:00
A.J. Beamon	058d028099	Merge pull request #1301 from mpilman/features/cheaper-traces Defer formatting in traces to make them cheaper	2019-04-09 10:11:04 -07:00
Evan Tschannen	21c0ba555c	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-04-08 18:38:42 -07:00
Evan Tschannen	d126730b4d	fixed a spurious test error where process_behind was treated as an error	2019-04-08 17:09:54 -07:00
A.J. Beamon	538b431656	Apply suggestions from code review	2019-04-08 14:55:58 -07:00
A.J. Beamon	a7288e1325	Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate.	2019-04-08 14:21:24 -07:00
mpilman	789cd67bcd	Don't compile JavaWorkload by default	2019-04-08 13:06:29 -07:00
mpilman	c45fe8c697	Fixed typo	2019-04-08 11:33:45 -07:00
Trevor Clinkenbeard	b286102d34	Update fdbserver/workloads/LocalRatekeeper.actor.cpp Co-Authored-By: mpilman <markus@pilman.ch>	2019-04-08 11:06:17 -07:00
mpilman	d2e74cb2c0	Fix stupid rounding error	2019-04-08 11:05:29 -07:00
mpilman	aaa8f73bdc	fixed missing refactoring code	2019-04-08 11:05:29 -07:00
mpilman	bdba8e22eb	Added test and bugfixes	2019-04-08 11:05:29 -07:00
mpilman	b944e0b116	generalized read guards, allow for penalty+error	2019-04-08 11:04:44 -07:00
mpilman	207049e852	fixed serialization	2019-04-08 11:04:44 -07:00
mpilman	32393ec4c9	Prototype of local ratekeeper	2019-04-08 11:04:44 -07:00
Evan Tschannen	05869a8383	do not log a degraded reset message if the previous reset was more than a week ago	2019-04-07 23:00:58 -07:00
Jingyu Zhou	4b08042a88	Change memory profiling threshold to a flag	2019-04-05 16:33:51 -07:00
Andrew Noyes	d7612a4426	Fix OPEN_FOR_IDE build errors	2019-04-05 16:30:42 -07:00
Jingyu Zhou	09b2c35d11	Dump heap profiler when memory usage is high Set the threshold of dump to 2GB.	2019-04-05 16:12:23 -07:00
mpilman	d01cbf3455	Addressed code review comments	2019-04-05 13:12:20 -07:00
mpilman	4287b1d2a1	resolved minor merge issues	2019-04-05 13:12:19 -07:00
A.J. Beamon	614a599a04	Update fdbserver/SimulatedCluster.actor.cpp Co-Authored-By: mpilman <markus@pilman.ch>	2019-04-05 13:12:19 -07:00
mpilman	39ecbedd74	Fixed compilation errors on OS X & gcc8	2019-04-05 13:12:19 -07:00
mpilman	1c16f87a4e	Remove trace-calls to printable (in non-workloads)	2019-04-05 13:12:19 -07:00
mpilman	ea67b742c7	Implemented Traceable for printable types	2019-04-05 13:12:19 -07:00
mpilman	bb82f8560a	process all volatile ints correctly in traces	2019-04-05 13:12:19 -07:00
mpilman	02e3b634fb	Compile sqlite with NDEBUG so we can debug	2019-04-05 13:12:19 -07:00
mpilman	c008e16c81	Defer formatting in traces to make them cheaper This is the first part of making `TraceEvent` cheaper. The main idea is to defer calls to any code that formats strings. These are the main changes: - TraceEvent::detail now takes a c-string instead of std::string for literals. This prevents unnecessary allocations if the trace is not going to be printed in the first place (for example for SevDebug). Before that `detail` expected a `std::string` as key, which meant that any string literal would be copied on each call. - Templates Traceable and SpecialTraceMetricType. These templates can be specialized for any type that needs to be printed. The actual formatting will be deferred to after the `enabled` check. This provides two benefits: (1) if a TraceEvent is disabled, we don't pay for the formatting and (2) TraceEvent can trace types that it doesn't know about. - TraceEvent::enabled will be set in the constructor if the Severity is passed. This will make sure that `TraceEvent::init` is not called. - `TraceEvent::detail` will be inlined. So for disabled TraceEvent calls, a call to detail will only introduce an if-branch which is much cheaper than a function call.	2019-04-05 13:12:19 -07:00
Jingyu Zhou	acf60c5e9a	Merge pull request #1414 from jzhou77/pprof Add manually triggered heap profiling	2019-04-04 22:27:33 -07:00
Jingyu Zhou	5be592632b	Change trace event message If heap profiler is not running, we can't take a snapshot of the profile.	2019-04-04 15:29:50 -07:00
Jingyu Zhou	f538df5e6c	Add TraceEvent if unable to invoke heap profiler	2019-04-04 15:26:41 -07:00
Evan Tschannen	390ab9cfed	A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy	2019-04-04 14:11:12 -07:00
Alex Miller	8f49be480b	Update fdbserver/worker.actor.cpp Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>	2019-04-04 13:32:10 -07:00
Jingyu Zhou	eaaf58ee34	Refactor profiler into cpu and heap profilers	2019-04-03 20:54:30 -07:00
Jingyu Zhou	3371cf22d4	Add manually triggered heap profiling At client side: fdb> profile ERROR: Usage: profile <client|list|flow|heap> fdb> profile heap 127.0.0.1:4500 On the server side: $ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500 Starting tracking the heap FDBD joined cluster. Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use) Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)	2019-04-03 16:00:54 -07:00
Markus Pilman	101a05ae77	Merge branch 'master' into features/client-simulator	2019-04-03 10:03:56 -08:00
Jingyu Zhou	fc59587b3c	Merge pull request #1393 from jzhou77/pprof Gperftools Profiling fix.	2019-04-03 10:35:31 -07:00
Evan Tschannen	39c595223b	Merge branch 'release-6.1'	2019-04-02 22:30:02 -07:00
Evan Tschannen	30133a30e0	Merge pull request #1403 from etschannen/release-6.1 Ported a bug fix to the 6.0 log system, and updated documentation	2019-04-02 17:56:18 -07:00
Jingyu Zhou	56a1128a9b	Enhance cmake's gperftools support Add compiler flags and link flags for gperftools.	2019-04-02 17:34:29 -07:00
Evan Tschannen	31ed73d9f5	Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0	2019-04-02 15:27:37 -07:00
Evan Tschannen	1d4a6ab551	cleaned up status to keep the healthyZone read separated from relicaFutures	2019-04-02 14:46:56 -07:00
Evan Tschannen	a38c396283	made all maintenance transactions lock aware	2019-04-02 14:27:48 -07:00
Evan Tschannen	628fec8c8b	updated status with information about ongoing maintenance clear the maintenance zone if a different storage server is detected failed	2019-04-02 14:15:51 -07:00
mpilman	371a41dbba	Allow classPath to be modified at runtime	2019-04-02 11:56:40 -07:00
mpilman	e19901186f	Fixed buggy register preparation for natives	2019-04-02 11:56:03 -07:00
Evan Tschannen	72203ba47a	Merge commit '56f3f0b1bc60604f965152d856ae29a591227703'	2019-04-01 18:45:38 -07:00
Evan Tschannen	781cf9b5a0	added the ability to make a zoneId for maintenance in fdbcli	2019-04-01 17:55:13 -07:00
Evan Tschannen	f5de52de91	fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time	2019-04-01 16:38:25 -07:00
Jingyu Zhou	49fdc35e5e	Gperftools Profiling fix. Fix a bug and update gperftools compiling flags The added flags are recommended by gperftools here: https://github.com/gperftools/gperftools Verified that heap profiles are saved with the following command: HEAPPROFILE=/tmp/fdbserver fdbserver [args...]	2019-04-01 14:42:18 -07:00
mpilman	b148981bba	Fixed compilation issues with char*	2019-04-01 14:29:45 -07:00
Jingyu Zhou	47b4b82628	Merge branch 'master' into fix-unreferenced	2019-04-01 14:07:19 -07:00
Jingyu Zhou	3f76be8f45	Merge remote-tracking branch 'apple/master' into fix-unreferenced	2019-04-01 14:00:43 -07:00
Jingyu Zhou	f7f8ddd894	Fix warnings on unused variables Found by -Wunused-variable flag.	2019-04-01 14:00:20 -07:00
mpilman	e23e63c6ac	Implemented JavaWorkload This change allows a user to write a workload in Java. The way this is implemented is by creating a JVM within the simulator and calling the corresponding workload class. A workload can then run in the simulator or on a testing cluster. If the workload is executed within the simulator, the resulting test will not be deterministic anymore as it will execute in a different thread (and even without that it is not clear whether we could get determinism, as the JVM does a lot of things that are not deterministic). This is intended to get better testing of the Java client, and layer authors can use the simulator to test their layers on a single machine while still simulating failing machines etc.	2019-03-31 17:57:43 -07:00
Evan Tschannen	a46620fbee	Merge branch 'release-6.1'	2019-03-30 17:59:28 -07:00
Evan Tschannen	8ebf771392	cleanup cluster controller trace events	2019-03-30 14:17:18 -07:00
Alex Miller	e7ad39246c	Fix typo	2019-03-29 20:16:26 -07:00
Evan Tschannen	a44ffd851e	fix: the shared tlog could fail to update a stopped tlog’s queueCommitVersion to version if a second tlog registered before it could issue the first commit for the tlog	2019-03-29 20:11:30 -07:00
Evan Tschannen	d882c060bf	Merge commit '5dd6396eed0de0dfea6cf9eecc307995eff5cedc'	2019-03-28 18:00:55 -07:00
Balachandar Namasivayam	0bbdc15f71	Multi-test processes waits until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect the restarts and fail quickly instead of waiting for a timeout which is usually large.	2019-03-28 17:05:30 -07:00
Evan Tschannen	b6008558d3	renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>() eliminated an unnecessary copy from the proxy commit path eliminated an unnecessary copy from buffered peek cursor	2019-03-28 11:52:50 -07:00
Evan Tschannen	836bb95a7a	Merge pull request #1372 from etschannen/master Merge 6.1 into master	2019-03-27 21:00:49 -07:00
Evan Tschannen	34b9d5e722	Merge pull request #1364 from etschannen/feature-fast-serialize A few performance optimizations	2019-03-27 20:57:25 -07:00
Evan Tschannen	e5a80f2c94	optimized IPaddress	2019-03-27 18:21:13 -07:00
Jingyu Zhou	a55f06e082	Remove unused functions Found with -Wunused-function flag.	2019-03-27 15:45:28 -07:00
Stephen Atherton	64554e90d4	Change this to THIS in actors for IDE compatibility.	2019-03-27 13:42:49 -07:00
Stephen Atherton	d5c8b6b083	Merge branch 'master' of github.com:apple/foundationdb into feature-redwood # Conflicts: # fdbserver/VersionedBTree.actor.cpp # flow/flow.h	2019-03-27 13:37:15 -07:00
A.J. Beamon	91014d4529	Add file changes that I accidentally failed to commit; fix naming issue in worker.	2019-03-27 08:41:19 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
A.J. Beamon	d508658569	Make ratekeeper one word to match our existing convention	2019-03-27 08:15:19 -07:00
Jingyu Zhou	38c6681349	Fix some signed and unsigned mismatch warnings.	2019-03-26 14:54:11 -07:00
Jingyu Zhou	c0b58080ee	Fix type name warning for DDTeamCollection Seen using 'class' now seen using 'struct' in DataDistribution.actor.cpp	2019-03-26 14:18:25 -07:00
Jingyu Zhou	7c02ee6fdd	Fix compiler warning about unreferenced exception variable	2019-03-26 13:43:47 -07:00
Jingyu Zhou	466a59a99d	Merge remote-tracking branch 'apple/release-6.1' into ratekeeper	2019-03-25 15:27:38 -07:00
Jingyu Zhou	f57a22e2ed	Add data distributor and ratekeeper to status output	2019-03-25 15:11:29 -07:00
Trevor Clinkenbeard	007abbc45b	Added 96-byte FastAllocator Since storage queue nodes account for a large portion of memory usage, we can save space by only allocating 96 bytes instead of 128 bytes for each node.	2019-03-25 13:44:39 -07:00
Evan Tschannen	5e03e178de	Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues Add support for a client or worker having multiple issues.	2019-03-24 17:27:50 -07:00
Evan Tschannen	d45159ebf7	Merge pull request #1307 from jzhou77/ratekeeper Monitor placement of Ratekeeper and DataDistributor	2019-03-24 17:26:07 -07:00
Evan Tschannen	d6ad027d37	ratekeeper needs to be recruited for proxies to make progress, so if one has not registered with the cluster controller by the time we are accepting commits, recruit a new one	2019-03-24 16:48:24 -07:00
Evan Tschannen	f426d732ea	fix: forgot to remove one location where id_used was incremented for distributor and ratekeeper	2019-03-24 16:04:59 -07:00
Evan Tschannen	e8948726e8	once we recruit a ratekeeper, do not allow any other ratekeepers to register	2019-03-24 11:04:39 -07:00
Evan Tschannen	24c92a1870	Merge pull request #1352 from etschannen/feature-network-address-list Changed NetworkAddressList to at most two addresses for performance	2019-03-24 10:22:38 -07:00
Evan Tschannen	50a4403661	fix: missing parenthesis	2019-03-23 21:52:15 -07:00
Jingyu Zhou	40eec20252	Restore master PID in worker registration This fix is lost during merge.	2019-03-23 21:02:11 -07:00
Jingyu Zhou	3ef26e6be3	Fix fitness assignment statements Found by MacOS build.	2019-03-23 19:16:04 -07:00
Evan Tschannen	1fc6937802	changed NetworkAddressList to at most two addresses for performance	2019-03-23 17:54:46 -07:00
Evan Tschannen	b51a24453e	the data distributor and ratekeeper are not included in id_used, but when comparing equally good options we prefer to avoid sharing with those roles excluded data distributor and ratekeeper were improperly killed when the best option was also excluded	2019-03-23 13:25:36 -07:00
Jingyu Zhou	10988f89d9	Code refactoring for ConsistencyCheck.actor.cpp	2019-03-23 11:06:43 -07:00
Jingyu Zhou	fdc5b5ddbf	Fix: spurious ratekeeper registration A rare race condition: -r simulation -f ./foundationdb/tests/slow/WriteDuringReadAtomicRestore.txt -s 114256311 -b on - A is the ratekeeper. - CC recruit B and B starts - CC halts ratekeeper A and A is halted - A registers back with CC, which then halts B. CC sets A to be the ratekeeper. CC starts recruiting and finds A is the best machine. But skips recruiting because CC thinks A is already used. Now the cluster is left with no ratekeeper. Fix by disallowing ratekeeper registration with previous ID.	2019-03-23 11:03:51 -07:00
Jingyu Zhou	6523cd4931	Fix: recruit ratekeeper is not triggered	2019-03-23 09:20:54 -07:00
Steve Atherton	09f37cf3d2	Merge pull request #533 from ajbeamon/fix-parent-directory Fixes to parentDirectory() and abspath()	2019-03-22 23:53:46 -07:00
Evan Tschannen	2da46e3172	fix: halt if datacenters are different	2019-03-22 23:53:21 -07:00
Evan Tschannen	b68bc46042	Merge pull request #1348 from ajbeamon/fix-missing-metrics-when-ss-down Fix missing read workload metrics	2019-03-22 19:08:04 -07:00
Evan Tschannen	d34c56c9a5	ensure that the processId exists in id_worker before accessing it	2019-03-22 18:54:39 -07:00
Balachandar Namasivayam	ac8ad07b45	Address review comments.	2019-03-22 18:48:49 -07:00
Balachandar Namasivayam	4ed323ac52	Fixed bug and addressed review comments.	2019-03-22 18:48:49 -07:00
Balachandar Namasivayam	d75020b44a	Fix bug where accessing shared memory created by boost 1.52 leads to error when accessed by boost 1.67.	2019-03-22 18:48:49 -07:00
Evan Tschannen	36ab852bb1	Merge branch 'master' into ratekeeper # Conflicts: # fdbserver/ClusterController.actor.cpp	2019-03-22 18:41:00 -07:00
Evan Tschannen	6254a1a8e4	fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop	2019-03-22 18:37:39 -07:00
Evan Tschannen	7dd1c1b60c	fix: processClassFitness could be wrong if the client changed their class while rebooting	2019-03-22 18:37:39 -07:00
Evan Tschannen	ddb6058770	simplified ratekeeper monitoring loop	2019-03-22 18:22:45 -07:00
Jingyu Zhou	12917d8c7d	Add actors to store halt request futures Address best fitness in checking better DD or RK.	2019-03-22 18:06:38 -07:00
Jingyu Zhou	e8977aeb98	Remove clusterControllerDcId check This is no longer needed since it'll be set in the ctor.	2019-03-22 18:01:54 -07:00
Evan Tschannen	82bc447e29	startRatekeeper is responsible for updating serverDBInfo	2019-03-22 17:56:16 -07:00
Evan Tschannen	82c80c225d	make sure id_worker is updated before setting ratekeeper or data distribution	2019-03-22 17:08:54 -07:00
Evan Tschannen	6a9c9d79cc	Update fdbserver/ClusterController.actor.cpp	2019-03-22 17:00:58 -07:00
Evan Tschannen	70b1c88cdd	Update fdbserver/ClusterController.actor.cpp	2019-03-22 17:00:52 -07:00
Jingyu Zhou	16f54577ee	Restore master PID in cluster controller worker registration CC may think master failed and clear the master PID, which can block both data distributor and ratekeeper recruitment. Fix by restoring it during worker registration.	2019-03-22 14:53:05 -07:00
A.J. Beamon	fc48b6050e	When tabulating read workload metrics, ignore the absence of any particular storage server.	2019-03-22 14:22:22 -07:00
Evan Tschannen	78f7a2e40b	fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop	2019-03-22 14:13:58 -07:00
A.J. Beamon	4eb5715689	Add support for a client or worker having multiple issues.	2019-03-22 08:29:41 -07:00
Jingyu Zhou	da338c3ad6	Avoid unnecessary recruiting of DD or RK While waiting to recruit a data distributor or ratekeeper, a previous one could have already joined. So we can skip this unnecessary recruiting. Revert the change of worker.actor.cpp for ratekeeper. Instead, recruiting ratekeeper should avoid the process with an existing one. This fixes a bug where the ratekeeper interface became a zombie, killing other healthy ratekeepers but doing no useful work. Found by: -r simulation --crash -f tests/fast/WriteDuringRead.txt -s 31858110 -b on	2019-03-21 22:40:07 -07:00
Evan Tschannen	fe4464e786	fix: processClassFitness could be wrong if the client changed their class while rebooting	2019-03-21 17:56:04 -07:00
Jingyu Zhou	299961aecb	Move ratekeeper or data distributor from excluded servers	2019-03-21 17:17:33 -07:00
Evan Tschannen	3ced178348	maxVersionDifference is a copy of a knob which is a double	2019-03-21 12:58:48 -07:00
Jingyu Zhou	48324ad4be	Fix a race during ratekeeper registration When a ratekeeper registers, the monitorRatekeeper wakes up and recruits a new ratekeeper. Adding a 0s delay to avoid this. If a ratekeeper is recruited on an existing machine, update the interface so that the cluster controller can clear the ratekeeperID.	2019-03-21 12:56:56 -07:00
Evan Tschannen	e692f0f70f	fix: degraded is only used for tlog recruitment, so we should not use it in the fitness calculation for other roles	2019-03-21 11:23:49 -07:00
Jingyu Zhou	8edefda193	Fix test stuck due to invalid worker in cluster controller Test case: -r simulation --crash -f ./tests/rare/CloggedCycleWithKills.txt -s 688927581 -b off	2019-03-20 22:24:01 -07:00
Evan Tschannen	59abd8f3d8	fix: make sure recoveryLocation is always a valid page	2019-03-20 18:12:56 -07:00
Evan Tschannen	3730142fcc	fix: after a rollback, uncommitted changes to the byte sample could be missed	2019-03-20 18:10:26 -07:00
Jingyu Zhou	937b6dde31	Fix a race of DD, RK, Master failure If DD, RK, and Master all run on the same process and it fails, recruiting of a new DD or RK could try to use the old master worker interface, which is an invalid one and causes recruitment to be stuck. Fix by adding a delay and checking master is valid before recruitment.	2019-03-20 16:19:20 -07:00
Evan Tschannen	2ed1d58d16	fix: change the location where stopped is checked, because a yield could cause stopped to be set after the existing check	2019-03-20 14:28:32 -07:00
Jingyu Zhou	ce5c6d18d2	Fix ratekeeper recruitment bug	2019-03-20 14:22:22 -07:00
Jingyu Zhou	86b687981b	Fix ratekeeper and data distributor recruiting bug Avoid multiple concurrent recruiting of ratekeepers with a recruiting flag. Fix endless recruiting when the chosen worker is a proxy or a resolver -- prefer master in this case.	2019-03-20 10:00:31 -07:00
Evan Tschannen	5a00f567be	fix CheckSatelliteTagLocation	2019-03-20 09:30:11 -07:00
Jingyu Zhou	474abd81bd	Move placement monitoring inside doCheckOutstandingRequests	2019-03-19 22:48:21 -07:00
Evan Tschannen	2605257737	Merge branch 'master' of github.com:apple/foundationdb	2019-03-19 18:47:29 -07:00
Evan Tschannen	f9aad46573	made use_provisional_proxies a transaction option	2019-03-19 18:44:37 -07:00
Evan Tschannen	20764efa24	Merge pull request #1320 from bnamasivayam/dc-as-satellite-config Support config where the primary and remote DC's can be used as satel…	2019-03-19 15:49:24 -07:00
Balachandar Namasivayam	f9560e1abd	Addressed Review Comments	2019-03-19 15:23:14 -07:00
Jingyu Zhou	bc6fdaea3e	Recruit a new ratekeeper before halting the old	2019-03-19 15:21:46 -07:00
Evan Tschannen	5b9c45ea0b	clients do not attempt to connect to provisional proxies	2019-03-19 13:37:50 -07:00
Jingyu Zhou	0fb6a03c07	First round of review comment fixes for PR#1307	2019-03-19 11:29:19 -07:00
A.J. Beamon	2d7b48dadc	Merge pull request #1311 from etschannen/feature-increase-grv-batch Increased the GRV client batch size	2019-03-19 08:23:05 -07:00
A.J. Beamon	7f4adcc338	Merge pull request #1314 from etschannen/feature-ssd-memory-spill configure memory now selects the ssd engine for transaction log spilling	2019-03-19 08:22:22 -07:00
Vishesh Yadav	fea18e7be0	fix: fdbserver segfault when started with wrong arguments Public address is required for roles FDBD, NetworkTestServer and Restore only. Therefore, check those cases, and for others follow the earlier behaviour of using default ip address 0. FIXES #1305	2019-03-19 02:05:11 -07:00
Evan Tschannen	2554fed965	reduce max transaction to start	2019-03-18 16:16:03 -07:00
Evan Tschannen	87e2a1a029	The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update	2019-03-18 16:09:57 -07:00
Alex Miller	b11ecb3210	Remove random bits of code that were either unneeded or leftover from debugging.	2019-03-18 15:47:20 -07:00
Evan Tschannen	eb54a700ba	changed the old memory configuration to memory-1	2019-03-18 15:10:04 -07:00
Alex Miller	37ea71b117	Implement limiting how many bytes recovery will read. This time, track what location in the DiskQueue has been spilled in persistent state, and then feed it back into the disk queue before recovery. This also introduces an ASSERT that recovery only reads exactly the bytes that it needs to have in memory.	2019-03-18 15:09:43 -07:00
Alex Miller	29ab7370cd	Clear versionLocation when spilling, and pop DQ separately. Popping the disk queue now requires potentially recovering the location to which we can pop from the spilled data itself, and for each tag we must maintain the first location with relevant data. The previous queue we had to represent the ordering, queueOrder, was used by spilling, and popped when a TLog had been spilled. This means that as soon as a TLog has been fully spilled, we have no idea how it relates in order to other fully spilled TLogs. Instead, use queueOrder to keep track of all the TLog UIDs until they're removed, and use spillOrder to keep track of the order only for spilling.	2019-03-18 15:09:22 -07:00
Jingyu Zhou	8d609eb51d	Protect ratekeeper registration race during recruitment This is similar one to DataDistributor.	2019-03-18 13:53:50 -07:00
Balachandar Namasivayam	5471725db5	Support config where the primary and remote DC's can be used as satellites.	2019-03-18 12:17:59 -07:00
Jingyu Zhou	2b41a97a6e	Fix the issue of slow dying Data Distributor Test with: -r simulation -f ./foundationdb/tests/slow/CommitBug.txt -s 67828576 -b on The test has the following event sequence: - Time 113.3s, CC noticed DD failure, cleared DD interface. - 1s later, DD rejoined and registered with CC. - Time 131.7s, DD actor cancelled. This old DD raced to register with CC and the failure monitor is not installed because monitorDataDistributor is stalled waiting for new DD. - Time 161.4s, new DD running. New DD recruiting was delayed due to no servers in the period. Fix by disabling DD registration during the recruiting process.	2019-03-17 22:19:23 -07:00
Evan Tschannen	44e25e219c	do not suppress KeyValueStoreMemory_OutOfSpace in simulation	2019-03-17 00:35:48 -07:00
Evan Tschannen	ec6c843124	increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch	2019-03-16 16:18:58 -07:00
Stephen Atherton	f88e53e640	Merge branch 'master' of https://github.com/apple/foundationdb into fix-parent-directory	2019-03-16 00:13:09 -07:00
Stephen Atherton	2efb6f4c0d	Added cleanPath() which puts a path in a canonical form without .., ., or duplicate separators without using the filesystem or resolving symbolic links. absPath() redefined to use cleanPath() so it will return the same result for a path without symbolic links regardless of whether or not the path actually exists. Redefined parentDirectory() to use absPath() and error on certain inputs. Added comments describing behavior of these functions, and added a unit test which verbosely tests many inputs to them.	2019-03-15 23:54:33 -07:00
Jingyu Zhou	254c78053c	Fix a segfault error After wait, ServerDBInfo may have changed. Using the old copy is wrong.	2019-03-15 22:11:13 -07:00
Alex Miller	7f5bc2981f	Checksum DiskQueue pages on read, but at a lower priority. If a server has its data spilled, then it's behind the 5s window. Feeding it data is less important than committing, so we can hide the extra CPU usage from checksumming the read amplified disk queue pages.	2019-03-15 21:01:19 -07:00
Alex Miller	ee4721a63f	Make checking or ignoring checksums part of the IDiskQueue::read API.	2019-03-15 21:01:18 -07:00
Alex Miller	81c59e88a8	Persist the protocol version of a TLog instance when it is created. This allows us to do easy upgrades of SpilledData in the future, if the need arises, because we then have a protocol version to compare against.	2019-03-15 21:01:17 -07:00
Alex Miller	bf247eeed0	If TLogVersion >= 3, use crc32c for the DiskQueue hash for TLogs. We don't have a forward compatibility story for the memory storage engine, so its DiskQueue will still be hashlittle2 until one exists.	2019-03-15 21:01:16 -07:00
Alex Miller	686b097397	Remove verification code from DiskQueue and TLogServer.	2019-03-15 21:01:15 -07:00
Alex Miller	bdd7d5d3df	Initialize firstPages with 0xFF. There's various ASSERT()'s that assume firstPages is empty, and enforces things about `seq`. Some of these asserts have spuriously passed, since uninitialized pages look like they have a `seq` of 0, which would be the beginning of the disk queue. Now they'll look like the end of the disk queue, which is far easier to fail on.	2019-03-15 21:01:14 -07:00
Alex Miller	77f596743f	Bump persistFormat in TLogServer to differ from OldTLogServer* Though this format is being deprecated in favor of an eventual plumbing through of TLogVersion, we should probably bump it anyway. And also remove the fallback to OldTLogServer code. It should never be executed, as OldTLogServer_6_0 is entirely relied upon to execute OldTLogServer_4_6.	2019-03-15 21:01:13 -07:00
Alex Miller	4f98634f59	Add LogId to all TLog TraceEvents that have it.	2019-03-15 21:01:12 -07:00
Jingyu Zhou	12ddd56698	Fix Ratekeeper and DataDistributor placement Make sure both RateKeeper and DataDistributor are placed in the same data center as the Master. Make sure only one RateKeeper is live in the cluster as well.	2019-03-15 17:09:28 -07:00
Jingyu Zhou	bb5686eb75	Fix monitoring of DD and RK	2019-03-15 16:02:17 -07:00
Jingyu Zhou	9f6fe5f649	Merge remote-tracking branch 'apple/master' into ratekeeper	2019-03-15 11:30:04 -07:00
Jingyu Zhou	40860e0093	Attempt to fix.	2019-03-15 11:29:04 -07:00
A.J. Beamon	85b3f11e71	Fix various compiler warnings	2019-03-15 10:34:57 -07:00
Stephen Atherton	126252a274	Changed checksum to crc32. Disabled pager housekeeping for now. Added more btree read/write/commit metrics. Changed readPage to use disk read priority. Bug fix in CommitSubtree causing it to recurse to children unnecessarily. Added point read speed test at the end of set performance unit test.	2019-03-15 00:46:09 -07:00
Jingyu Zhou	9e59c9c253	Check DataDistributor and RateKeeper fitness Fail the test if they are not put in the best fitness.	2019-03-14 16:14:57 -07:00
Jingyu Zhou	99d521ef4f	Monitor Ratekeeper and DataDistributor to use stateless processes Since Ratekeeper and DataDistributor are no longer running with Master, they might be running with stateful processes before a new Master becomes alive, which is undesirable. This PR adds a monitoring of both Ratekeeper and DataDistributor at Cluster Controller -- if Master runs on a stateless class and RK/DD runs at a worse class, then RK/DD will be killed. I.e., RK/DD should be running at their own classes or on the same stateless process as Master. After restart, RK/DD should be running at a better process class.	2019-03-14 15:00:57 -07:00
Balachandar Namasivayam	2ac07fe7e0	Merge pull request #1248 from satherton/feature-backup-json JSON output options for fdbbackup status and describe	2019-03-14 13:41:28 -07:00
Meng Xu	5a10bf5dfc	Merge branch 'master' into mengxu/tls-switch-status-PR	2019-03-14 10:35:12 -07:00
Meng Xu	e30e2af1f3	ClientKnobs: Add CHECK_CONNECTED_COORDINATOR_NUM_DELAY	2019-03-13 16:54:56 -07:00
Evan Tschannen	e7d1f9e5f1	fixed review comments	2019-03-13 15:59:03 -07:00
Evan Tschannen	7f48025348	optimize confirm epoch alive	2019-03-13 14:47:17 -07:00
Steve Atherton	dbacfcbc82	Merge branch 'master' into feature-backup-json	2019-03-13 13:30:45 -07:00
Evan Tschannen	a2108047aa	removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects.	2019-03-13 13:14:39 -07:00
Evan Tschannen	e068c478b5	merge master	2019-03-12 18:31:25 -07:00
Steve Atherton	8aab719c22	Merge branch 'master' into feature-backup-json	2019-03-12 18:23:16 -07:00
Evan Tschannen	a7e45cff91	Merge pull request #1176 from jzhou77/ratekeeper Make Ratekeeper a separate role	2019-03-12 15:58:59 -07:00
Meng Xu	85c24b0067	Merge branch 'master' into mengxu/tls-switch-status-PR	2019-03-12 15:20:54 -07:00
Evan Tschannen	5392742902	fixed review comments	2019-03-12 14:38:54 -07:00
Evan Tschannen	c5a18945b6	Merge pull request #1260 from vishesh/task/tls-upgrade Allows cluster string to contain coordinators with different TLS states	2019-03-12 13:45:08 -07:00
A.J. Beamon	a25e224cda	Merge pull request #1213 from etschannen/feature-metadata-version Added a metadata version key	2019-03-12 13:36:33 -07:00
Jingyu Zhou	2b0139670e	Fix review comment for PR 1176	2019-03-12 12:02:30 -07:00
Stephen Atherton	f0eae0295f	Merge branch 'master' of https://github.com/apple/foundationdb into feature-backup-json	2019-03-12 03:35:03 -07:00
Stephen Atherton	e9b8bf601e	Added backup status JSON output to backup workload to get sim coverage.	2019-03-12 03:34:38 -07:00
Balachandar Namasivayam	880e8643d1	Fix Windows link errors	2019-03-11 17:49:03 -07:00
Meng Xu	46f4b02807	TLS Status: Resolve review comments Use connectedCoordinatorsNumDelayed to reduce the load on cluster controller; Set connectedCoordinatorsNum to null by default for monitorLeader()	2019-03-11 17:10:08 -07:00
Evan Tschannen	5873705228	tlog commits very rarely take an additional 6 seconds	2019-03-11 12:11:17 -07:00
Meng Xu	435e515985	Merge branch 'master' into mengxu/tls-switch-status-PR	2019-03-11 11:17:40 -07:00
Evan Tschannen	80c3f2f8e2	added status fields detailing which processes are degraded, and also the total number of degraded processes	2019-03-10 22:58:15 -07:00
Evan Tschannen	c6e94293bf	reset a process to not be degraded after 2 days	2019-03-10 22:39:21 -07:00
Evan Tschannen	2627bcd35e	Merge branch 'master' into feature-metadata-version	2019-03-10 21:13:28 -07:00
Evan Tschannen	1be9ae5ce3	fixed merge conflict	2019-03-08 22:51:06 -05:00
Evan Tschannen	044b6b4f8a	Merge branch 'master' into feature-degraded-tlog # Conflicts: # fdbserver/ClusterController.actor.cpp	2019-03-08 22:50:41 -05:00
mpilman	ebffe8c633	print correct pahes in alloc instrumentation	2019-03-08 15:03:17 -08:00
Evan Tschannen	45fe6b369b	tlog recruitment will prefer non-degraded processes; however, it will not choose fewer than the desired number of tlogs to avoid degraded processes; better master exists will switch the master to avoid degraded processes	2019-03-08 14:40:00 -05:00
Evan Tschannen	53f16b5347	when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded	2019-03-08 11:46:34 -05:00
Evan Tschannen	710a64dc4e	replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails	2019-03-08 11:25:07 -05:00
mpilman	2537f26de6	First implementation of more user-friendly cpack Up to this point, this code has been only very rudimentarily tested. This is a first attempt at making cpack more user-friendly. The basic idea is to generate a component per package type so that we can have different paths depending on whether we build an RPM, a DEB, a TGZ, or a MacOS installer. The cpack package config file will then choose the correct components to use. At a later point this should make it possible to build these with `make packages`, and the ugly iteration of calling cmake between each package would be obsolete. While this solution is a bit more bloated, it is also much more flexible and it will be much easier to use. Another benefit is that this will get rid of all warnings during a cpack run	2019-03-07 16:49:29 -08:00
Jingyu Zhou	cdfe906c30	Data distributor pulls batch limited info from proxy Add a flag in HealthMetrics to indicate that batch priority is rate limited. Data distributor pulls this flag from proxy to know roughly when rate limiting happens. DD uses this information to determine when to do the rebalance in the background, i.e., moving data from heavily loaded servers to lighter ones. If the cluster is currently rate limited for batch commits, then the rebalance will use longer time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and BgDDValleyFiller() in DataDistributionQueue.actor.cpp.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	f43277e819	Format Ratekeeper.actor.cpp code	2019-03-07 13:16:20 -08:00
Jingyu Zhou	dc129207a9	Minor fix after rebase.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	835cc278c3	Fix rebase conflicts.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	7340998261	Fix status message for ratekeeper	2019-03-07 13:16:20 -08:00
Jingyu Zhou	517966fce2	Remove lastLimited from rate keeper Refactor code to make IDE happy.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	d52ff738c0	Fix merge conflicts during rebase.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	b2ee41ba33	Remove lastLimited from data distribution Fix a serialization bug in ServerDBInfo, which causes test failures.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	36a51a7b57	Fix a segfault bug due to uncopied ratekeeper interface	2019-03-07 13:16:20 -08:00
Jingyu Zhou	e6ac3f7fe8	Minor fix on ratekeeper work registration.	2019-03-07 13:16:20 -08:00
Jingyu Zhou	3c86643822	Separate Ratekeeper from data distribution. Add a new role for ratekeeper. Remove StorageServerChanges from data distribution. Ratekeeper monitors storage servers, which borrows the idea from DataDistribution.	2019-03-07 13:16:20 -08:00
Andrew Noyes	27d199409e	Add KillRegion.actor.cpp workload to cmake	2019-03-07 12:14:42 -08:00
Balachandar Namasivayam	9e4c780baa	Merge pull request #1249 from xumengpanda/mengxu/status/teamcollection-info Status:healthy: Add optimizing_team_collections	2019-03-07 11:44:24 -08:00
Balachandar Namasivayam	f3391ea413	Merge pull request #1240 from satherton/feature-restore-by-timestamp Restore by timestamp	2019-03-06 16:21:06 -08:00
Vishesh Yadav	ed49d603a0	Allows cluster string to contain coordinators with different TLS states During live TLS upgrades, we can hence switch one coordinator at a time to TLS rather than all of them together.	2019-03-06 16:05:10 -08:00
Meng Xu	845f8fdcbc	Status:healthy: Add optimizing_team_collections Change removing_redundant_teams status name to optimizing_team_collections. The new name is more general and can be applied in the future when we switch storage engines.	2019-03-06 15:05:23 -08:00
Meng Xu	04880e3d4d	Merge branch 'master' into mengxu/tls-switch-status-PR	2019-03-06 13:41:16 -08:00
Stephen Atherton	7778112f6a	Bug fix, restore was using the destination cluster to look up timestamps when printing the backup description instead of (optionally) the original cluster which generated the backup. Made missing cluster file errors more clear.	2019-03-06 02:45:55 -08:00
Alex Miller	c6a65389ae	Remove noexcept macro and replace with BOOST_NOEXCEPT. BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a way that is correctly maintained over time.	2019-03-05 22:06:12 -08:00
Alex Miller	af617d68e6	boost 1.52.0 -> 1.67.0 in all vcxproj files	2019-03-05 22:06:12 -08:00
Meng Xu	820548223a	Status: connected_coordinators misc minor changes Change the rst document file; Change the coding style to be consistent with the nearby code; Ensure we always initialize the connectedCoordinatesNum to 0 even when the variable is not used.	2019-03-05 21:45:18 -08:00
Meng Xu	b7a52e81e2	Status: Count connected coordinators per client A client will always try to connect to all coordinators. This commit lets Status track the number of connected coordinators for each client. This allows us to do canary in coordinators. For example, when we switch from non-TLS to TLS, we can switch 1 coordinator from non-TLS to TLS. This can help check if a client has the ability to connect through TLS. We can make the non-TLS to TLS switch for each coordinator one by one. This avoids the risk of losing connection in the switch.	2019-03-05 21:21:23 -08:00
Alex Miller	ad0aca21b5	Update fdbserver/fdbserver.vcxproj Co-Authored-By: atn34 <anoyes34@gmail.com>	2019-03-05 18:03:57 -08:00
anoyes	981426bac9	More ide fixes	2019-03-05 18:03:57 -08:00
Evan Tschannen	82d957e0bb	Merge pull request #1178 from vishesh/task/issue-963-IPv6 IPv6 Support	2019-03-05 17:14:16 -08:00
Vishesh Yadav	a9562f61be	fix: missing argument to printf in fdbserver	2019-03-05 14:03:09 -08:00
Steve Atherton	21f55e1878	Merge pull request #1190 from bnamasivayam/restore-multiple-ranges Add support for restoring multiple ranges.	2019-03-05 10:15:55 -08:00
Evan Tschannen	f1897f3eb6	Merge branch 'master' into feature-metadata-version # Conflicts: # fdbclient/NativeAPI.actor.cpp	2019-03-04 21:06:16 -08:00
Evan Tschannen	69d7633d5b	Merge pull request #1217 from alexmiller-apple/tstlog-goodref Spill-By-Reference TLog Part 4: Actually Usable Reference Spilling	2019-03-04 20:58:24 -08:00
Evan Tschannen	3d196c9e97	fix: metadataVersionKey log range removal needs to be checked for each logDestination	2019-03-04 20:56:31 -08:00
Evan Tschannen	988add9fb5	cache the metadataVersion for commits, so that doing setVersion with the commit version of a different transaction will	2019-03-04 16:48:34 -08:00
Meng Xu	c0535c49bb	Status: TLS client status Use ClientStatusInfo structure for each network address (client), instead of passing each status info as a parameter.	2019-03-04 16:35:10 -08:00
Trevor Clinkenbeard	89cbb77b4e	Merge branch 'master' of https://github.com/apple/foundationdb into lazily-fetch-health-metrics	2019-03-04 14:17:58 -08:00
Trevor Clinkenbeard	56ae46f89e	Client lazily fetches health metrics from proxies	2019-03-04 14:16:39 -08:00
Vishesh Yadav	e93cd0ff21	Add some checks and comments to IPv6 changes #963	2019-03-04 14:12:45 -08:00
Vishesh Yadav	592e224155	net: add/use formatIpPort to format IP:PORT pairs #963	2019-03-04 14:12:45 -08:00
Vishesh Yadav	cc9ad0e202	net: Use IPv6 in simulation testing #963 25% of the time we will use IPv6 addresses	2019-03-04 14:12:45 -08:00
Vishesh Yadav	57832e625d	net: Support IPv6 #963 - NetworkAddress now contains IPAddress object which can be either IPv4 or IPv6 address. 128bits are used even for IPv4 addresses, however only 32bits are used when using/serializing IPv4 address. - ConnectPacket is updated to store IPv6 address. Backward compatible with old format since the first 32bits of IP address field is used for serialization of IPv4. - Mainly updates rest of the code to use IPAddress structure instead of plain uint32_t. - IPv6 address/port pairs should be represented as `[ip]:port` as per convention. This applies to both cluster files and command line arguments.	2019-03-04 14:12:41 -08:00
Alex Miller	baa3e1af2c	Replace `/sizeof(Page)*sizeof(Page)` with `pageFloor()`.	2019-03-04 01:42:39 -08:00
Alex Miller	ee64b43366	Change DQ shrink logic to consider "active" bytes rather than file size. We know what the current ideal size of the DQ file should be, so we should use it.	2019-03-04 01:42:39 -08:00
Alex Miller	244903a9de	Spill txsTag by value under TagMsg/ and not TagMsgRef/ There's not a tremendous reason as to why this matters now, but I feel like I might regret sometime later not keeping the same schema under the same key.	2019-03-04 01:42:39 -08:00
Alex Miller	72c2cf11ab	Replace ResourceLimiter with FlowLock.	2019-03-04 01:42:38 -08:00
Alex Miller	94bf75cb00	Allow the disk queue to shrink if it has unneeded slack space.	2019-03-04 01:42:38 -08:00
Alex Miller	52d5a721a6	Don't allocate 2x the memory for a read to save 1% of allocated memory.	2019-03-04 01:42:38 -08:00
Alex Miller	aff9ebe21a	Spill (start,length) instead of (begin,end) to save a few bytes.	2019-03-04 01:42:38 -08:00
Alex Miller	2aa527c0ef	Fix a bug resulting from concurrent TLog changes. TLogServer was forked into OldTLogServer_6_0 at the same time that `3247d594` modified TLogServer, so the modification never made it into OldTLogServer_6_0, resulting in a rare failure. Manual code inspection revealed that there was also `78976161` that concurrently modified TLogServer, so that change was copied to OldTLogServer_6_0 as well.	2019-03-04 01:42:38 -08:00
Alex Miller	fb4cb8c3a8	Print out configuration changes in ConfigureTest.	2019-03-04 01:42:38 -08:00
Alex Miller	9ef283d4e7	Implement hard limiting of memory used to serve peek requests.	2019-03-04 01:42:38 -08:00
Alex Miller	e3506ad9af	Add a yield to parseMessagesForTag	2019-03-04 01:42:38 -08:00
Alex Miller	742f6e1847	Solve overreading via pre-calculating tag bytes per commit	2019-03-04 01:42:38 -08:00
Alex Miller	e7d8520c63	Batch more when spilling data.	2019-03-04 01:42:38 -08:00
Alex Miller	71a794ccc3	Re-enable spill-by-reference testing.	2019-03-04 01:42:38 -08:00
Alex Miller	04e1170c88	Spill txsTag by value	2019-03-04 01:42:38 -08:00
Alex Miller	4d4e0a1d54	Fix the build on -O0. C++ < 17 requires definitions of declared static constexpr variables.	2019-03-04 01:42:38 -08:00
Alex Miller	db546af4a3	Fix the build on -O0. C++ < 17 requires definitions of declared static constexpr variables.	2019-03-04 01:38:58 -08:00
Evan Tschannen	075fdef31a	Merge branch 'master' into feature-metadata-version # Conflicts: # fdbclient/DatabaseContext.h	2019-03-03 22:58:45 -08:00
Evan Tschannen	057ebe56e4	fix: unknownCommit handling relied on soleOwnership of the version stamp keys, so we need to use a second key to track the commit version for the metadataVersionKey; renamed a confusing option	2019-03-03 21:31:40 -08:00

2034 Commits