Commit Graph

2034 Commits

Author SHA1 Message Date
Evan Tschannen cacd82758e Reduced data distribution speeds 2019-04-26 13:54:49 -07:00
Evan Tschannen 9ff8aca1da Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation) 2019-04-26 13:53:56 -07:00
Evan Tschannen 1f37f82b87 invalid knob overrides do not prevent fdbserver from starting 2019-04-25 17:08:13 -07:00
Evan Tschannen 6c77864731 separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests 2019-04-25 17:07:35 -07:00
Alex Miller 797d431934 Add an \xff keyrange that is backed by the txnStateStore. 2019-04-25 17:04:20 -07:00
Trevor Clinkenbeard d339becd7c Fix currentRate calculation for local ratekeeper 2019-04-25 15:35:34 -07:00
Jingyu Zhou 5462f560e7 Add pseudo locality for log routers and tlogs
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
  popping.

Later when we add more psuedo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
A.J. Beamon 253d2400ef Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-04-23 14:38:52 -07:00
A.J. Beamon ea7abff9df Clean up from review 2019-04-23 14:16:52 -07:00
A.J. Beamon 4ad0496b39 Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process. 2019-04-23 14:01:51 -07:00
Stephen Atherton df0548503d Merge branch 'release-6.1' of https://github.com/apple/foundationdb into sqlite-grow-bigger 2019-04-23 13:43:58 -07:00
A.J. Beamon e0f76edf77
Merge pull request #1471 from AlvinMooreSr/release-6.1-merge
Merge Release 6.1 Into Master
2019-04-23 11:08:21 -07:00
Stephen Atherton 83db547306 Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES. 2019-04-23 04:50:58 -07:00
Evan Tschannen e0f7ec96aa Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers 2019-04-22 17:29:46 -07:00
Jingyu Zhou 439d5a3843 Use emplace_back instead of push_back in Proxy 2019-04-22 14:03:48 -07:00
Jingyu Zhou d2b215b926 Refactor tag population of ServerCacheInfo 2019-04-22 11:55:04 -07:00
Jingyu Zhou 7cb61c766b Fix tLogLocalities for current LogSet
In toCoreState(), the serialization of current LogSet is different from old
TLog sets. The locality data should be generated, not copied over.

Found by:
-r simulation --crash -f tests/fast/KillRegionCycle.txt -s 254666356 -b on
2019-04-21 10:41:07 -07:00
Jingyu Zhou 8b67da57bb Fix upgrade test failure
Serialize pseudoLocalities if protocol version is larger than 0x0FDB00B061060001LL.
Note this version may need to be changed to "currentProtocolVersion" when merging
into the master, and "currentProtocolVersion" should be incremented.
2019-04-21 10:41:07 -07:00
Jingyu Zhou 9e8ffd2ff7 Refactor OldLogData ctor 2019-04-21 10:41:07 -07:00
Jingyu Zhou 97986a28b7 Replace push_back with emplace_back for efficiency
And better code readability.
2019-04-21 10:41:07 -07:00
Jingyu Zhou 010f825aff Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet 2019-04-21 10:41:07 -07:00
Jingyu Zhou 7befce6bf1 More pseudoLocalities and refactors. 2019-04-21 10:41:07 -07:00
Jingyu Zhou 66000a07a5 Use emplace_back instead of push_back 2019-04-21 10:41:07 -07:00
Jingyu Zhou 966ec30fcc Add pseudoLocalities for special tag consumers 2019-04-21 10:41:07 -07:00
Jingyu Zhou b4e7e7a85b Refactor StorageCache updates 2019-04-21 10:41:07 -07:00
Jingyu Zhou 82ec80c42f Refactor TLogSet ctor 2019-04-21 10:41:07 -07:00
Jingyu Zhou d19b0cf1c1 Refactor LogSet with two new constructors 2019-04-21 10:41:07 -07:00
Jingyu Zhou 0b1984978a Small code refactoring. 2019-04-21 10:41:07 -07:00
Jingyu Zhou ec1bc5cfca Add LogSystemType enum 2019-04-21 10:41:07 -07:00
Jingyu Zhou 6870e132b2
Merge branch 'master' into pprof 2019-04-19 14:06:44 -07:00
Andrew Noyes d1e86779a6 Address review comments 2019-04-18 08:48:27 -07:00
Andrew Noyes 5af8208c62 Fix JavaWorkload unused variable 2019-04-17 16:29:22 -07:00
Andrew Noyes ef04471a66 Fix more unused-variable warnings 2019-04-17 16:04:10 -07:00
Alvin Moore 2bea99591e Merge branch 'release-6.1' of copy of master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-04-17 15:51:48 -07:00
Andrew Noyes 13ba915a19 Fix more unused variable warnings 2019-04-17 15:38:08 -07:00
A.J. Beamon 43533b3d72 Don't validate the shard size estimate unless enough keys are sampled with a less than 100% probability. 2019-04-17 11:01:23 -07:00
Trevor Clinkenbeard 3426205167 Fixed readGuard usage bug 2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard 1d921da170 readGuard sends server_overloaded error if request is rejected 2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard 8a7d9afbe9 Fixed name of LocalRatekeeperWorkloadFactory 2019-04-16 10:36:09 -07:00
Trevor Clinkenbeard 0594154644 Fixed getPenalty calculation 2019-04-16 10:17:41 -07:00
Andrew Noyes baa3e806ef Address review comments from #1446 2019-04-16 09:48:15 -07:00
Andrew Noyes 6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
Evan Tschannen cd5c9d91fa
Merge pull request #1443 from etschannen/master
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
Balachandar Namasivayam 04e9aa6afd For small clusters that are growing quickly, it could happen that the rateLimit is set to a low value and it would take very long to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time. 2019-04-10 17:13:37 -07:00
Jingyu Zhou ab834c4f7e Move profiling option help message to devhelp 2019-04-09 13:26:12 -07:00
Evan Tschannen 6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
Remove unused functions
2019-04-09 11:49:45 -07:00
A.J. Beamon 058d028099
Merge pull request #1301 from mpilman/features/cheaper-traces
Defer formatting in traces to make them cheaper
2019-04-09 10:11:04 -07:00
Evan Tschannen 21c0ba555c Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-08 18:38:42 -07:00
Evan Tschannen d126730b4d fixed a spurious test error where process_behind was treated as an error 2019-04-08 17:09:54 -07:00
A.J. Beamon 538b431656 Apply suggestions from code review 2019-04-08 14:55:58 -07:00
A.J. Beamon a7288e1325 Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate. 2019-04-08 14:21:24 -07:00
mpilman 789cd67bcd Don't compile JavaWorkload by default 2019-04-08 13:06:29 -07:00
mpilman c45fe8c697 Fixed typo 2019-04-08 11:33:45 -07:00
Trevor Clinkenbeard b286102d34 Update fdbserver/workloads/LocalRatekeeper.actor.cpp
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-08 11:06:17 -07:00
mpilman d2e74cb2c0 Fix stupid rounding error 2019-04-08 11:05:29 -07:00
mpilman aaa8f73bdc fixed missing refactoring code 2019-04-08 11:05:29 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman b944e0b116 generalized read guards, allow for penalty+error 2019-04-08 11:04:44 -07:00
mpilman 207049e852 fixed serialization 2019-04-08 11:04:44 -07:00
mpilman 32393ec4c9 Prototype of local ratekeeper 2019-04-08 11:04:44 -07:00
Evan Tschannen 05869a8383 do not log a degraded reset message if the previous reset was more than a week ago 2019-04-07 23:00:58 -07:00
Jingyu Zhou 4b08042a88 Change memory profiling threshold to a flag 2019-04-05 16:33:51 -07:00
Andrew Noyes d7612a4426 Fix OPEN_FOR_IDE build errors 2019-04-05 16:30:42 -07:00
Jingyu Zhou 09b2c35d11 Dump heap profiler when memory usage is high
Set the threshold of dump to 2GB.
2019-04-05 16:12:23 -07:00
mpilman d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman 4287b1d2a1 resolved minor merge issues 2019-04-05 13:12:19 -07:00
A.J. Beamon 614a599a04 Update fdbserver/SimulatedCluster.actor.cpp
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-05 13:12:19 -07:00
mpilman 39ecbedd74 Fixed compilation errors on OS X & gcc8 2019-04-05 13:12:19 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
mpilman ea67b742c7 Implemented Traceable for printable types 2019-04-05 13:12:19 -07:00
mpilman bb82f8560a process all volatile ints correctly in traces 2019-04-05 13:12:19 -07:00
mpilman 02e3b634fb Compile sqlite with NDEBUG so we can debug 2019-04-05 13:12:19 -07:00
mpilman c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats string. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that `detail` expected a `std::string` as key, which mean that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce a if-branch which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
Jingyu Zhou acf60c5e9a
Merge pull request #1414 from jzhou77/pprof
Add manually triggered heap profiling
2019-04-04 22:27:33 -07:00
Jingyu Zhou 5be592632b Change trace event message
If heap profiler is not running, we can't take a snapshot of the profile.
2019-04-04 15:29:50 -07:00
Jingyu Zhou f538df5e6c Add TraceEvent if unable to invoke heap profiler 2019-04-04 15:26:41 -07:00
Evan Tschannen 390ab9cfed A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy 2019-04-04 14:11:12 -07:00
Alex Miller 8f49be480b
Update fdbserver/worker.actor.cpp
Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>
2019-04-04 13:32:10 -07:00
Jingyu Zhou eaaf58ee34 Refactor profiler into cpu and heap profilers 2019-04-03 20:54:30 -07:00
Jingyu Zhou 3371cf22d4 Add manually triggered heap profiling
At client side:
fdb> profile
ERROR: Usage: profile <client|list|flow|heap>
fdb> profile heap 127.0.0.1:4500

On the server side:
$ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500
Starting tracking the heap
FDBD joined cluster.
Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use)
Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)
2019-04-03 16:00:54 -07:00
Markus Pilman 101a05ae77
Merge branch 'master' into features/client-simulator 2019-04-03 10:03:56 -08:00
Jingyu Zhou fc59587b3c
Merge pull request #1393 from jzhou77/pprof
Gperftools Profiling fix.
2019-04-03 10:35:31 -07:00
Evan Tschannen 39c595223b Merge branch 'release-6.1' 2019-04-02 22:30:02 -07:00
Evan Tschannen 30133a30e0
Merge pull request #1403 from etschannen/release-6.1
Ported a bug fix to the 6.0 log system, and updated documentation
2019-04-02 17:56:18 -07:00
Jingyu Zhou 56a1128a9b Enhance cmake's gperftools support
Add compiler flags and link flags for gperftools.
2019-04-02 17:34:29 -07:00
Evan Tschannen 31ed73d9f5 Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0 2019-04-02 15:27:37 -07:00
Evan Tschannen 1d4a6ab551 cleaned up status to keep the healthyZone read separated from relicaFutures 2019-04-02 14:46:56 -07:00
Evan Tschannen a38c396283 made all maintenance transactions lock aware 2019-04-02 14:27:48 -07:00
Evan Tschannen 628fec8c8b updated status with information about ongoing maintenance
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
mpilman 371a41dbba Allow classPath to be modified at runtime 2019-04-02 11:56:40 -07:00
mpilman e19901186f Fixed buggy register preparation for natives 2019-04-02 11:56:03 -07:00
Evan Tschannen 72203ba47a Merge commit '56f3f0b1bc60604f965152d856ae29a591227703' 2019-04-01 18:45:38 -07:00
Evan Tschannen 781cf9b5a0 added the ability to make a zoneId for maintenance in fdbcli 2019-04-01 17:55:13 -07:00
Evan Tschannen f5de52de91 fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time 2019-04-01 16:38:25 -07:00
Jingyu Zhou 49fdc35e5e Gperftools Profiling fix.
Fix a bug and update gperftools compiling flags

The added flags are recommended by gperftools here:
https://github.com/gperftools/gperftools

Verified that heap profiles are saved with the following command:
HEAPPROFILE=/tmp/fdbserver fdbserver [args...]
2019-04-01 14:42:18 -07:00
mpilman b148981bba Fixed compilation issues with char* 2019-04-01 14:29:45 -07:00
Jingyu Zhou 47b4b82628
Merge branch 'master' into fix-unreferenced 2019-04-01 14:07:19 -07:00
Jingyu Zhou 3f76be8f45 Merge remote-tracking branch 'apple/master' into fix-unreferenced 2019-04-01 14:00:43 -07:00
Jingyu Zhou f7f8ddd894 Fix warnings on unused variables
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
mpilman e23e63c6ac Implemented JavaWorkload
This change allows a user to write a workload in Java.

The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.

If the workload is executed within the simulator, the resulting
test will not be deterministic anymore as it will execute in a
different thread (and even without that it is not clear, whether
we could get determinism as the JVM does a lot of stuff that are
not deterministic).

This is intendet to get better testing of the Java client and
layer authors can use the simulator to test their layers on a single
machine but they can still simulate failing machines etc.
2019-03-31 17:57:43 -07:00
Evan Tschannen a46620fbee Merge branch 'release-6.1' 2019-03-30 17:59:28 -07:00
Evan Tschannen 8ebf771392 cleanup cluster controller trace events 2019-03-30 14:17:18 -07:00
Alex Miller e7ad39246c
Fix typo 2019-03-29 20:16:26 -07:00
Evan Tschannen a44ffd851e fix: the shared tlog could fail to update a stopped tlog’s queueCommitVersion to version if a second tlog registered before it could issue the first commit for the tlog 2019-03-29 20:11:30 -07:00
Evan Tschannen d882c060bf Merge commit '5dd6396eed0de0dfea6cf9eecc307995eff5cedc' 2019-03-28 18:00:55 -07:00
Balachandar Namasivayam 0bbdc15f71 Multi-test processes waits until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect the restarts and fail quickly instead of waiting for a timeout which is usually large. 2019-03-28 17:05:30 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 836bb95a7a
Merge pull request #1372 from etschannen/master
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
Evan Tschannen 34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen e5a80f2c94 optimized IPaddress 2019-03-27 18:21:13 -07:00
Jingyu Zhou a55f06e082 Remove unused functions
Found with -Wunused-function flag.
2019-03-27 15:45:28 -07:00
Stephen Atherton 64554e90d4 Change this to THIS in actors for IDE compatibility. 2019-03-27 13:42:49 -07:00
Stephen Atherton d5c8b6b083 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
#	flow/flow.h
2019-03-27 13:37:15 -07:00
A.J. Beamon 91014d4529 Add file changes that I accidentally failed to commit; fix naming issue in worker. 2019-03-27 08:41:19 -07:00
A.J. Beamon 71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
A.J. Beamon d508658569 Make ratekeeper one word to match our existing convention 2019-03-27 08:15:19 -07:00
Jingyu Zhou 38c6681349 Fix some signed and unsigned mismatch warnings. 2019-03-26 14:54:11 -07:00
Jingyu Zhou c0b58080ee Fix type name warning for DDTeamCollection
Seen using 'class' now seen using 'struct' in DataDistribution.actor.cpp
2019-03-26 14:18:25 -07:00
Jingyu Zhou 7c02ee6fdd Fix compiler warning about unreferenced exception variable 2019-03-26 13:43:47 -07:00
Jingyu Zhou 466a59a99d Merge remote-tracking branch 'apple/release-6.1' into ratekeeper 2019-03-25 15:27:38 -07:00
Jingyu Zhou f57a22e2ed Add data distributor and ratekeeper to status output 2019-03-25 15:11:29 -07:00
Trevor Clinkenbeard 007abbc45b Added 96-byte FastAllocator
Since storage queue nodes account for a large portion of memory usage,
we can save space by only allocating 96 bytes instead of 128 bytes for
each node.
2019-03-25 13:44:39 -07:00
Evan Tschannen 5e03e178de
Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues
Add support for a client or worker having multiple issues.
2019-03-24 17:27:50 -07:00
Evan Tschannen d45159ebf7
Merge pull request #1307 from jzhou77/ratekeeper
Monitor placement of Ratekeeper and DataDistributor
2019-03-24 17:26:07 -07:00
Evan Tschannen d6ad027d37 ratekeeper needs to be recruited for proxies to make progress, so if one has not registered with the cluster controller by the time we are accepting commits, recruit a new one 2019-03-24 16:48:24 -07:00
Evan Tschannen f426d732ea fix: forgot to remove one location where id_used was incremented for distributor and ratekeeper 2019-03-24 16:04:59 -07:00
Evan Tschannen e8948726e8 once we recruit a ratekeeper, do not allow any other ratekeepers to register 2019-03-24 11:04:39 -07:00
Evan Tschannen 24c92a1870
Merge pull request #1352 from etschannen/feature-network-address-list
Changed NetworkAddressList to at most two addresses for performance
2019-03-24 10:22:38 -07:00
Evan Tschannen 50a4403661 fix: missing parathesis 2019-03-23 21:52:15 -07:00
Jingyu Zhou 40eec20252 Restore master PID in worker registration
This fix is lost during merge.
2019-03-23 21:02:11 -07:00
Jingyu Zhou 3ef26e6be3 Fix fitness assignment statements
Found by MacOS build.
2019-03-23 19:16:04 -07:00
Evan Tschannen 1fc6937802 changed NetworkAddressList to at most two addresses for performance 2019-03-23 17:54:46 -07:00
Evan Tschannen b51a24453e the data distributor and ratekeeper are not included in id_used, but when comparing equally good options we prefer to avoid sharing with those roles
excluded data distributor and ratekeeper were improperly killed when the best option was also excluded
2019-03-23 13:25:36 -07:00
Jingyu Zhou 10988f89d9 Code refactoring for ConsistencyCheck.actor.cpp 2019-03-23 11:06:43 -07:00
Jingyu Zhou fdc5b5ddbf Fix: spurious ratekeeper registration
A rare race condition:
-r simulation -f ./foundationdb/tests/slow/WriteDuringReadAtomicRestore.txt -s 114256311 -b on

- A is the ratekeeper.
- CC recruit B and B starts
- CC halts ratekeeper A and A is halted
- A registers back with CC, which then halts B. CC sets A to be the ratekeeper.

CC starts recruiting and finds A is the best machine. But skips recruiting
because CC thinks A is already used. Now the cluster is left with no ratekeeper.

Fix by disallowing ratekeeper registration with previous ID.
2019-03-23 11:03:51 -07:00
Jingyu Zhou 6523cd4931 Fix: recruit ratekeeper is not triggerred 2019-03-23 09:20:54 -07:00
Steve Atherton 09f37cf3d2
Merge pull request #533 from ajbeamon/fix-parent-directory
Fixes to parentDirectory() and abspath()
2019-03-22 23:53:46 -07:00
Evan Tschannen 2da46e3172 fix: halt if datacenters are different 2019-03-22 23:53:21 -07:00
Evan Tschannen b68bc46042
Merge pull request #1348 from ajbeamon/fix-missing-metrics-when-ss-down
Fix missing read workload metrics
2019-03-22 19:08:04 -07:00
Evan Tschannen d34c56c9a5 ensure that the processId exists in id_worker before accessing it 2019-03-22 18:54:39 -07:00
Balachandar Namasivayam ac8ad07b45 Address review comments. 2019-03-22 18:48:49 -07:00
Balachandar Namasivayam 4ed323ac52 Fixed bug and addressed review comments. 2019-03-22 18:48:49 -07:00
Balachandar Namasivayam d75020b44a Fix bug where accessing shared memory created by boost 1.52 leads to error when accessed by boost 1.67. 2019-03-22 18:48:49 -07:00
Evan Tschannen 36ab852bb1 Merge branch 'master' into ratekeeper
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen 6254a1a8e4 fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop 2019-03-22 18:37:39 -07:00
Evan Tschannen 7dd1c1b60c fix: processClassFitness could be wrong if the client changed their class while rebooting 2019-03-22 18:37:39 -07:00
Evan Tschannen ddb6058770 simplified ratekeeper monitoring loop 2019-03-22 18:22:45 -07:00
Jingyu Zhou 12917d8c7d Add actors to store halt request futures
Address best fitness in checking better DD or RK.
2019-03-22 18:06:38 -07:00
Jingyu Zhou e8977aeb98 Remove clusterControllerDcId check
This is no longer needed since it'll be set in the ctor.
2019-03-22 18:01:54 -07:00
Evan Tschannen 82bc447e29 startRatekeeper is responsible for updating serverDBInfo 2019-03-22 17:56:16 -07:00
Evan Tschannen 82c80c225d make sure id_worker is updated before setting ratekeeper or data distribution 2019-03-22 17:08:54 -07:00
Evan Tschannen 6a9c9d79cc
Update fdbserver/ClusterController.actor.cpp 2019-03-22 17:00:58 -07:00
Evan Tschannen 70b1c88cdd
Update fdbserver/ClusterController.actor.cpp 2019-03-22 17:00:52 -07:00
Jingyu Zhou 16f54577ee Restore master PID in cluster controller worker registration
CC may think master failed and clear the master PID, which can block both data
distributor and ratekeeper recruitment. Fix by restoring it during worker
registration.
2019-03-22 14:53:05 -07:00
A.J. Beamon fc48b6050e When tabulating read workload metrics, ignore the absence of any particular storage server. 2019-03-22 14:22:22 -07:00
Evan Tschannen 78f7a2e40b fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop 2019-03-22 14:13:58 -07:00
A.J. Beamon 4eb5715689 Add support for a client or worker having multiple issues. 2019-03-22 08:29:41 -07:00
Jingyu Zhou da338c3ad6 Avoid unnecessary recuriting of DD or RK
While waiting for recruting data distributor or ratekeeper, a previous one
could already joined. So we can skip this unnecessary recruiting.

Revert the change of worker.actor.cpp for ratekeeper. Instead, recruiting
ratekeeper should avoid the process with an existing one. This fixes a bug
where the ratekeeper interface became zombie, killing other healthy ratekeeper
but doing no useful work. Found by:

-r simulation --crash -f tests/fast/WriteDuringRead.txt -s 31858110 -b on
2019-03-21 22:40:07 -07:00
Evan Tschannen fe4464e786 fix: processClassFitness could be wrong if the client changed their class while rebooting 2019-03-21 17:56:04 -07:00
Jingyu Zhou 299961aecb Move ratekeeper or data distributor from excluded servers 2019-03-21 17:17:33 -07:00
Evan Tschannen 3ced178348 maxVersionDifference is a copy of a knob which is a double 2019-03-21 12:58:48 -07:00
Jingyu Zhou 48324ad4be Fix a race during ratekeeper registration
When a ratekeeper registers, the monitorRatekeeper wakes up and recruits a new
ratekeeper. Adding a 0s delay to avoid this.

If a ratekeeper is recruited on an existing machine, update the interface so
that the cluster controller can clear the ratekeeperID.
2019-03-21 12:56:56 -07:00
Evan Tschannen e692f0f70f fix: degraded is only used for tlog recruitment, so we should not use it in the fitness calculation for other roles 2019-03-21 11:23:49 -07:00
Jingyu Zhou 8edefda193 Fix test stuck due to invalid worker in cluster controller
Test case:
-r simulation --crash -f ./tests/rare/CloggedCycleWithKills.txt -s 688927581 -b off
2019-03-20 22:24:01 -07:00
Evan Tschannen 59abd8f3d8 fix: make sure recoveryLocation is always a valid page 2019-03-20 18:12:56 -07:00
Evan Tschannen 3730142fcc fix: after a rollback, uncommitted changes to the byte sample could be missed 2019-03-20 18:10:26 -07:00
Jingyu Zhou 937b6dde31 Fix a race of DD, RK, Master failure
If all DD, RK, Master run on the same process and failed. Recruiting of new
DD or RK could try to use the old master worker interface, which is an invalid
one and causes recruitment to be stuck.

Fix by adding a delay and checking master is valid before recruitment.
2019-03-20 16:19:20 -07:00
Evan Tschannen 2ed1d58d16 fix: change the location where stopped is checked, because a yield could cause cause stopped to be set after the existing check 2019-03-20 14:28:32 -07:00
Jingyu Zhou ce5c6d18d2 Fix ratekeeper recruitment bug 2019-03-20 14:22:22 -07:00
Jingyu Zhou 86b687981b Fix ratekeeper and data distributor recruiting bug
Avoid multiple concurrent recuriting of ratekeepers with a recruiting flag.
Fix endless recruiting when the chosen worker is a proxy or a resolver --
prefer master in this case.
2019-03-20 10:00:31 -07:00
Evan Tschannen 5a00f567be fix CheckSatelliteTagLocation 2019-03-20 09:30:11 -07:00
Jingyu Zhou 474abd81bd Move placement monitoring inside doCheckOutstandingRequests 2019-03-19 22:48:21 -07:00
Evan Tschannen 2605257737 Merge branch 'master' of github.com:apple/foundationdb 2019-03-19 18:47:29 -07:00
Evan Tschannen f9aad46573 made use_provisional_proxies a transaction option 2019-03-19 18:44:37 -07:00
Evan Tschannen 20764efa24
Merge pull request #1320 from bnamasivayam/dc-as-satellite-config
Support config where the primary and remote DC's can be used as satel…
2019-03-19 15:49:24 -07:00
Balachandar Namasivayam f9560e1abd Addressed Review Comments 2019-03-19 15:23:14 -07:00
Jingyu Zhou bc6fdaea3e Recruit a new ratekeeper before halting the old 2019-03-19 15:21:46 -07:00
Evan Tschannen 5b9c45ea0b clients do not attempt to connect to provisional proxies 2019-03-19 13:37:50 -07:00
Jingyu Zhou 0fb6a03c07 First round of review comment fixes for PR#1307 2019-03-19 11:29:19 -07:00
A.J. Beamon 2d7b48dadc
Merge pull request #1311 from etschannen/feature-increase-grv-batch
Increased the GRV client batch size
2019-03-19 08:23:05 -07:00
A.J. Beamon 7f4adcc338
Merge pull request #1314 from etschannen/feature-ssd-memory-spill
configure memory now selects the ssd engine for transaction log spilling
2019-03-19 08:22:22 -07:00
Vishesh Yadav fea18e7be0 fix: fdbserver segfault when started with wrong arguments
Public address is required for roles FDBD, NetworkTestServer and
Restore only. Therefore, check those cases, and for others follow the
earlier behaviour of using default ip address 0.

FIXES #1305
2019-03-19 02:05:11 -07:00
Evan Tschannen 2554fed965 reduce max transaction to start 2019-03-18 16:16:03 -07:00
Evan Tschannen 87e2a1a029 The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update 2019-03-18 16:09:57 -07:00
Alex Miller b11ecb3210 Remove random bits of code that were either unneeded or leftover from debugging. 2019-03-18 15:47:20 -07:00
Evan Tschannen eb54a700ba changed the old memory configuration to memory-1 2019-03-18 15:10:04 -07:00
Alex Miller 37ea71b117 Implement limiting how many bytes recovery will read.
This time, track what location in the DiskQueue has been spilled in
persistent state, and then feed it back into the disk queue before
recovery.

This also introduces an ASSERT that recovery only reads exactly the
bytes that it needs to have in memory.
2019-03-18 15:09:43 -07:00
Alex Miller 29ab7370cd Clear versionLocation when spilling, and pop DQ separately.
Popping the disk queue now requires potentially recovering the location
to which we can pop from the spilled data itself, and for each tag we
must maintain the first location with relevant data.

The previous queue we had to represent the ordering, queueOrder, was
used by spilling, and popped when a TLog had been spilled.  This means
that as soon as a TLog has been fully spilled, we have no idea how it
relates in order to other fully spilled TLogs.

Instead, use queueOrder to keep track of all the TLog UIDs until they're
removed, and use spillOrder to keep track of the order only for
spilling.
2019-03-18 15:09:22 -07:00
Jingyu Zhou 8d609eb51d Protect ratekeeper registration race during recruitment
This is similar one to DataDistributor.
2019-03-18 13:53:50 -07:00
Balachandar Namasivayam 5471725db5 Support config where the primary and remote DC's can be used as satellites. 2019-03-18 12:17:59 -07:00
Jingyu Zhou 2b41a97a6e Fix the issue of slow dying Data Distributor
Test with:
-r simulation -f ./foundationdb/tests/slow/CommitBug.txt -s 67828576 -b on

The test has the following event sequence:
- Time 113.3s, CC noticed DD failure, cleard DD interface.
- 1s later, DD rejoined and registered with CC.
- Time 131.7s, DD actor cancelled. This old DD raced to register with CC and
the failure monitor is not installed because monitorDataDistributor is stalled
waiting for new DD.
- Time 161.4s, new DD running. New DD recruting was delayed due to no servers
in the period.

Fix by disabling DD registration during the recruting process.
2019-03-17 22:19:23 -07:00
Evan Tschannen 44e25e219c do not suppress KeyValueStoreMemory_OutOfSpace in simulation 2019-03-17 00:35:48 -07:00
Evan Tschannen ec6c843124 increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch 2019-03-16 16:18:58 -07:00
Stephen Atherton f88e53e640 Merge branch 'master' of https://github.com/apple/foundationdb into fix-parent-directory 2019-03-16 00:13:09 -07:00
Stephen Atherton 2efb6f4c0d Added cleanPath() which puts a path in a canonical form without .., ., or duplicate separators without using the filesystem or resolving symbolic links. absPath() redefined to use cleanPath() so it will return the same result for a path without symbolic links regardless of whether or not the path actually exists. Redefined parentDirectory() to use absPath() and error on certain inputs. Added comments describing behavior of these functions, and added a unit test which verbosely tests many inputs to them. 2019-03-15 23:54:33 -07:00
Jingyu Zhou 254c78053c Fix a segfault error
After wait, ServerDBInfo may have changed. Using the old copy is wrong.
2019-03-15 22:11:13 -07:00
Alex Miller 7f5bc2981f Checksum DiskQueue pages on read, but at a lower priority.
If a server has its data spilled, then it's behind the 5s window.
Feeding it data is less important than committing, so we can hide the
extra CPU usage from checksumming the read amplified disk queue pages.
2019-03-15 21:01:19 -07:00
Alex Miller ee4721a63f Make checking or ignoring checksums part of the IDiskQueue::read API. 2019-03-15 21:01:18 -07:00
Alex Miller 81c59e88a8 Persist the protocol version of a TLog instance when it is created.
This allows us to do easy upgrades of SpilledData in the future, if the
need arises, because we then have a protocol version to compare against.
2019-03-15 21:01:17 -07:00
Alex Miller bf247eeed0 If TLogVersion >= 3, use crc32c for the DiskQueue hash for TLogs.
We don't have a forward compatibility story for the memory storage
engine, so its DiskQueue will still be hashlittle2 until one exists.
2019-03-15 21:01:16 -07:00
Alex Miller 686b097397 Remove verification code from DiskQueue and TLogServer. 2019-03-15 21:01:15 -07:00
Alex Miller bdd7d5d3df Initialize firstPages with 0xFF.
There's various ASSERT()'s that assume firstPages is empty, and enforces
things about `seq`.  Some of these asserts have spuriously passed, since
uninitialized pages look like they have a `seq` of 0, which would be the
beginning of the disk queue.

Now they'll look like the end of the disk queue, which is far easier to
fail on.
2019-03-15 21:01:14 -07:00
Alex Miller 77f596743f Bump persistFormat in TLogServer to differ from OldTLogServer*
Though this format is being deprecated in favor of an eventual plumbing
through of TLogVersion, we should probably bump it anyway.

And also remove the fallback to OldTLogServer code.  It should never be
executed, as OldTLogServer_6_0 is entirely relied upon to execute
OldTLogServer_4_6.
2019-03-15 21:01:13 -07:00
Alex Miller 4f98634f59 Add LogId to all TLog TraceEvents that have it. 2019-03-15 21:01:12 -07:00
Jingyu Zhou 12ddd56698 Fix Ratekeeper and DataDistributor placement
Make sure both RateKeeper and DataDistributor are placed in the same data
center as the Master. Make sure only one RateKeeper is live in the cluster as
well.
2019-03-15 17:09:28 -07:00
Jingyu Zhou bb5686eb75 Fix monitoring of DD and RK 2019-03-15 16:02:17 -07:00
Jingyu Zhou 9f6fe5f649 Merge remote-tracking branch 'apple/master' into ratekeeper 2019-03-15 11:30:04 -07:00
Jingyu Zhou 40860e0093 Attempt to fix. 2019-03-15 11:29:04 -07:00
A.J. Beamon 85b3f11e71 Fix various compiler warnings 2019-03-15 10:34:57 -07:00
Stephen Atherton 126252a274 Changed checksum to crc32. Disabled pager housekeeping for now. Added more btree read/write/commit metrics. Changed readPage to use disk read priority. Bug fix in CommitSubtree causing it to recurse to children unnecessarily. Added point read speed test at the end of set performance unit test. 2019-03-15 00:46:09 -07:00
Jingyu Zhou 9e59c9c253 Check DataDistributor and RateKeeper fitness
Fail the test if they are not put in the best fitness.
2019-03-14 16:14:57 -07:00
Jingyu Zhou 99d521ef4f Monitor Ratekeeper and DataDistributor to use stateless processes
Since Ratekeeper and DataDistributor are no longer running with Master, they
might be running with stateful processes before a new Master becomes alive,
which is undesirable.

This PR adds a monitoring of both Ratekeeper and DataDistributor at Cluster
Controller -- if Master runs on a stateless class and RK/DD runs at a worse
class, then RK/DD will be killed. I.e., RK/DD should be running at their own
classes or on the same stateless process as Master. After restart, RK/DD should
be running at a better process class.
2019-03-14 15:00:57 -07:00
Balachandar Namasivayam 2ac07fe7e0
Merge pull request #1248 from satherton/feature-backup-json
JSON output options for fdbbackup status and describe
2019-03-14 13:41:28 -07:00
Meng Xu 5a10bf5dfc Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-14 10:35:12 -07:00
Meng Xu e30e2af1f3 ClientKnobs: Add CHECK_CONNECTED_COORDINATOR_NUM_DELAY 2019-03-13 16:54:56 -07:00
Evan Tschannen e7d1f9e5f1 fixed review comments 2019-03-13 15:59:03 -07:00
Evan Tschannen 7f48025348 optimize confirm epoch alive 2019-03-13 14:47:17 -07:00
Steve Atherton dbacfcbc82
Merge branch 'master' into feature-backup-json 2019-03-13 13:30:45 -07:00
Evan Tschannen a2108047aa removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects. 2019-03-13 13:14:39 -07:00
Evan Tschannen e068c478b5 merge master 2019-03-12 18:31:25 -07:00
Steve Atherton 8aab719c22
Merge branch 'master' into feature-backup-json 2019-03-12 18:23:16 -07:00
Evan Tschannen a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Meng Xu 85c24b0067 Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-12 15:20:54 -07:00
Evan Tschannen 5392742902 fixed review comments 2019-03-12 14:38:54 -07:00
Evan Tschannen c5a18945b6
Merge pull request #1260 from vishesh/task/tls-upgrade
Allows cluster string to contain coordinators with different TLS states
2019-03-12 13:45:08 -07:00
A.J. Beamon a25e224cda
Merge pull request #1213 from etschannen/feature-metadata-version
Added a metadata version key
2019-03-12 13:36:33 -07:00
Jingyu Zhou 2b0139670e Fix review comment for PR 1176 2019-03-12 12:02:30 -07:00
Stephen Atherton f0eae0295f Merge branch 'master' of https://github.com/apple/foundationdb into feature-backup-json 2019-03-12 03:35:03 -07:00
Stephen Atherton e9b8bf601e Added backup status JSON output to backup workload to get sim coverage. 2019-03-12 03:34:38 -07:00
Balachandar Namasivayam 880e8643d1 Fix Windows link errors 2019-03-11 17:49:03 -07:00
Meng Xu 46f4b02807 TLS Status: Resolve review comments
Use connectedCoordinatorsNumDelayed to reduce the load on cluster controller;
Set connectedCoordinatorsNum to null by default for monitorLeader()
2019-03-11 17:10:08 -07:00
Evan Tschannen 5873705228 tlog commits very rarely take an additional 6 seconds 2019-03-11 12:11:17 -07:00
Meng Xu 435e515985 Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-11 11:17:40 -07:00
Evan Tschannen 80c3f2f8e2 added status fields detailing which processes are degraded, and also the total number of degraded processes 2019-03-10 22:58:15 -07:00
Evan Tschannen c6e94293bf reset a process to not be degraded after 2 days 2019-03-10 22:39:21 -07:00
Evan Tschannen 2627bcd35e Merge branch 'master' into feature-metadata-version 2019-03-10 21:13:28 -07:00
Evan Tschannen 1be9ae5ce3 fixed merge conflict 2019-03-08 22:51:06 -05:00
Evan Tschannen 044b6b4f8a Merge branch 'master' into feature-degraded-tlog
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
mpilman ebffe8c633 print correct pahes in alloc instrumentation 2019-03-08 15:03:17 -08:00
Evan Tschannen 45fe6b369b tlog recruitment will prefer non-degraded processes, however it will not choose less than desired number of tlogs to avoid degraded processes
better master exists will switch the master to avoid degraded processes
2019-03-08 14:40:00 -05:00
Evan Tschannen 53f16b5347 when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded 2019-03-08 11:46:34 -05:00
Evan Tschannen 710a64dc4e replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails 2019-03-08 11:25:07 -05:00
mpilman 2537f26de6 First implementaion of more user-friendly cpack
Up unto here this code is only very rudiemantery tested.

This is a firest attempt of making cpack more user-friendly.
The basic idea is to generate a component for package type so
that we can have different paths depending on whether we build
an RPM, a DEB, a TGZ, or a MacOS installer. The cpack package
config file will then chose the correct components to use.

In a later point this should make it possible to build these
with `make packages` and the ugly iteration with calling cmake
between each package would be obsolete. While this solution is
a bit more bloated, it is also much more flexible and it will be
much easier to use.

Another benefit is, that this will get rid of all warnings during
a cpack run
2019-03-07 16:49:29 -08:00
Jingyu Zhou cdfe906c30 Data distributor pulls batch limited info from proxy
Add a flag in HealthMetrics to indicate that batch priority is rate limited.
Data distributor pulls this flag from proxy to know roughly when rate limiting
happens.

DD uses this information to determine when to do the rebalance in the background,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster is
currently rate limited for batch commits, then the rebalance will use longer
time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
2019-03-07 13:16:20 -08:00
Jingyu Zhou f43277e819 Format Ratekeeper.actor.cpp code 2019-03-07 13:16:20 -08:00
Jingyu Zhou dc129207a9 Minor fix after rebase. 2019-03-07 13:16:20 -08:00
Jingyu Zhou 835cc278c3 Fix rebase conflicts. 2019-03-07 13:16:20 -08:00
Jingyu Zhou 7340998261 Fix status message for ratekeeper 2019-03-07 13:16:20 -08:00
Jingyu Zhou 517966fce2 Remove lastLimited from rate keeper
Refactor code to make IDE happy.
2019-03-07 13:16:20 -08:00
Jingyu Zhou d52ff738c0 Fix merge conflicts during rebase. 2019-03-07 13:16:20 -08:00
Jingyu Zhou b2ee41ba33 Remove lastLimited from data distribution
Fix a serialization bug in ServerDBInfo, which causes test failures.
2019-03-07 13:16:20 -08:00
Jingyu Zhou 36a51a7b57 Fix a segfault bug due to uncopied ratekeeper interface 2019-03-07 13:16:20 -08:00
Jingyu Zhou e6ac3f7fe8 Minor fix on ratekeeper work registration. 2019-03-07 13:16:20 -08:00
Jingyu Zhou 3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Andrew Noyes 27d199409e Add KillRegion.actor.cpp workload to cmake 2019-03-07 12:14:42 -08:00
Balachandar Namasivayam 9e4c780baa
Merge pull request #1249 from xumengpanda/mengxu/status/teamcollection-info
Status:healthy: Add optimizing_team_collections
2019-03-07 11:44:24 -08:00
Balachandar Namasivayam f3391ea413
Merge pull request #1240 from satherton/feature-restore-by-timestamp
Restore by timestamp
2019-03-06 16:21:06 -08:00
Vishesh Yadav ed49d603a0 Allows cluster string to contain coordinators with different TLS states
During live TLS upgrades, we can hence switch one coordinator at a time
to TLS than all of them together.
2019-03-06 16:05:10 -08:00
Meng Xu 845f8fdcbc Status:healthy: Add optimizing_team_collections
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu 04880e3d4d Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-06 13:41:16 -08:00
Stephen Atherton 7778112f6a Bug fix, restore was using the destination cluster to look up timestamps when printing the backup description instead of (optionally) the original cluster which generated the backup. Made missing cluster file errors more clear. 2019-03-06 02:45:55 -08:00
Alex Miller c6a65389ae Remove noexcept macro and replace with BOOST_NOEXCEPT.
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller af617d68e6 boost 1.52.0 -> 1.67.0 in all vcxproj files 2019-03-05 22:06:12 -08:00
Meng Xu 820548223a Status: connected_coordinators misc minor changes
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initilize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
Meng Xu b7a52e81e2 Status: Count connected coordinators per client
A client will always try to connect all coordinators.
This commit let Status track the number of connected coordinators
for each client.

This allows us to do canary in coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinators
one by one. This avoid the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00
Alex Miller ad0aca21b5 Update fdbserver/fdbserver.vcxproj
Co-Authored-By: atn34 <anoyes34@gmail.com>
2019-03-05 18:03:57 -08:00
anoyes 981426bac9 More ide fixes 2019-03-05 18:03:57 -08:00
Evan Tschannen 82d957e0bb
Merge pull request #1178 from vishesh/task/issue-963-IPv6
IPv6 Support
2019-03-05 17:14:16 -08:00
Vishesh Yadav a9562f61be fix: missing argument to printf in fdbserver 2019-03-05 14:03:09 -08:00
Steve Atherton 21f55e1878
Merge pull request #1190 from bnamasivayam/restore-multiple-ranges
Add support for restoring multiple ranges.
2019-03-05 10:15:55 -08:00
Evan Tschannen f1897f3eb6 Merge branch 'master' into feature-metadata-version
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
2019-03-04 21:06:16 -08:00
Evan Tschannen 69d7633d5b
Merge pull request #1217 from alexmiller-apple/tstlog-goodref
Spill-By-Reference TLog Part 4: Actually Usable Reference Spilling
2019-03-04 20:58:24 -08:00
Evan Tschannen 3d196c9e97 fix: metadataVersionKey log range removal needs to be checked for each logDestination 2019-03-04 20:56:31 -08:00
Evan Tschannen 988add9fb5 cache the metadataVersion for commits, so that doing setVersion with the commit version of a different transaction will 2019-03-04 16:48:34 -08:00
Meng Xu c0535c49bb Status: TLS client status
Use ClientStatusInfo structure for each network address (client),
instead of passing each status info as a parameter.
2019-03-04 16:35:10 -08:00
Trevor Clinkenbeard 89cbb77b4e Merge branch 'master' of https://github.com/apple/foundationdb into lazily-fetch-health-metrics 2019-03-04 14:17:58 -08:00
Trevor Clinkenbeard 56ae46f89e Client lazily fetches health metrics from proxies 2019-03-04 14:16:39 -08:00
Vishesh Yadav e93cd0ff21 Add some checks and comments to IPv6 changes #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav 592e224155 net: add/use formatIpPort to format IP:PORT pairs #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav cc9ad0e202 net: Use IPv6 in simulation testing #963
25% times we will use IPv6 addresses
2019-03-04 14:12:45 -08:00
Vishesh Yadav 57832e625d net: Support IPv6 #963
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.

- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.

- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.

- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
Alex Miller baa3e1af2c Replace `/sizeof(Page)*sizeof(Page)` with `pageFloor()`. 2019-03-04 01:42:39 -08:00
Alex Miller ee64b43366 Change DQ shrink logic to consider "active" bytes rather than file size.
We know what the current ideal size of the DQ file should be, so we
should use it.
2019-03-04 01:42:39 -08:00
Alex Miller 244903a9de Spill txsTag by value under TagMsg/ and not TagMsgRef/
There's not a tremendous reason as to why this matters now, but I feel
like I might regret sometime later not keeping the same schema under the
same key.
2019-03-04 01:42:39 -08:00
Alex Miller 72c2cf11ab Replace ResourceLimiter with FlowLock. 2019-03-04 01:42:38 -08:00
Alex Miller 94bf75cb00 Allow the disk queue to shrink if it has unneeded slack space. 2019-03-04 01:42:38 -08:00
Alex Miller 52d5a721a6 Don't allocate 2x the memory for a read to save 1% of allocated memory. 2019-03-04 01:42:38 -08:00
Alex Miller aff9ebe21a Spill (start,length) instead of (begin,end) to save a few bytes. 2019-03-04 01:42:38 -08:00
Alex Miller 2aa527c0ef Fix a bug resulting from concurrent TLog changes.
TLogServer was forked into OldTLogServer_6_0 at the same time that
3247d594 modified TLogServer, so the modification never made it into
OldTLogServer_6_0, resulting in a rare failure.

Manual code inspection revealed that there was also
78976161 that concurrently modified TLogServer, so that change was
copied to OldTLogServer_6_0 as well.
2019-03-04 01:42:38 -08:00
Alex Miller fb4cb8c3a8 Print out configuration changes in ConfigureTest. 2019-03-04 01:42:38 -08:00
Alex Miller 9ef283d4e7 Implement hard limiting of memory used to serve peek requests. 2019-03-04 01:42:38 -08:00
Alex Miller e3506ad9af Add a yield to parseMessagesForTag 2019-03-04 01:42:38 -08:00
Alex Miller 742f6e1847 Solve overreading via pre-calculating tag bytes per commit 2019-03-04 01:42:38 -08:00
Alex Miller e7d8520c63 Batch more when spilling data. 2019-03-04 01:42:38 -08:00
Alex Miller 71a794ccc3 Re-enable spill-by-reference testing. 2019-03-04 01:42:38 -08:00
Alex Miller 04e1170c88 Spill txsTag by value 2019-03-04 01:42:38 -08:00
Alex Miller 4d4e0a1d54 Fix the build on -O0.
C++ < 17 requires definitions of declared static constexpr variables.
2019-03-04 01:42:38 -08:00
Alex Miller db546af4a3 Fix the build on -O0.
C++ < 17 requires definitions of declared static constexpr variables.
2019-03-04 01:38:58 -08:00
Evan Tschannen 075fdef31a Merge branch 'master' into feature-metadata-version
# Conflicts:
#	fdbclient/DatabaseContext.h
2019-03-03 22:58:45 -08:00
Evan Tschannen 057ebe56e4 fix: unknownCommit handling relied on soleOwnership of the version stamp keys, so we need to use a second key to track the commit version for the metadataVersionKey
renamed a confusing option
2019-03-03 21:31:40 -08:00