Commit Graph

263 Commits

Author SHA1 Message Date
Alex Miller 2d26e98d07 Add a cross-platform getLastWrite() to get a file's mtime. 2018-07-20 19:00:32 -07:00
A.J. Beamon a7a1124c11 Fix incompatible connection accounting that was incorrectly decrementing the incompatible count in some cases. 2018-07-17 11:36:05 -07:00
A.J. Beamon 8879954254
Merge pull request #609 from etschannen/release-6.0
Improved simulation strength by only removing datacenters that have been killed
2018-07-16 15:59:28 -07:00
Evan Tschannen e0caa28758 code cleanup 2018-07-16 15:56:43 -07:00
AlvinMooreSr aafb3c5c00
Merge pull request #593 from AlvinMooreSr/release-6.0-tls-funct
Replaced separate TLS Log function with FDB TraceEvent logger
2018-07-16 12:01:02 -07:00
Evan Tschannen f72a9f60c0 only disable fearless if a datacenter has actually been killed
fix: we must prevent recovery into the dead datacenter while reducing usable_regions
2018-07-16 10:06:57 -07:00
Alvin Moore a034acf3bd Replaced separate TLS Log function with FDB TraceEvent logger 2018-07-11 18:41:46 -07:00
Alec Grieser d5a23642a1
Merge pull request #587 from etschannen/feature-remote-logs
close unneeded connections
2018-07-10 13:27:15 -07:00
Evan Tschannen a35d5e30d9 Added a SevError trace event in case the peer reference count becomes negative 2018-07-10 13:26:28 -07:00
Evan Tschannen c25be5699a close unneeded connections 2018-07-10 13:10:29 -07:00
Alec Grieser be9c34c6f8
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2 2018-07-10 10:04:48 -07:00
Alec Grieser ad37b1693d
Merge pull request #585 from etschannen/feature-remote-logs
A variety of cleanup and test strengthening commits
2018-07-10 09:58:44 -07:00
AlvinMooreSr b3916a9b71
Merge pull request #409 from joelarmstrong/tlsconnection-clang-ub-warning
Fix compilation with clang from Apple LLVM 9.1.0
2018-07-10 09:32:24 -07:00
Evan Tschannen 82cc30be62 added testing for two_satellite_fast and two_satellite_safe 2018-07-09 22:01:46 -07:00
Stephen Atherton fddb3e87e2 Differentiate between a timeout in attempting to connect vs a timeout on an active connection by converting timeouts during connection attempts to connection_failed errors. 2018-07-09 19:40:01 -07:00
Stephen Atherton 3ce7c78d36 If an HTTP request fails due to a connection failure or a timeout, do not convert the error to the more generic http_request_failed. 2018-07-09 18:58:33 -07:00
Evan Tschannen e503dc975c fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-09 13:37:06 -07:00
Evan Tschannen 5a2cb3037b merge 5.2 into 6.0 2018-07-08 20:14:06 -07:00
Evan Tschannen 0e97ce79b4 fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-08 10:26:41 -07:00
Stephen Atherton a2f16e217e Memory waste fix: when a Peer disconnects, an extra packet buffer block was allocated to copy unsent reliable bytes into, even if there weren't any. 2018-07-06 19:44:30 -07:00
Evan Tschannen 6d7172ef7e fix: canKillProcesses did not take into account the remoteTLogPolicy when checking notEnoughLeft 2018-07-05 21:36:09 -07:00
Evan Tschannen 6f4ca2eba2 fix: get all processes did not include rebooting processes 2018-07-05 21:13:56 -07:00
Evan Tschannen cd4fb9285a waitForExclusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region 2018-07-05 14:04:42 -07:00
Evan Tschannen 7315e5da55 fix: isExcluded and isCleared were exactly wrong
fix: isCleared should mean the process is dead
2018-07-05 02:22:22 -07:00
Evan Tschannen e17dfea3b6 fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1.
canKillProcess logic was wrong.
We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.
2018-07-04 16:22:32 -04:00
Alvin Moore c3f88dbfe1 Merge branch 'master' of github.com:apple/foundationdb into tls-static 2018-07-01 23:13:57 -07:00
Alvin Moore 132e2d9267 Defined TLS build flags for projects
Updated TLS documentation
2018-07-01 22:49:39 -07:00
Evan Tschannen 899f880ce0 fix: log router class did not have the proper fitness for becoming the cluster controller 2018-06-28 23:20:01 -07:00
Alvin Moore 45849d1f95 Added support for no-op legacy TLS options 2018-06-27 09:25:05 -07:00
Alvin Moore 65d8b38ae9 Changed generic plugin code to work as expected for plugin code, except for the TLS use case
Defined TLS plugin name constant
Changed TLS plugin name to get_tls_plugin
Fixed link script
Removed compilation flags from info make target
2018-06-26 16:01:25 -07:00
Alvin Moore ef8de426d3 Changed the TLS_DISABLED macro
Disable TLS within Windows until working
2018-06-26 12:08:32 -07:00
Evan Tschannen 0123627d67 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen 5fc8199abc Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic
wait_for_good_recruitment now requires that you have the desired count of each role
remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered
2018-06-22 10:15:24 -07:00
Evan Tschannen 1dce97f28c Merge branch 'release-5.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/SimulatedCluster.actor.cpp
#	packaging/msi/FDBInstaller.wxs
#	versions.target
2018-06-21 17:05:11 -07:00
Balachandar Namasivayam d7dba11366 Throw tls_error instead of internal_error when not able to create a TLS connection. 2018-06-21 15:33:00 -07:00
Stephen Atherton e9e1e194f0 Added operation-specific rate controls to blob store interface. 2018-06-20 20:34:34 -07:00
Richard Low fff6a47c43 Validate certificates by default 2018-06-20 14:04:03 -07:00
Alvin Moore f8ce1de601 Added support for compiling TLS into binaries 2018-06-20 09:21:23 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Alex Miller 6c2cb25c53 Rename BestOtherFit -> OkayFit.
The previous order of fitness was

  BestFit > GoodFit > BestOtherFit > ...

which is baffling.  It's now:

  BestFit > GoodFit > OkayFit > ...

which won't break anyone's expectations.
2018-06-12 16:50:25 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 48fbc407fd fix: we cannot kill all of the remote tlogs, because we still need their data to copy to the next generation in the same data center 2018-06-08 15:28:44 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
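The naming rules in the trace event normalization commit above can be expressed as a small check. The following is a minimal sketch, assuming plain ASCII names; the function names isValidDetailName and isValidTypeName are hypothetical and are not the check that was actually added to FoundationDB.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (detail-name rule): the name must start with an
// uppercase ASCII letter and must contain no underscores.
bool isValidDetailName(const std::string& name) {
    if (name.empty() || !std::isupper(static_cast<unsigned char>(name[0]))) return false;
    return name.find('_') == std::string::npos;
}

// Hypothetical helper (type-name rule): same as the detail-name rule, but a
// single underscore is allowed (Context_Type), and the first character after
// the underscore must also be uppercase.
bool isValidTypeName(const std::string& name) {
    std::size_t pos = name.find('_');
    if (pos == std::string::npos) return isValidDetailName(name);
    if (name.find('_', pos + 1) != std::string::npos) return false;  // more than one underscore
    // Both halves must independently satisfy the detail-name rule.
    return isValidDetailName(name.substr(0, pos)) && isValidDetailName(name.substr(pos + 1));
}
```

For example, "Context_Type" passes as a type name but "Context_type" does not, and no name containing an underscore passes as a detail name.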
Balachandar Namasivayam 529d0497f1 Proxy could go OOM when high volumes of writes were applied to it, particularly when they arrived suddenly, before ratekeeper could control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
A.J. Beamon d9c702a9e3 Merge release-5.1 into release-5.2 2018-05-30 09:09:55 -07:00
Joel Armstrong 7c35ea6ba1 Fix use of bool in va_start causing undefined behavior
The version of clang included in Apple LLVM 9.1.0 complains about
passing the bool parameter `is_error` to va_start, which causes make
to fail:

fdbrpc/TLSConnection.actor.cpp:370:16: error: passing an object that undergoes
      default argument promotion to 'va_start' has undefined behavior
      [-Werror,-Wvarargs]
        va_start( ap, is_error );
                      ^
This just switches is_error back to the type it gets promoted to (int).
2018-05-24 16:37:11 -07:00
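The va_start fix above works because the last named parameter before the ellipsis must not be a type that undergoes default argument promotion (such as bool). The following is a minimal sketch of the pattern, assuming a hypothetical formatEvent function whose variadic arguments are C strings terminated by a null pointer; it is not the actual TLSConnection.actor.cpp code.

```cpp
#include <cstdarg>
#include <string>

// Hypothetical logger: is_error is declared as int (the type a bool argument
// would be promoted to), so passing it to va_start is well-defined and
// clang's -Wvarargs does not fire. Callers may still pass true/false, since
// a named parameter converts bool to int normally.
std::string formatEvent(const char* event, int is_error, ...) {
    std::string out = is_error ? "ERROR " : "INFO ";
    out += event;

    va_list ap;
    va_start(ap, is_error);  // is_error is the last named parameter
    // Hypothetical convention: variadic args are C strings ending with a null pointer.
    for (const char* s = va_arg(ap, const char*); s != nullptr; s = va_arg(ap, const char*)) {
        out += ' ';
        out += s;
    }
    va_end(ap);
    return out;
}
```

A call such as formatEvent("TLSConnect", true, "Peer", "1.2.3.4", static_cast<const char*>(nullptr)) yields "ERROR TLSConnect Peer 1.2.3.4". The explicit cast on the terminator keeps the va_arg type matching what was passed.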
A.J. Beamon 026458baf3 Merge release-5.2 into master 2018-05-23 15:32:56 -07:00
Richard Low 84ed35b01f Only log TLS verify failures if all verification fails; log failures at SevInfo 2018-05-21 10:58:59 -07:00
Richard Low 086700aeb1 Plumb through TLS key password to CLI and from environment 2018-05-21 10:56:10 -07:00