Commit Graph

747 Commits

Author SHA1 Message Date
Andrew Noyes 781b6ece77 Fix OPEN_FOR_IDE -Wunused-variable warnings
CC #1255, #1173
2019-04-16 15:28:01 -07:00
Andrew Noyes 75b9369583 Make checksumHistoryBudget optional
See https://github.com/apple/foundationdb/pull/1446#discussion_r275933381
2019-04-16 12:55:53 -07:00
Andrew Noyes 247f95a6e2 Add -Wunused-variable 2019-04-15 18:13:00 -07:00
Andrew Noyes 6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
Evan Tschannen cd5c9d91fa
Merge pull request #1443 from etschannen/master
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
Evan Tschannen 8e05713a5d do not log a SevError trace event if we cannot deserialize the connect packet 2019-04-10 17:41:02 -07:00
Evan Tschannen 6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
Remove unused functions
2019-04-09 11:49:45 -07:00
A.J. Beamon 058d028099
Merge pull request #1301 from mpilman/features/cheaper-traces
Defer formatting in traces to make them cheaper
2019-04-09 10:11:04 -07:00
A.J. Beamon a7288e1325 Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate. 2019-04-08 14:21:24 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman b944e0b116 generalized read guards, allow for penalty+error 2019-04-08 11:04:44 -07:00
Evan Tschannen 1358603c7a fix: getReplyUnlessFailedFor must still report endpoint failures even if the address is local 2019-04-08 10:42:58 -07:00
Evan Tschannen 1baae75ac9 merge 6.1 2019-04-07 23:24:31 -07:00
Balachandar Namasivayam 83e67d6b8f Do not start failure monitoring for local endpoints in getReplyUnlessFailedFor. 2019-04-05 20:21:22 -07:00
Andrew Noyes bd12e77213 Whitespace tweak 2019-04-05 16:30:42 -07:00
Andrew Noyes c882743afa Make actual build happy again 2019-04-05 16:30:42 -07:00
Andrew Noyes d7612a4426 Fix OPEN_FOR_IDE build errors 2019-04-05 16:30:42 -07:00
mpilman c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats strings. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that, `detail` expected a `std::string` as key, which meant that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce an if-branch, which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
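The deferred-formatting idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the real `TraceEvent`: `SketchEvent`, `Point`, and `getFields` are hypothetical names, and the `Traceable` template here only mimics the role the commit message describes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Traceable template named in the commit
// message: specialize it for any type that should be printable in a trace.
template <class T>
struct Traceable;

template <>
struct Traceable<int> {
    static std::string toString(int v) { return std::to_string(v); }
};

// A user-defined type the event class knows nothing about.
struct Point {
    int x, y;
};

template <>
struct Traceable<Point> {
    static std::string toString(const Point& p) {
        return "(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")";
    }
};

// Simplified event class (an assumption, not FoundationDB's TraceEvent).
class SketchEvent {
public:
    explicit SketchEvent(bool enabled) : enabled(enabled) {}

    // The key is a C string, so a literal is never copied into a std::string
    // unless the event is enabled. Defined in-class, so it is implicitly
    // inline: a disabled event only pays for the branch.
    template <class T>
    SketchEvent& detail(const char* key, const T& value) {
        if (enabled) {
            // Formatting happens only after the enabled check.
            fields.emplace_back(key, Traceable<T>::toString(value));
        }
        return *this;
    }

    const std::vector<std::pair<std::string, std::string>>& getFields() const {
        return fields;
    }

private:
    bool enabled;
    std::vector<std::pair<std::string, std::string>> fields;
};
```

With this shape, `SketchEvent(false).detail("P", Point{1, 2})` never calls `Traceable<Point>::toString`, while an enabled event records the formatted value.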
Evan Tschannen 390ab9cfed A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy 2019-04-04 14:11:12 -07:00
Jingyu Zhou 2b75c2e684 Restore removed functions.
crc32c.cpp is 3rd party code.
orYield() in genericactors.actor.h might be used in the future code.
2019-04-04 13:24:55 -07:00
Markus Pilman 101a05ae77
Merge branch 'master' into features/client-simulator 2019-04-03 10:03:56 -08:00
Alex Miller 45c466e269 Open incrementalDelete files with OPEN_UNBUFFERED
This fixes crashes from AsyncFileWinASIO refusing to open a file that
didn't have OPEN_UNBUFFERED.
2019-04-01 17:25:08 -07:00
Jingyu Zhou 47b4b82628
Merge branch 'master' into fix-unreferenced 2019-04-01 14:07:19 -07:00
Jingyu Zhou 3f76be8f45 Merge remote-tracking branch 'apple/master' into fix-unreferenced 2019-04-01 14:00:43 -07:00
Jingyu Zhou f7f8ddd894 Fix warnings on unused variables
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
mpilman e23e63c6ac Implemented JavaWorkload
This change allows a user to write a workload in Java.

The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.

If the workload is executed within the simulator, the resulting
test will no longer be deterministic, as it will execute in a
different thread (and even without that, it is not clear whether
we could get determinism, as the JVM does a lot of things that are
not deterministic).

This is intended to improve testing of the Java client. Layer
authors can also use the simulator to test their layers on a single
machine while still simulating failing machines etc.
2019-03-31 17:57:43 -07:00
Balachandar Namasivayam 0bbdc15f71 Multi-test processes wait until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect the restarts and fail quickly instead of waiting for a timeout, which is usually large. 2019-03-28 17:05:30 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen c10f1eea71 QueueModel changed to unordered_map 2019-03-27 20:56:44 -07:00
Evan Tschannen f1a4bdd70d changed failureMonitor to use an unordered_map 2019-03-27 19:17:08 -07:00
Evan Tschannen e5a80f2c94 optimized IPaddress 2019-03-27 18:21:13 -07:00
Jingyu Zhou a55f06e082 Remove unused functions
Found with -Wunused-function flag.
2019-03-27 15:45:28 -07:00
A.J. Beamon 71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
Evan Tschannen 3b5b03e435 ReplyPromise does not serialize an empty NetworkAddress 2019-03-26 12:05:43 -07:00
Evan Tschannen d45159ebf7
Merge pull request #1307 from jzhou77/ratekeeper
Monitor placement of Ratekeeper and DataDistributor
2019-03-24 17:26:07 -07:00
Evan Tschannen 1fc6937802 changed NetworkAddressList to at most two addresses for performance 2019-03-23 17:54:46 -07:00
Evan Tschannen 36ab852bb1 Merge branch 'master' into ratekeeper
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen efbcd18987 fixed a performance regression related to broadcasting a read version to too many transactions simultaneously 2019-03-22 16:05:20 -07:00
Jingyu Zhou 0fb6a03c07 First round of review comment fixes for PR#1307 2019-03-19 11:29:19 -07:00
Jingyu Zhou 254c78053c Fix a segfault error
After wait, ServerDBInfo may have changed. Using the old copy is wrong.
2019-03-15 22:11:13 -07:00
A.J. Beamon 85b3f11e71 Fix various compiler warnings 2019-03-15 10:34:57 -07:00
Meng Xu 5a10bf5dfc Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-14 10:35:12 -07:00
Steve Atherton be0da73938
Merge pull request #1290 from etschannen/feature-cheap-policy
Optimized a few uses of the replication policy engine
2019-03-13 17:01:19 -07:00
Evan Tschannen e7d1f9e5f1 fixed review comments 2019-03-13 15:59:03 -07:00
Evan Tschannen e8cb85ed8e optimize validateAllCombinations 2019-03-13 14:47:35 -07:00
Vishesh Yadav c32504f705 io: Add DISABLE_POSIX_KERNEL_AIO knob to use EIO instead of Kernel AIO
- Some Linux filesystems don't support O_DIRECT, which is required by
Kernel AIO to function properly. Instead of using O_SYNC, EIO is a
much better option in terms of performance penalty.
- Some systems may not support AIO at all, e.g. Windows Subsystem for
Linux.

FIXES #842
RELATED #274
2019-03-13 13:39:45 -07:00
Evan Tschannen a2108047aa removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects. 2019-03-13 13:14:39 -07:00
Evan Tschannen e068c478b5 merge master 2019-03-12 18:31:25 -07:00
Evan Tschannen a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Meng Xu 85c24b0067 Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-12 15:20:54 -07:00
Balachandar Namasivayam 880e8643d1 Fix Windows link errors 2019-03-11 17:49:03 -07:00
Evan Tschannen 044b6b4f8a Merge branch 'master' into feature-degraded-tlog
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen 41c493f8d4 fix: connectPacket accessed uninitialized variables 2019-03-08 14:40:32 -05:00
Jingyu Zhou 5dcde9efe0 Fix locality per review comment and a mac compile error 2019-03-07 13:16:20 -08:00
Jingyu Zhou 3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Meng Xu 04880e3d4d Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-06 13:41:16 -08:00
Alex Miller c6a65389ae Remove noexcept macro and replace with BOOST_NOEXCEPT.
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller af617d68e6 boost 1.52.0 -> 1.67.0 in all vcxproj files 2019-03-05 22:06:12 -08:00
Meng Xu 820548223a Status: connected_coordinators misc minor changes
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initialize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
anoyes 981426bac9 More ide fixes 2019-03-05 18:03:57 -08:00
Evan Tschannen 82d957e0bb
Merge pull request #1178 from vishesh/task/issue-963-IPv6
IPv6 Support
2019-03-05 17:14:16 -08:00
Meng Xu afd7c1d497 AsyncFileWinASIO: Make error checking consistent with Linux
In Linux, KAIO uses ASSERT to make sure open() flags have
OPEN_UNBUFFERED set.

In Windows, we use an if-condition and return io_errors() when the
flag is not set.

This PR makes the Windows implementation always use ASSERT to check the
flag.
2019-03-04 16:36:04 -08:00
Vishesh Yadav 5cd8bac6cb fix: segfault due to external assignment of Endpoint::addresses #1201
isLocal() now checks if the address is equal to default
NetworkAddress() which should match the behaviour before TLS changes.
2019-03-04 15:49:11 -08:00
Vishesh Yadav 1d3e62c4e3 net: Don't use a union of IP in ConnectPacket #963
Keeping a union and using the packet size to figure out whether
the ConnectPacket is using an IPv6 or IPv4 address is not easily
maintainable. For simplicity, we just serialize everything in
ConnectPacket and stay backward compatible with the older format.

However, some code for some much older stuff is removed.
2019-03-04 14:12:45 -08:00
Vishesh Yadav e93cd0ff21 Add some checks and comments to IPv6 changes #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav 592e224155 net: add/use formatIpPort to format IP:PORT pairs #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav cc9ad0e202 net: Use IPv6 in simulation testing #963
25% of the time we will use IPv6 addresses
2019-03-04 14:12:45 -08:00
Vishesh Yadav 57832e625d net: Support IPv6 #963
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.

- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.

- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.

- IPv6 address/port pairs should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
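The `[ip]:port` convention mentioned above can be illustrated with a small helper. This is only a sketch: `formatIpPortSketch` is a hypothetical name and signature (it is not the `formatIpPort` function added in the related commit), and it assumes the address is already a textual IP, detecting IPv6 simply by the presence of a colon.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats an IP and port as "ip:port" for IPv4
// and "[ip]:port" for IPv6, so the port separator stays unambiguous.
std::string formatIpPortSketch(const std::string& ip, uint16_t port) {
    // Assumption: any textual address containing ':' is IPv6.
    const bool isV6 = ip.find(':') != std::string::npos;
    if (isV6) {
        return "[" + ip + "]:" + std::to_string(port);
    }
    return ip + ":" + std::to_string(port);
}
```

Without the brackets, an address such as `::1:4500` could not be split reliably into address and port, which is why the bracketed form is conventional.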
Meng Xu 94385447bc Status: Get if client configured TLS
To understand if all clients have configured TLS,
we check the tlsoption when a client tries to open database.
This is similar to how we track the versions of multi-version clients.
2019-03-01 15:17:01 -08:00
Stephen Atherton 7d287c6999 Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2019-02-28 14:01:00 -08:00
Stephen Atherton 887856b6b0 Bug fix in AsyncFileReadAhead where a file size that is an integer multiple of the read chunk size will cause a crash when reading the file's final block. BackupContainerLocalDirectory now uses AsyncFileReadAhead in simulation to get simulation coverage of that class, and FileBackup will generate file sizes which expose the bug. 2019-02-28 00:22:38 -08:00
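The commit above does not say exactly where the boundary case went wrong. As a general illustration only, assuming a reader that splits a file into fixed-size blocks, the final block needs special care when the file size is an exact multiple of the block size. The names `blockCount` and `lastBlockLength` are hypothetical and not taken from AsyncFileReadAhead.

```cpp
#include <cstdint>

// Number of blocks needed to cover fileSize bytes (rounding up).
int64_t blockCount(int64_t fileSize, int64_t blockSize) {
    return (fileSize + blockSize - 1) / blockSize;
}

// Length of the final block. A naive `fileSize % blockSize` yields 0 when
// the size is an exact multiple of the block size, which would describe an
// empty final block instead of a full one.
int64_t lastBlockLength(int64_t fileSize, int64_t blockSize) {
    if (fileSize == 0) {
        return 0; // empty file: there is no final block
    }
    const int64_t remainder = fileSize % blockSize;
    return remainder == 0 ? blockSize : remainder;
}
```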
Evan Tschannen 8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller 2dc57568cb Change many things about log_version.
* log_version in the database (`/conf/log_version`) is now a hint that gets
  rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
  don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Trevor Clinkenbeard 25b397977c Never assign DataDistributor role to process of class CoordinatorClass 2019-02-20 17:22:01 -08:00
Trevor Clinkenbeard 1bb384db4d Merge branch 'master' of https://github.com/apple/foundationdb into add-no-assign-class 2019-02-20 13:13:12 -08:00
mpilman f14dee764b Use fwd decl for connectionReader - fdbrpc compiling 2019-02-19 15:16:59 -08:00
mpilman 3bd9b9047b Minor fixes - flow now compiling with intellisense 2019-02-19 15:16:59 -08:00
Evan Tschannen 065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Vishesh Yadav 0898686c9b Remove old TODO 2019-02-18 15:43:27 -08:00
Evan Tschannen 62603d11a1 updated the killRegion simulation test to test a much larger variety of failure scenarios 2019-02-18 15:32:51 -08:00
Vishesh Yadav e05b53d755 Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-15 20:37:07 -08:00
Vishesh Yadav 345fd7e4da Prefer unencrypted ports at client side during transition 2019-02-15 20:23:07 -08:00
Evan Tschannen 83060c6e56
Merge pull request #1062 from jzhou77/PR
Add a new DataDistributor role.
2019-02-15 13:51:27 -08:00
mpilman 75f692b931 simplify actorcompiler and target to compile coveragetool 2019-02-15 00:01:42 -08:00
Jingyu Zhou c35d1bf2ef Fix according to Alex's comment 2019-02-14 16:30:13 -08:00
Jingyu Zhou 886e7ab2ba Add a new DataDistributor role.
Let the cluster controller start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId

Add DataDistributor rejoin handling.

This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.

If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries
to recruit one as DD. CC also monitors DD and restarts one if it failed.

The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for
the new DD.

Add GetRecoveryInfo RPC to master server, which is called by data distributor
to obtain the recovery Transaction version from the master server.
2019-02-14 16:30:13 -08:00
Vishesh Yadav 907446d0ce Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-14 11:37:38 -08:00
A.J. Beamon 9272a41e5f
Merge pull request #1146 from atn34/fix-actor-warning
Fix actor warning for cmake build
2019-02-13 11:01:37 -08:00
Andrew Noyes 3a38bff8ee Use DISABLE_ACTOR_WITHOUT_WAIT_WARNING consistently 2019-02-13 10:30:35 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Andrew Noyes 874a58cb4f Suppress actor without wait for tests in cmake 2019-02-12 11:01:17 -08:00
mpilman 8a94d80deb fdbservice and fdbrpc now compiling 2019-02-07 15:37:04 -08:00
Evan Tschannen 486e0e13c3
Merge pull request #1116 from alexmiller-apple/tstlog
Random cleanups that prepare for Spill-By-Reference TLog
2019-02-05 18:09:06 -08:00
A.J. Beamon 882f8d70b7
Merge pull request #1066 from etschannen/master
fix: coordinators auto could put two coordinators in the same zone
2019-02-05 11:52:04 -08:00
Alex Miller 6668b7c544 Make simulation enforce what KAIO requires. 2019-02-04 18:04:22 -08:00
Evan Tschannen e9ddd94e27 The failure monitor is given a list of all IP addresses associated with a process
The connect packet includes the correct remote address
Did a lot of code cleanup
Simulation test mixed TLS and non-TLS listeners on the same process
2019-01-31 18:20:14 -08:00
Balachandar Namasivayam 9cf2b4e1e7 Improve TLS logging on error scenarios. 2019-01-29 17:04:09 -08:00
A.J. Beamon 05b38167d0
Update fdbrpc/sim2.actor.cpp
Co-Authored-By: etschannen <36455792+etschannen@users.noreply.github.com>
2019-01-29 11:35:02 -08:00
Trevor Clinkenbeard 2e0b3a7f1d Added ProcessClass::CoordinatorClass, which can be used by coordinators, so that coordinators do not have to take on other roles if desired 2019-01-25 11:03:13 -08:00
Evan Tschannen 1d7fec3074 Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
# Conflicts:
#	.gitignore
2019-01-24 17:43:06 -08:00
Evan Tschannen 9cf77d70bc fix: getFirstLocalAddress has to be the same as primary address, because it is what we put in the connect packet, and we always connect from the primary address 2019-01-24 17:28:26 -08:00
Evan Tschannen 699f8dd617 fix: coordinators auto could put two coordinators in the same zone
simulation now tests two machines in the same zone
2019-01-18 15:42:48 -08:00
Evan Tschannen 4eb11d74af
Merge pull request #1029 from bnamasivayam/reenable-check_desired_classes
Re-enable CheckDesiredClasses after making necessary changes for mult…
2019-01-11 17:15:05 -08:00
Balachandar Namasivayam a8e2e75cd5 Re-enable CheckDesiredClasses after making necessary changes for multi-region setup.
Fixed a couple of bugs
1) A rare race condition where a worker is being given roles even after it died.
2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.
2019-01-10 10:28:32 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
Vishesh Yadav 31c4ac07ac WIP: FailureMonitoring use endpointAddressList (create individual endpoints for each address) WIP: g_currentDeliveryPeerAddress WIP: FlowTransport endpoint map WIP: Add peerReference to addressToEndpointMap 2019-01-09 07:46:01 -08:00
Vishesh Yadav 51b89ae083 WIP 2019-01-09 07:41:02 -08:00
Alex Miller cebdb83def Revert "Merge pull request #977 from alexmiller-apple/abspath"
This reverts commit 9881b1d074, reversing
changes made to 6d278e466b.
2019-01-08 16:52:09 -08:00
Evan Tschannen 57293a2db0 byte sample recovery did not use limits for its range reads, leading to slow tasks 2019-01-04 10:32:31 -08:00
Andrew Noyes d5430d7bf8 Remove ignore "-Wreturn-local-addr" pragma
This seems to still build on gcc 8
2019-01-03 13:55:17 -08:00
Markus Pilman dbe9baff1f Several small compilation fixes for new versions of gcc
There are several missing includes for cmath in the code, I added those.

Next, Coro returns a reference to a stack variable and this causes a
warning. As this is probably ok for Coro, I disabled the warning in
that file for GCC. I want to have this warning in the build system as
it is generally a very useful warning to have.

Another change is that major and minor have been deprecated for a while now.
I replaced those with gnu_dev_major and gnu_dev_minor.

ErrorOr currently implements operators ==, !=, and <. These do not
compile because Error does not implement ==. This compiles on older
versions of gcc and clang because ErrorOr<T>::operator== is not used
anywhere. It is still wrong though and newer gcc versions complain.
I simply removed these methods.

The most interesting fix is that TraceEvent::~TraceEvent is currently
throwing exceptions. This is illegal behavior in C++11 and a bad idea in
older versions of C++. For now I simply removed the throw, but this
might need some more thought.
2019-01-03 12:44:19 -08:00
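Some background on the destructor point in the commit above: since C++11, destructors are implicitly `noexcept` unless declared otherwise, so an exception escaping one calls `std::terminate`. The sketch below checks this at compile time; `PlainEvent` and `ThrowingEvent` are hypothetical types, not FoundationDB classes.

```cpp
#include <type_traits>

// A destructor with no exception specification is implicitly noexcept
// in C++11 and later.
struct PlainEvent {
    ~PlainEvent() {}
};

// Opting out must be spelled explicitly with noexcept(false).
struct ThrowingEvent {
    ~ThrowingEvent() noexcept(false) {}
};

static_assert(std::is_nothrow_destructible<PlainEvent>::value,
              "destructors are noexcept by default since C++11");
static_assert(!std::is_nothrow_destructible<ThrowingEvent>::value,
              "noexcept(false) makes the destructor potentially throwing");
```

Even with `noexcept(false)`, throwing during stack unwinding still terminates the program, which is one reason removing the throw is the more conservative fix.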
Bhaskar Muppana aa2a76ef4c
Merge pull request #981 from alexmiller-apple/cmake
Add a CMake build system
2019-01-02 18:50:15 -08:00
A.J. Beamon d8f33a2419 Add parentheses to bitwise ops (turned up by clang after recent change) 2019-01-02 10:15:59 -08:00
anoyes 6a4d87802b Replace & operator with variadic function 2018-12-28 11:33:42 -08:00
Steve Atherton 9881b1d074
Merge pull request #977 from alexmiller-apple/abspath
Use abspath when dealing with the simulator file-cache
2018-12-20 14:56:38 -08:00
Vishesh Yadav 209ecd09ee Keep local addresses in a vector 2018-12-17 11:25:44 -08:00
Meng Xu 486a7b04fa TeamCollection: Fix build in osX
In osX, we cannot add an unsigned long to a string to append it to the string.
2018-12-14 13:44:11 -08:00
Markus Pilman 4ae701d8a9 minor bugfix to look up correct filename in cache
(manually cherry-picked from flat-buffers branch)
2018-12-13 22:21:25 -08:00
Markus Pilman 0207831fd6 Use abspath when dealing with the simulator file-cache
The simulator uses a hash table to cache all open files to make sure
that several simulated processes don't open the file more than once.
This currently doesn't work properly and deleted files are often kept
open forever. As a result, we often ran out of file descriptors.

The problem is luckily quite simple: files are often opened with an
absolute path but later a relative path is passed for deletion. This
is not working because the map that is used to store the file
descriptors is not aware of paths - so deleted files are often not
removed from this map. The fix that works for us is to just always
work with absolute paths when adding and removing files from this map.
2018-12-13 22:21:06 -08:00
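The fix described above amounts to normalizing every key before touching the map. A minimal sketch, assuming POSIX-style paths and a known working directory; `toAbsolute` and `FileCacheSketch` are hypothetical names, and the helper deliberately ignores `.`, `..`, and symlinks.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified absolute-path helper (an assumption, not the simulator's code).
std::string toAbsolute(const std::string& cwd, const std::string& path) {
    if (!path.empty() && path[0] == '/') {
        return path; // already absolute
    }
    return cwd + "/" + path;
}

// Cache of open files keyed by absolute path, so that opening with an
// absolute path and deleting with a relative one hit the same entry.
class FileCacheSketch {
public:
    explicit FileCacheSketch(std::string cwd) : cwd(std::move(cwd)) {}

    void open(const std::string& path, int handle) {
        openFiles[toAbsolute(cwd, path)] = handle;
    }

    // Returns true if an entry was removed.
    bool remove(const std::string& path) {
        return openFiles.erase(toAbsolute(cwd, path)) > 0;
    }

    std::size_t size() const { return openFiles.size(); }

private:
    std::string cwd;
    std::unordered_map<std::string, int> openFiles;
};
```

If the map were keyed by the raw string instead, `open("/data/a.log", ...)` followed by `remove("a.log")` would leave the entry behind, which matches the descriptor leak the commit describes.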
Alex Miller a982b9da72 Additional changes from a merge commit. 2018-12-13 17:13:41 -08:00
Alex Miller e70e59a895 Change some file locations. 2018-12-13 14:53:19 -08:00
Markus Pilman dce290909d fdbserver now compiling 2018-12-13 14:13:47 -08:00
mpilman 51beb8b48c fdbrpc compiling with cmake 2018-12-13 14:02:16 -08:00
Vishesh Yadav e04abf25f7 simulator: Support multiple listeners on single process
Sim2Listener can now take the network address to listen on. This is
used to listen to multiple ports in simulator and test the patch
which added multiple network addresses to single endpoint.
2018-12-13 13:36:52 -08:00
Vishesh Yadav 3eb9b23024 Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
- This patch will make FDB listen to multiple addresses given via
  command line. Although, we'll still use first address in most places,
  this patch starts using vector<NetworkAddress> in Endpoint at some basic
  places.
- When sending packets to an endpoint, pick a random network address in
  endpoints
- Renames Endpoint::address to Endpoint::addresses since it
  now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
Vishesh Yadav 43e5a46f9b Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
Extend `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll
have multiple IP:PORT pairs.

This patch simply adds the field and makes changes to compile the
codebase. The first element of the `address` field is used everywhere.
Hence the way we talk to an endpoint remains the same with this patch.

NOTE:

Directly accessing the first member of Endpoint::address is unsafe
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are anyway replacing all those
unsafe accesses with ones considering the whole vector, this patch
does not attempt to access them in a safe way.
2018-12-13 13:36:52 -08:00
Vishesh Yadav e8e01b2406 Remove unused localAddress parameter from newNet2 and Net2 classes 2018-12-13 13:36:52 -08:00
Evan Tschannen d9626895b1
Merge pull request #964 from xumengpanda/mengxu/teamcollection-release
TeamCollection: Use machine teams to create server teams to increase availability at scale when a machine has multiple servers
2018-12-13 13:18:54 -08:00
Meng Xu e069b5c31c TeamCollection: Use clang format
No functional change.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-06 11:39:35 -08:00
Evan Tschannen d2d68aa171 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	versions.target
2018-12-03 18:26:52 -08:00
Evan Tschannen 55a9c4a0f0
Merge pull request #955 from ajbeamon/fix-bad-error-creation-and-whitespace
throw platform_error; -> throw platform_error();. Convert some spaces to tabs.
2018-12-03 15:12:37 -08:00
A.J. Beamon 50c9dfdd01 Errors that occur in platform that are the result of IO issues are now raised as io_error rather than platform_error. 2018-11-30 10:55:19 -08:00
A.J. Beamon 97847f517b throw platform_error; -> throw platform_error();. Convert some spaces to tabs. 2018-11-28 12:56:57 -08:00
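Why the missing parentheses matter can be sketched as follows, assuming (as the `()` in the fix suggests) that error names are functions returning an error object. `SketchError` and `platform_error_sketch` are hypothetical names and the code value is arbitrary. Without the call, `throw` raises a function pointer, which a handler for the error type will not catch.

```cpp
// Hypothetical error type and factory function.
struct SketchError {
    int code;
};

inline SketchError platform_error_sketch() {
    return SketchError{1500}; // arbitrary code for illustration
}

// Returns 1 if caught as SketchError, 2 if caught as a function pointer.
int throwWithoutCall() {
    try {
        throw platform_error_sketch; // decays to SketchError (*)()
    } catch (const SketchError&) {
        return 1;
    } catch (SketchError (*)()) {
        return 2;
    }
    return 0;
}

// Returns the error code when the error object itself is thrown.
int throwWithCall() {
    try {
        throw platform_error_sketch(); // throws a SketchError value
    } catch (const SketchError& e) {
        return e.code;
    } catch (...) {
        return -1;
    }
}
```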
Meng Xu 8de031f9a6 TeamCollection: clang-format
Format the changes with git clang-format.
No functional changes.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-21 11:18:26 -08:00
Meng Xu f7a7e069f0 TeamCollection: Remove unnecessary comments
Pass 41806 tests with no failure

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:56:35 -08:00
Meng Xu 73c58852f0 TeamCollection: Resolve code review comments
Resolve code review comments:
1) Improve the code efficiency by avoiding unnecessary map search
   and avoiding unnecessary checking
2) Remove or comment out trace events when they can be spammy
3) Improve coding style

Tested for 1 hour and no error was found.
KillRegionCycle.txt test was excluded from the test because
existing code cannot pass that test either

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:55:33 -08:00
Meng Xu 5051b35c61 TeamCollection: Use machine team to create server team
Current server team collection logic does not consider
the fact that multiple storage servers can run on the same machine.
When multiple machines fail, all servers on the machines will fail, and
the possibility of having one process team fail and lose data is very high.

To reduce the possibility of losing data when multiple machines fail,
we first create machine teams which span across different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:53:22 -08:00
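The second step described above, picking one server from each machine of a chosen machine team, can be sketched as follows. This is an illustration under assumptions: `Machine`, `MachineTeam`, and `buildServerTeam` are hypothetical names, and the selection is driven by an explicit `picks` vector so the result is deterministic, whereas real code would presumably choose randomly or by load.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a machine hosts one or more storage servers.
struct Machine {
    std::string id;
    std::vector<std::string> servers;
};

// A machine team is assumed to already span different fault zones.
using MachineTeam = std::vector<Machine>;

// Builds a server team with exactly one server per machine in the team.
// picks[i] selects the server on machine i (wrapped to the valid range).
// Returns an empty team if any machine has no servers.
std::vector<std::string> buildServerTeam(const MachineTeam& team,
                                         const std::vector<std::size_t>& picks) {
    std::vector<std::string> serverTeam;
    for (std::size_t i = 0; i < team.size(); ++i) {
        const std::vector<std::string>& servers = team[i].servers;
        if (servers.empty()) {
            return {};
        }
        const std::size_t pick = i < picks.size() ? picks[i] : 0;
        serverTeam.push_back(servers[pick % servers.size()]);
    }
    return serverTeam;
}
```

Because every server in the resulting team lives on a different machine, a single machine failure can take out at most one member of that server team.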
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen 6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
Evan Tschannen b8381b3cea Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0 2018-11-10 09:51:49 -08:00
A.J. Beamon 67a152ae9f Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes. 2018-11-09 14:19:18 -08:00
Evan Tschannen 56c51c1bb3 fix: usableRegions was uninitialized 2018-11-09 10:17:35 -08:00
Stephen Atherton 9d73166b3b Many bug fixes related to concurrent page operations and pager shutdown. 2018-11-06 19:31:16 -08:00
Evan Tschannen 87295cc263 suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created 2018-11-04 23:07:56 -08:00
Evan Tschannen bf6545a9cf clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Stephen Atherton df3bdde50b Many bug fixes. AsyncFileCached write() on a page with a zero-copy read in progress would orphan the old page before the read was finished. Pager file operations were not converting page id to int64 for byte offset calculation. Pager was not calling releaseZeroCopy() after readZeroCopy() if there was an error or cancellation. Pager reads were using some variables that could go out of scope. BusyPage's mechanism for notifying when a physical page is no longer in use is itself no longer in use and therefore removed. Pager shutdown now cancels all outstanding reads. Improved some debug output. 2018-10-31 02:14:55 -07:00
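The page-id conversion mentioned in the commit above guards against overflow in the offset arithmetic. A minimal sketch, assuming 32-bit page ids and page sizes (the actual types in the pager are not stated here); `pageOffset32` and `pageOffset64` are hypothetical names.

```cpp
#include <cstdint>

// Multiplying in 32 bits wraps around once the offset reaches 4 GiB.
uint32_t pageOffset32(uint32_t pageId, uint32_t pageSize) {
    return pageId * pageSize; // unsigned wraparound, well-defined but wrong
}

// Widening one operand first makes the whole multiplication 64-bit.
int64_t pageOffset64(uint32_t pageId, uint32_t pageSize) {
    return static_cast<int64_t>(pageId) * pageSize;
}
```

For example, page 1048576 with a 4096-byte page size lies at byte offset 4294967296, which the 32-bit version silently wraps to 0.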
A.J. Beamon 776b289bfe Move AsyncFileBlobStore and related files to fdbclient. 2018-10-26 13:49:42 -07:00
A.J. Beamon 58a0e22d3c Remove sim2 dependency on fdbclient:
* Remove unused 'exclusionSet' that used a type from fdbclient.
* Replace usages of describe(x) with x.toString().

Also removed some using statements.
2018-10-26 09:23:12 -07:00
Alex Miller 6bb1f4093d
Merge pull request #856 from dropbox/pr/include-fix
Adjust all includes to be relative to the root.
2018-10-22 09:51:55 -07:00
Alex Miller e2fc1c9b95 Remove specifying non-root directory as a path to search for includes. 2018-10-19 18:56:45 -07:00
Evan Tschannen 1ef29cbf0d more windows build fixes 2018-10-19 17:00:24 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen db71b60d72
Merge pull request #819 from satherton/feature-redwood
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen 0217aed74c Merge branch 'release-6.0'
# Conflicts:
#	bindings/go/README.md
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2018-10-15 18:38:51 -07:00
A.J. Beamon a963ff7a64 Fix line endings 2018-10-08 09:30:09 -07:00
Stephen Atherton 22f8a4efa9 Normalized all unit test names to begin with "/" if they should be included in random unit testing. 2018-10-05 22:09:58 -07:00
A.J. Beamon 664f64881c Port truncate optimization from Snowflake PR in order to make quick changes for a patch release. 2018-10-05 15:05:26 -07:00
Stephen Atherton 7c1dc305cb Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood 2018-10-05 10:15:10 -07:00
Evan Tschannen 3922e477a5 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/LogSystemDiskQueueAdapter.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
A.J. Beamon 92990d6aef Merge release-6.0 into master 2018-09-21 16:14:39 -07:00
Evan Tschannen 77e2fb787e Merge branch 'release-6.0' into feature-fix-forced-recovery 2018-09-21 14:55:37 -07:00
Stephen Atherton 2fc86c5ff3 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbrpc/AsyncFileCached.actor.h
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen 42a67efb0c the cluster controller should prefer to be located on a transaction class machine over a storage server class machine 2018-09-19 18:04:59 -07:00
Evan Tschannen 200e65fe61 added a workload which tests killing an entire region, and recovering from the failure with data loss.
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen 4dd2dda0a3 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen 90301f497f Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/TLSConnection.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon 2de0b5d6d7 Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations. 2018-09-05 15:06:14 -07:00
Evan Tschannen dcdbb3ec4d Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-movekey-fixes 2018-09-05 10:29:13 -07:00
Evan Tschannen 21f5cf9ce9 suppress spammy trace events 2018-09-04 17:12:26 -07:00
Steve Atherton 89dd9cc4a3 Cherry-pick pull request #717 to release-6.0
Which contains:
* Improve TLS cert refresh logging.
* Loading a mismatching cert shouldn't prevent TLS connections.
* Initialize the cached copy of ca/cert/key data.
* Open certificates as uncached, which means they can be write-protected.
2018-08-23 16:53:40 -07:00
Steve Atherton 365fe992b4
Merge pull request #717 from alexmiller-apple/tls-refresh-fixes
Fix certificate reloading issues
2018-08-22 15:09:12 -07:00
Evan Tschannen 717c43a69f merge 6.0 into master 2018-08-22 00:28:04 -07:00
Alex Miller d2da969412 Improve TLS cert refresh logging.
Explicitly call out failure/success, and surface repeated cert
mismatches.
2018-08-21 15:05:41 -07:00
Alex Miller 4113b36df7 Loading a mismatching cert shouldn't prevent TLS connections.
set_{cert,key,ca}_data return pass/fail and do not throw.  The existing
code wrongly assumed that they threw.
2018-08-21 15:02:54 -07:00
Evan Tschannen 26ec6ebac8 fixed line endings 2018-08-21 14:58:26 -07:00
Evan Tschannen 712aa00261 a better fix to the windows build issue 2018-08-21 14:54:38 -07:00
Alex Miller 4caacaaf4e I would like to atone for my sins. But later.
This fixes the windows build.  For some reason, MSVC believes that the
actor-compiled version of networkSender actually exists, but the
non-actor-compiled version doesn't exist.

This is a hackish workaround, as the largest reason to not include a
.g.h file is because it defines a POST_ACTOR_COMPILER define that messes
with actorcompiler.h's #defines.  We can just undefine that after
including the file.   ...but carefully.
2018-08-20 20:33:38 -07:00
Alex Miller 3ece3cf301 Initialize the cached copy of ca/cert/key data.
This was just purely an accidental oversight from before.  The variables
were there and handled like they were actually initialized with the
contents of the various certificate files at start-up, but never
actually were.

And add a few trace events to make it easy to see when the system
noticed and tried to reload certificate data.
2018-08-20 19:09:34 -07:00
Alex Miller fd866a3b47 Open certificates as uncached, which means they can be write-protected.
OPEN_READONLY still opens the file as read-write.  To actually be
read-only, one needs to open the file as READONLY and UNCACHED.
2018-08-20 19:07:58 -07:00
Alex Miller 63b1e85338 Ban `Void _ = wait(...)` constructions, and require just `wait(...)`.
There's never any reason to save the value of a Void return, and it's
the easiest source of redefined variable bugs that will creep back in
over time.  So just `wait(...)`, it's cleaner that way.
2018-08-14 15:50:26 -07:00
Alex Miller 86dbe1f0e9 Fix more instances of actorcompiler.h being in the wrong place. 2018-08-14 15:50:26 -07:00
Alex Miller 7feb5d8209 Remove including flow.h in actorcompiler.h, and fix resulting breakage.
For files that required flow.h, and only got it through actorcompiler.h,
their version of flow.h would have the actorcompiler #defines defined.
Then, if it included a STL/boost file, the same breakage would result.

This needs to not happen, so the include of flow.h in actorcompiler.h
was removed.
2018-08-14 15:50:26 -07:00
Alex Miller bca324eaa6 More actorcompiler.h fixes and additions. 2018-08-14 15:50:26 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 07e5281142 Restrict actor keyword #defines to actor files.
This introduces a new rule in our codebase, that any file that #includes
actorcompiler.h needs to do it as the last #include, and it needs to
then #include unactorcompiler.h at the end of the file.

The point of this is that it prevents our actorcompiler.h #defines from
leaking into boost or the c++ standard library.  Both of these start
throwing errors if you s/state// their code, which `#define state `
effectively does.
2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen cdcf056aef Merge branch 'release-6.0' 2018-08-14 09:43:51 -07:00
A.J. Beamon 168dce94cb Remove some trace event suppressions that were happening off the network thread. Downgrade some trace events related to trace logging problems from SevError to SevWarnAlways. 2018-08-14 09:00:43 -07:00
Evan Tschannen 3186fac397 Make sure we still accept some connections even if we are CPU bound by high priority work 2018-08-10 17:47:21 -07:00
A.J. Beamon 574c5576a2 Merge branch 'release-6.0' of github.com:apple/foundationdb
# Conflicts:
#	fdbrpc/TLSConnection.actor.cpp
#	versions.target
2018-08-10 14:31:58 -07:00
A.J. Beamon 3535ddad80
Merge pull request #674 from alexmiller-apple/glibcxx-debug-fixes
Fix bugs uncovered by -D_GLIBCXX_DEBUG
2018-08-09 08:18:51 -07:00
A.J. Beamon 24dec1529b
Merge pull request #673 from etschannen/release-6.0
A variety of bug fixes and performance improvements
2018-08-07 10:55:46 -07:00
Alex Miller ff0e14d5a7 Fix a compilation error on windows. 2018-08-06 18:36:01 -07:00
Evan Tschannen b5a133865d Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
# Conflicts:
#	fdbrpc/TLSConnection.actor.cpp
2018-08-06 18:26:54 -07:00
Evan Tschannen 22f2a1fedd Merge pull request #676 from etschannen/master
fix: we should not free statdata ourselves, it will be deleted by libeio itself
2018-08-06 18:08:45 -07:00
Steve Atherton fb46385a39 Merge pull request #628 from alexmiller-apple/reloadcertificates
Reload certificates if changed.

This is a cherry-pick of #628 back to release-6.0
2018-08-06 18:04:04 -07:00
Evan Tschannen 56e0b729c8 fix: we should not free statdata ourselves, it will be deleted by libeio itself 2018-08-06 17:46:53 -07:00
Alex Miller d99592f8bd Fix an out-of-bounds vector access. 2018-08-06 12:50:34 -07:00
Evan Tschannen 6f328d41ac suppressed spammy trace events 2018-08-06 12:12:55 -07:00
Evan Tschannen 538e684f1c Merge branch 'release-6.0'
# Conflicts:
#	versions.target
2018-08-03 11:41:46 -07:00
Evan Tschannen 2619234477 Merge branch 'release-5.2' into release-6.0
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen 21fe6adac4 fix: give time to do other work between accepting connections. It is expensive to accept TLS connections, so we have a slow task (which can kill other connections) if we accept too many connections in a row 2018-08-03 11:37:10 -07:00
Alex Miller 1a7cda4149 Stop performing self-moves. (e.g. a = std::move(a))
self-moves are frowned upon in C++, and in our code this generally happens from
calls to swap as part of trying to implement a "unordered erase" function via
swap-to-the-end-and-pop_back.  For convenience, a swapAndPop() function is now
offered that performs this, while disallowing self-moves.
2018-08-01 18:09:54 -07:00
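The unordered-erase idiom described in the commit above can be sketched as follows. This is a minimal illustration only: the signature shown here (a `std::vector` reference plus an index) is an assumption, and the actual swapAndPop() in the codebase may differ. The sketch moves the last element into the erased slot and skips the move when the erased element is already last, which is the case that would otherwise produce a self-move.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of an unordered erase; not the codebase's exact signature.
// Removes v[i] in O(1) by overwriting it with the last element, then popping.
template <class T>
void swapAndPop(std::vector<T>& v, std::size_t i) {
    assert(i < v.size());
    if (i != v.size() - 1) {
        // Source and destination are distinct here, so this is never a self-move.
        v[i] = std::move(v.back());
    }
    v.pop_back();
}
```

Element order is not preserved, which is acceptable only for containers treated as unordered sets.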
Evan Tschannen 1c29275672 call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details. 2018-08-01 14:30:57 -07:00
Alex Miller f70f204d55 Fix a compilation error on windows. 2018-07-30 17:13:37 -07:00
Evan Tschannen 28a26d54f2 Merge commit 'ccf4384c79d026edbf76152e95e7410ebe621c1f' into release-6.0
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbrpc/FlowTransport.actor.cpp
2018-07-28 09:11:31 -07:00
Evan Tschannen fa3b61508c fix: do not increase numIncompatibleConnections if the connect was already incompatible 2018-07-28 08:50:54 -07:00
Stephen Atherton 4379a58bbe Suppress potentially spammy event and don't log cancellation errors. 2018-07-27 21:03:10 -07:00
Stephen Atherton 59e005485d Fixed bug where incompatible connection count was sometimes decremented twice for the same peer. 2018-07-27 20:48:14 -07:00
Stephen Atherton 6a3834c3f8 Fixed memory leak when destroying a FlowTransport. 2018-07-27 20:46:54 -07:00
Stephen Atherton c593d1c6a2 Bug fix causing clients to sometimes (rarely) not reconnect to upgraded clusters. Reliable packets were being dropped to incompatible peers intentionally, but now this is only done if the peer is newer since successful communication with a newer peer will never be possible. 2018-07-27 20:42:06 -07:00
Steve Atherton d1a877039d
Merge pull request #628 from alexmiller-apple/reloadcertificates
Reload certificates if changed.
2018-07-26 17:21:23 -07:00
Stephen Atherton 40762d9f9b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-25 17:58:52 -07:00
Evan Tschannen 95bc695f0e Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0 2018-07-25 13:06:54 -07:00
Evan Tschannen 89a3e2e1b4 Backed out connection closing changes because of upgrade problems 2018-07-25 13:06:13 -07:00
Alex Miller 262af775eb Implement overly simple file write timestamps for simulation, and clean up code. 2018-07-24 17:20:31 -07:00
Alex Miller 168496f819 Poll the certificate files if TLS is enabled and reload them if changed.
This allows certificates to be changed/updated without having to restart fdbserver.
2018-07-20 19:00:32 -07:00
Alex Miller 2d26e98d07 Add a cross-platform getLastWrite() to get a file's mtime. 2018-07-20 19:00:32 -07:00
A.J. Beamon a7a1124c11 Fix incompatible connection accounting that was incorrectly decrementing the incompatible count in some cases. 2018-07-17 11:36:05 -07:00
A.J. Beamon 8879954254
Merge pull request #609 from etschannen/release-6.0
Improved simulation strength by only removing datacenters that have been killed
2018-07-16 15:59:28 -07:00
Evan Tschannen e0caa28758 code cleanup 2018-07-16 15:56:43 -07:00
AlvinMooreSr aafb3c5c00
Merge pull request #593 from AlvinMooreSr/release-6.0-tls-funct
Replaced separate TLS Log function with FDB TraceEvent logger
2018-07-16 12:01:02 -07:00
Evan Tschannen f72a9f60c0 only disable fearless if a datacenter has actually been killed
fix: we must prevent recovery into the dead datacenter while reducing usable_regions
2018-07-16 10:06:57 -07:00
Alvin Moore a034acf3bd Replaced separate TLS Log function with FDB TraceEvent logger 2018-07-11 18:41:46 -07:00
Stephen Atherton 96389c74cd Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Alec Grieser d5a23642a1
Merge pull request #587 from etschannen/feature-remote-logs
close unneeded connections
2018-07-10 13:27:15 -07:00
Evan Tschannen a35d5e30d9 Added a SevError trace event in case peer references becomes negative 2018-07-10 13:26:28 -07:00
Evan Tschannen c25be5699a close unneeded connections 2018-07-10 13:10:29 -07:00
Alec Grieser be9c34c6f8
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2 2018-07-10 10:04:48 -07:00
Alec Grieser ad37b1693d
Merge pull request #585 from etschannen/feature-remote-logs
A variety of cleanup and test strengthening commits
2018-07-10 09:58:44 -07:00
AlvinMooreSr b3916a9b71
Merge pull request #409 from joelarmstrong/tlsconnection-clang-ub-warning
Fix compilation with clang from Apple LLVM 9.1.0
2018-07-10 09:32:24 -07:00
Stephen Atherton 1bc95862b7 Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen 82cc30be62 added testing for two_satellite_fast and two_satellite_safe 2018-07-09 22:01:46 -07:00
Stephen Atherton fddb3e87e2 Differentiate between a timeout in attempting to connect vs a timeout on an active connection by converting timeouts during connection attempts to connection_failed errors. 2018-07-09 19:40:01 -07:00
Stephen Atherton 3ce7c78d36 If an HTTP request fails due to a connection failure or a timeout, do not convert the error to the more generic http_request_failed. 2018-07-09 18:58:33 -07:00
Evan Tschannen e503dc975c fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-09 13:37:06 -07:00
Evan Tschannen 5a2cb3037b merge 5.2 into 6.0 2018-07-08 20:14:06 -07:00
Evan Tschannen 0e97ce79b4 fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-08 10:26:41 -07:00
Stephen Atherton a2f16e217e Memory waste fix, when a Peer disconnects an extra packet buffer block is allocated to copy unsent reliable bytes to even if there aren't any. 2018-07-06 19:44:30 -07:00
Evan Tschannen 6d7172ef7e fix: canKillProcesses did not take into account the remoteTLogPolicy when checking notEnoughLeft 2018-07-05 21:36:09 -07:00
Evan Tschannen 6f4ca2eba2 fix: get all processes did not include rebooting processes 2018-07-05 21:13:56 -07:00
Evan Tschannen cd4fb9285a waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region 2018-07-05 14:04:42 -07:00
Stephen Atherton 9d85a05372 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-05 12:52:06 -07:00
Stephen Atherton 2cb0362102 AsyncFileCached now allows writing and truncation of whole pages previously read using readZeroCopy and not yet released without prior readers seeing the effects of the write. 2018-07-05 02:59:13 -07:00
Evan Tschannen 7315e5da55 fix: isExcluded and isCleared were exactly wrong
fix: isCleared should mean the process is dead
2018-07-05 02:22:22 -07:00
Evan Tschannen e17dfea3b6 fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1.
canKillProcess logic was wrong.
We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.
2018-07-04 16:22:32 -04:00
Stephen Atherton 2925b9b984 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-03 23:03:56 -07:00
Alvin Moore c3f88dbfe1 Merge branch 'master' of github.com:apple/foundationdb into tls-static 2018-07-01 23:13:57 -07:00
Alvin Moore 132e2d9267 Defined TLS build flags for projects
Updated TLS documentation
2018-07-01 22:49:39 -07:00
Stephen Atherton b95a2bd6c1 Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
# Conflicts:
#	flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen 899f880ce0 fix: log router class did not have the proper fitness for becoming the cluster controller 2018-06-28 23:20:01 -07:00
Alvin Moore 45849d1f95 Added support for no-op legacy TLS options 2018-06-27 09:25:05 -07:00
Alvin Moore 65d8b38ae9 Changed generic plugin code to work as expected plugin code except for TLS use case
Defined TLS plugin name constant
Changed TLS plugin name to get_tls_plugin
Fixed link script
Removed compilation flags from info make target
2018-06-26 16:01:25 -07:00
Alvin Moore ef8de426d3 Changed the TLS_DISABLED macro
Disable TLS within Windows until working
2018-06-26 12:08:32 -07:00
Evan Tschannen 0123627d67 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen 5fc8199abc Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic
wait_for_good_recruitment now requires that you have the desired count of each role
remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered
2018-06-22 10:15:24 -07:00
Evan Tschannen 1dce97f28c Merge branch 'release-5.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/SimulatedCluster.actor.cpp
#	packaging/msi/FDBInstaller.wxs
#	versions.target
2018-06-21 17:05:11 -07:00
Balachandar Namasivayam d7dba11366 Throw tls_error instead of internal_error when not able to create a TLS connection. 2018-06-21 15:33:00 -07:00
Stephen Atherton e9e1e194f0 Added operation-specific rate controls to blob store interface. 2018-06-20 20:34:34 -07:00
Richard Low fff6a47c43 Validate certificates by default 2018-06-20 14:04:03 -07:00
Alvin Moore f8ce1de601 Added support for compiling TLS into binaries 2018-06-20 09:21:23 -07:00
Stephen Atherton e5c48d453a Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-18 22:45:27 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Stephen Atherton 1eae9d621b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-13 15:58:21 -07:00
Stephen Atherton 2878f30f29 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller 6c2cb25c53 Rename BestOtherFit -> OkayFit.
The previous order of fitness was

  BestFit > GoodFit > BestOtherFit > ...

which is baffling.  It's now:

  BestFit > GoodFit > OkayFit > ...

which won't break anyone's expectations.
2018-06-12 16:50:25 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 48fbc407fd fix: we cannot kill all of the remote tlogs, because we still need their data to copy to the next generation in the same data center 2018-06-08 15:28:44 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Balachandar Namasivayam 529d0497f1 Proxy going OOM when applying high volumes of writes to a proxy, particularly in a sudden fashion before ratekeeper can control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
A.J. Beamon d9c702a9e3 Merge release-5.1 into release-5.2 2018-05-30 09:09:55 -07:00
Joel Armstrong 7c35ea6ba1 Fix use of bool in va_start causing undefined behavior
The version of clang included in Apple LLVM 9.1.0 complains about
passing the bool parameter `is_error` to va_start, which causes make
to fail:

fdbrpc/TLSConnection.actor.cpp:370:16: error: passing an object that undergoes
      default argument promotion to 'va_start' has undefined behavior
      [-Werror,-Wvarargs]
        va_start( ap, is_error );
                      ^
This just switches is_error back to the type it gets promoted to (int).
2018-05-24 16:37:11 -07:00
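The va_start issue in the commit above can be illustrated with a small variadic helper. This is a sketch under assumptions: `formatLog` and its parameter layout are hypothetical and are not the function in fdbrpc/TLSConnection.actor.cpp. The point it shows is that the last named parameter before `...` is declared as `int`, the type a `bool` is promoted to, so passing it to `va_start` is well defined.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only.
// is_error is int rather than bool: a bool as the last named parameter
// undergoes default argument promotion, which makes va_start undefined behavior.
std::string formatLog(const char* fmt, int is_error, ...) {
    assert(fmt != nullptr);
    char buf[256];
    va_list ap;
    va_start(ap, is_error);  // well defined: int is not subject to promotion
    std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    return std::string(is_error ? "ERROR: " : "INFO: ") + buf;
}
```

Callers can still pass `true` or `false` for `is_error`; the conversion to `int` happens at the call site.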
A.J. Beamon 026458baf3 Merge release-5.2 into master 2018-05-23 15:32:56 -07:00
Richard Low 84ed35b01f Only log TLS verify failures if all verification fails; log failures at SevInfo 2018-05-21 10:58:59 -07:00
Richard Low 086700aeb1 Plumb through TLS key password to CLI and from environment 2018-05-21 10:56:10 -07:00
Evan Tschannen 520aaf731d merge release 5.2 into master 2018-05-10 14:33:08 -07:00
Evan Tschannen b5b8c5d587 fix: white space issue in getKnobDescription 2018-05-10 14:27:10 -07:00
Balachandar Namasivayam b2c32ea4f2 Add secure_connection param to BlobStore to configure security.
Default is https. Setting secure_connection=0 makes it http.
2018-05-10 13:53:46 -07:00
Evan Tschannen 7bca7b80e6 fixed merge conflicts 2018-05-10 09:13:41 -07:00
Evan Tschannen 8f984cb2c9 Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.h
2018-05-10 09:13:22 -07:00
Evan Tschannen d3450ce5b0
Merge pull request #343 from bnamasivayam/tls-plugin
Tls plugin
2018-05-09 16:35:53 -07:00
Balachandar Namasivayam 479dbf4c04 Addressed review comments.
Remove redundant FDBLibTLS/ITLSPlugin.h.
2018-05-09 16:16:09 -07:00
Balachandar Namasivayam 0c2960a221 Use smart pointer instead of naked ones in set_peer_verify() method. 2018-05-09 14:53:01 -07:00
Balachandar Namasivayam 7591931a09 Revert "Make tls_verify_peers as a comma separated string of constraints."
This reverts commit 2033847e4b.
2018-05-09 14:40:36 -07:00
Balachandar Namasivayam 2033847e4b Make tls_verify_peers as a comma separated string of constraints. 2018-05-09 14:37:39 -07:00
Balachandar Namasivayam e8b7f4b190 Add password support for tls. 2018-05-08 20:46:31 -07:00
Balachandar Namasivayam 49af5d685b Restore previous behavior of not specifying peer_verify option means disable checking. 2018-05-08 18:54:44 -07:00
Balachandar Namasivayam d3b5cfb93c Support latest TLS plugin.
Add support for https in backup.
2018-05-08 16:28:13 -07:00
Evan Tschannen 7acdc314e4 Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.actor.cpp
2018-05-08 13:22:53 -07:00
Evan Tschannen 1f6c6a886b Merge branch 'release-5.1' into release-5.2 2018-05-08 13:08:11 -07:00
Alvin Moore 9aa94e87a3 Renamed the default TLS search plugin 2018-05-07 17:01:14 -07:00
Alex Miller bc8e6acbe8 Fix the other half of simulation requiring a TLS Plugin.
This commit:
1. Restores --tls_plugin as a way to provide the path to the TLS plugin when running in simulation.
2. Removes the TLS Plugin as being required for 5% of tests.
3. Standardizes on 'sslEnabled' as a variable name.

And is a fix/improvement upon commit f7733d1b.

(1) previously didn't work, because we would create multiple new TLSOptions
instances and run init_plugin multiple times.  Only the first call would use
the argument specified on the command line.  To fix this, the TLSOptions
derived from the command line is threaded through all the simulation code that
needs it.

(2) was an oversight in f7733d1b, which didn't actually make "should we be TLS"
dependent on whether the TLS plugin was available or not.

(3) is just nice for trying to grep around in the codebase.
2018-04-30 18:26:29 -07:00
Stephen Atherton af61d3596d Merge branch 'public-master' into feature-redwood
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Alex Miller f7733d1bd0 Do not require the TLS Plugin for simulation.
It appears that explicit calls to TLS-related things had snuck in over time,
which meant that simulation runs that weren't even configured to use SSL still
wanted and required the TLS plugin.

This commit instead threads through the understanding of if any TLS-related
options were provided, and if not, then don't call anything TLS-related so that
we don't require the TLS plugin.

Hopefully this makes life easier for the opensource folk. :)
2018-04-24 16:53:30 -07:00
Dennis Schafroth 290122637b Using ASSERT_ABORT in destructors 2018-04-23 14:05:10 +02:00
Evan Tschannen c1ccc8522c Merge branch 'release-5.2' 2018-04-17 18:38:12 -07:00