710 Commits

Author SHA1 Message Date
A.J. Beamon 9272a41e5f
Merge pull request #1146 from atn34/fix-actor-warning
Fix actor warning for cmake build
2019-02-13 11:01:37 -08:00
Andrew Noyes 3a38bff8ee Use DISABLE_ACTOR_WITHOUT_WAIT_WARNING consistently 2019-02-13 10:30:35 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Andrew Noyes 874a58cb4f Suppress actor without wait for tests in cmake 2019-02-12 11:01:17 -08:00
mpilman 8a94d80deb fdbservice and fdbrpc now compiling 2019-02-07 15:37:04 -08:00
Evan Tschannen 486e0e13c3
Merge pull request #1116 from alexmiller-apple/tstlog
Random cleanups that prepare for Spill-By-Reference TLog
2019-02-05 18:09:06 -08:00
A.J. Beamon 882f8d70b7
Merge pull request #1066 from etschannen/master
fix: coordinators auto could put two coordinators in the same zone
2019-02-05 11:52:04 -08:00
Alex Miller 6668b7c544 Make simulation enforce what KAIO requires. 2019-02-04 18:04:22 -08:00
Evan Tschannen e9ddd94e27 The failure monitor is given a list of all IP addresses associated with a process
The connect packet includes the correct remote address
Did a lot of code cleanup
Simulation test mixed TLS and non-TLS listeners on the same process
2019-01-31 18:20:14 -08:00
Meng Xu 550f2e2682 Merge with master to use the latest backup container
In fdb 6.0.15, the backup container changed how it organizes the backup data.
A backup made by fdb > 6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that fast restore uses fdb > 6.0.15.
2019-01-30 12:05:15 -08:00
Balachandar Namasivayam 9cf2b4e1e7 Improve TLS logging on error scenarios. 2019-01-29 17:04:09 -08:00
Meng Xu 76e1ba2934 add blob_credential_file option 2019-01-29 16:00:52 -08:00
A.J. Beamon 05b38167d0
Update fdbrpc/sim2.actor.cpp
Co-Authored-By: etschannen <36455792+etschannen@users.noreply.github.com>
2019-01-29 11:35:02 -08:00
Trevor Clinkenbeard 2e0b3a7f1d Added ProcessClass::CoordinatorClass, which can be used by coordinators, so that coordinators do not have to take on other roles if desired 2019-01-25 11:03:13 -08:00
Evan Tschannen 1d7fec3074 Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
# Conflicts:
#	.gitignore
2019-01-24 17:43:06 -08:00
Evan Tschannen 9cf77d70bc fix: getFirstLocalAddress has to be the same as primary address, because it is what we put in the connect packet, and we always connect from the primary address 2019-01-24 17:28:26 -08:00
Evan Tschannen 699f8dd617 fix: coordinators auto could put two coordinators in the same zone
simulation now tests two machines in the same zone
2019-01-18 15:42:48 -08:00
Evan Tschannen 4eb11d74af
Merge pull request #1029 from bnamasivayam/reenable-check_desired_classes
Re-enable CheckDesiredClasses after making necessary changes for mult…
2019-01-11 17:15:05 -08:00
Balachandar Namasivayam a8e2e75cd5 Re-enable CheckDesiredClasses after making necessary changes for multi-region setup.
Fixed a couple of bugs
1) A rare race condition where a worker is still being assigned roles even after it died.
2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.
2019-01-10 10:28:32 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
Vishesh Yadav 31c4ac07ac WIP: FailureMonitoring use endpointAddressList (create individual endpoints for each address) WIP: g_currentDeliveryPeerAddress WIP: FlowTransport endpoint map WIP: Add peerReference to addressToEndpointMap 2019-01-09 07:46:01 -08:00
Vishesh Yadav 51b89ae083 WIP 2019-01-09 07:41:02 -08:00
Alex Miller cebdb83def Revert "Merge pull request #977 from alexmiller-apple/abspath"
This reverts commit 9881b1d074, reversing
changes made to 6d278e466b.
2019-01-08 16:52:09 -08:00
Evan Tschannen 57293a2db0 byte sample recovery did not use limits for its range reads, leading to slow tasks 2019-01-04 10:32:31 -08:00
Andrew Noyes d5430d7bf8 Remove ignore "-Wreturn-local-addr" pragma
This seems to still build on gcc 8
2019-01-03 13:55:17 -08:00
Markus Pilman dbe9baff1f Several small compilation fixes for new versions of gcc
There are several missing includes for cmath in the code, I added those.

Next, Coro returns a reference to a stack variable and this causes a
warning. As this is probably ok for Coro, I disabled the warning in
that file for GCC. I want to have this warning in the build system as
it is generally a very useful warning to have.

Another change is that major and minor have been deprecated for a while now.
I replaced those with gnu_dev_major and gnu_dev_minor.

ErrorOr currently implements operators ==, !=, and <. These do not
compile because Error does not implement ==. This compiles on older
versions of gcc and clang because ErrorOr<T>::operator== is not used
anywhere. It is still wrong though and newer gcc versions complain.
I simply removed these methods.

The most interesting fix is that TraceEvent::~TraceEvent is currently
throwing exceptions. This is illegal behavior in C++11 and a bad idea in
older versions of C++. For now I simply removed the throw, but this
might need some more thought.
2019-01-03 12:44:19 -08:00
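The point about throwing destructors can be illustrated with a small sketch. Since C++11 a destructor is implicitly `noexcept` unless it is declared otherwise, so an exception that escapes it ends in `std::terminate`. The two struct names below are hypothetical and are not FoundationDB types.

```cpp
#include <type_traits>

// Hypothetical type: its destructor is implicitly noexcept since C++11,
// so an exception escaping it would call std::terminate.
struct QuietEvent {
    ~QuietEvent() {}
};

// Hypothetical type: opting out has to be spelled explicitly.
struct ThrowingEvent {
    ~ThrowingEvent() noexcept(false) {}
};

static_assert(std::is_nothrow_destructible<QuietEvent>::value,
              "destructors are noexcept by default");
static_assert(!std::is_nothrow_destructible<ThrowingEvent>::value,
              "noexcept(false) opts the destructor out");
```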
Bhaskar Muppana aa2a76ef4c
Merge pull request #981 from alexmiller-apple/cmake
Add a CMake build system
2019-01-02 18:50:15 -08:00
A.J. Beamon d8f33a2419 Add parentheses to bitwise ops (turned up by clang after recent change) 2019-01-02 10:15:59 -08:00
anoyes 6a4d87802b Replace & operator with variadic function 2018-12-28 11:33:42 -08:00
Steve Atherton 9881b1d074
Merge pull request #977 from alexmiller-apple/abspath
Use abspath when dealing with the simulator file-cache
2018-12-20 14:56:38 -08:00
Vishesh Yadav 209ecd09ee Keep local addresses in a vector 2018-12-17 11:25:44 -08:00
Meng Xu 486a7b04fa TeamCollection: Fix build in osX
In osX, we cannot add an unsigned long to a string to append it to the string.
2018-12-14 13:44:11 -08:00
Markus Pilman 4ae701d8a9 minor bugfix to look up correct filename in cache
(manually cherry-picked from flat-buffers branch)
2018-12-13 22:21:25 -08:00
Markus Pilman 0207831fd6 Use abspath when dealing with the simulator file-cache
The simulator uses a hash table to cache all open files to make sure
that several simulated processes don't open the file more than once.
This currently doesn't work properly and deleted files are often kept
open forever. As a result, we often ran out of file descriptors.

The problem is luckily quite simple: files are often opened with an
absolute path but later a relative path is passed for deletion. This
is not working because the map that is used to store the file
descriptors is not aware of paths - so deleted files are often not
removed from this map. The fix that works for us is to just always
work with absolute paths when adding and removing files from this map.
2018-12-13 22:21:06 -08:00
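A minimal sketch of the idea in this commit, assuming a cache keyed by path strings. `OpenFileCache` and its methods are hypothetical names, not the simulator's actual code. Every path is normalized to an absolute form before it is used as a key, so a file added by absolute path can be removed by relative path.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the simulator's open-file cache.
class OpenFileCache {
public:
    void add(const std::string& path, int handle) { files_[key(path)] = handle; }

    // Returns true if an entry was removed.
    bool remove(const std::string& path) { return files_.erase(key(path)) > 0; }

    std::size_t size() const { return files_.size(); }

private:
    // absolute() resolves against the current working directory;
    // lexically_normal() collapses "." and ".." components.
    static std::string key(const std::string& path) {
        return std::filesystem::absolute(path).lexically_normal().string();
    }

    std::unordered_map<std::string, int> files_;
};
```

Normalizing inside the cache rather than at each call site means callers cannot forget the conversion. This requires C++17 for `std::filesystem`.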
Alex Miller a982b9da72 Additional changes from a merge commit. 2018-12-13 17:13:41 -08:00
Alex Miller e70e59a895 Change some file locations. 2018-12-13 14:53:19 -08:00
Markus Pilman dce290909d fdbserver now compiling 2018-12-13 14:13:47 -08:00
mpilman 51beb8b48c fdbrpc compiling with cmake 2018-12-13 14:02:16 -08:00
Vishesh Yadav e04abf25f7 simulator: Support multiple listeners on single process
Sim2Listener can now take the network address to listen on. This is
used to listen to multiple ports in simulator and test the patch
which added multiple network addresses to single endpoint.
2018-12-13 13:36:52 -08:00
Vishesh Yadav 3eb9b23024 Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
- This patch makes FDB listen to multiple addresses given via the
  command line. Although we'll still use the first address in most places,
  this patch starts using vector<NetworkAddress> in Endpoint at some basic
  places.
- When sending packets to an endpoint, pick a random network address from
  the endpoint's addresses.
- Renames Endpoint::address to Endpoint::addresses since it
  now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
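Picking a random address when sending to an endpoint could look roughly like this. The `NetworkAddress` and `Endpoint` structs here are simplified, hypothetical stand-ins for the real classes, and `pickAddress` is an illustrative helper rather than a function from the codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the real types.
struct NetworkAddress {
    std::string ip;
    std::uint16_t port;
};

struct Endpoint {
    std::vector<NetworkAddress> addresses;
};

// Chooses one of the endpoint's addresses uniformly at random.
// The caller must guarantee that the address list is non-empty.
const NetworkAddress& pickAddress(const Endpoint& ep, std::mt19937& rng) {
    assert(!ep.addresses.empty());
    std::uniform_int_distribution<std::size_t> pick(0, ep.addresses.size() - 1);
    return ep.addresses[pick(rng)];
}
```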
Vishesh Yadav 43e5a46f9b Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
Extend `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll
have multiple IP:PORT pairs.

This patch simply adds the field and makes changes to compile the
codebase. The first element of the `address` field is used everywhere.
Hence the way we talk to an endpoint remains the same with this patch.

NOTE:

Directly accessing the first member of Endpoint::address is unsafe
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are anyway replacing all those
unsafe accesses with ones considering the whole vector, this patch
does not attempt to access them in a safe way.
2018-12-13 13:36:52 -08:00
Vishesh Yadav e8e01b2406 Remove unused localAddress parameter from newNet2 and Net2 classes 2018-12-13 13:36:52 -08:00
Evan Tschannen d9626895b1
Merge pull request #964 from xumengpanda/mengxu/teamcollection-release
TeamCollection: Use machine teams to create server teams to increase availability at scale when a machine has multiple servers
2018-12-13 13:18:54 -08:00
Meng Xu e069b5c31c TeamCollection: Use clang format
No functional change.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-06 11:39:35 -08:00
Evan Tschannen d2d68aa171 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	versions.target
2018-12-03 18:26:52 -08:00
Evan Tschannen 55a9c4a0f0
Merge pull request #955 from ajbeamon/fix-bad-error-creation-and-whitespace
throw platform_error; -> throw platform_error();. Convert some spaces to tabs.
2018-12-03 15:12:37 -08:00
A.J. Beamon 50c9dfdd01 Errors that occur in platform that are the result of IO issues are now raised as io_error rather than platform_error. 2018-11-30 10:55:19 -08:00
A.J. Beamon 97847f517b throw platform_error; -> throw platform_error();. Convert some spaces to tabs. 2018-11-28 12:56:57 -08:00
Meng Xu 8de031f9a6 TeamCollection: clang-format
Format the changes with git clang-format.
No functional changes.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-21 11:18:26 -08:00
Meng Xu f7a7e069f0 TeamCollection: Remove unnecessary comments
Pass 41806 tests with no failure

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:56:35 -08:00
Meng Xu 73c58852f0 TeamCollection: Resolve code review comments
Resolve code review comments:
1) Improve the code efficiency by avoiding unnecessary map search
   and avoiding unnecessary checking
2) Remove or comment out trace events when they can be spammy
3) Improve coding style

Tested for 1 hour and no error was found.
KillRegionCycle.txt test was excluded from the test because
existing code cannot pass that test either

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:55:33 -08:00
Meng Xu 5051b35c61 TeamCollection: Use machine team to create server team
Current server team collection logic does not consider
the fact that multiple storage servers can run on the same machine.
When multiple machines fail, all servers on the machines will fail, and
the possibility of having one process team fail and lose data is very high.

To reduce the possibility of losing data when multiple machines fail,
we first create machine teams which span across different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:53:22 -08:00
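The second step described above, deriving a server team from a machine team, might be sketched as follows. `Machine`, `MachineTeam`, and `buildServerTeam` are hypothetical names for illustration. The sketch assumes the machine team has already been chosen to span different fault zones and does not reproduce the actual TeamCollection logic.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical model: a machine lives in one fault zone and hosts several servers.
struct Machine {
    std::string zoneId;
    std::vector<std::string> serverIds;
};

using MachineTeam = std::vector<Machine>;

// Builds a server team by picking exactly one server from each machine
// in the given machine team.
std::vector<std::string> buildServerTeam(const MachineTeam& machineTeam, std::mt19937& rng) {
    std::vector<std::string> team;
    for (const Machine& m : machineTeam) {
        assert(!m.serverIds.empty());
        std::uniform_int_distribution<std::size_t> pick(0, m.serverIds.size() - 1);
        team.push_back(m.serverIds[pick(rng)]);
    }
    return team;
}
```

Because each member comes from a different machine, losing one machine takes out at most one server of any team built this way.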
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen 6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
Evan Tschannen b8381b3cea Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0 2018-11-10 09:51:49 -08:00
A.J. Beamon 67a152ae9f Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes. 2018-11-09 14:19:18 -08:00
Evan Tschannen 56c51c1bb3 fix: usableRegions was uninitialized 2018-11-09 10:17:35 -08:00
Stephen Atherton 9d73166b3b Many bug fixes related to concurrent page operations and pager shutdown. 2018-11-06 19:31:16 -08:00
Evan Tschannen 87295cc263 suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created 2018-11-04 23:07:56 -08:00
Evan Tschannen bf6545a9cf clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Stephen Atherton df3bdde50b Many bug fixes. AsyncFileCached write() on a page with a zero-copy read in progress would orphan the old page before the read was finished. Pager file operations were not converting page id to int64 for byte offset calculation. Pager was not calling releaseZeroCopy() after readZeroCopy() if there was an error or cancellation. Pager reads were using some variables that could go out of scope. BusyPage's mechanism for notifying when a physical page is no longer in use is itself no longer in use and therefore removed. Pager shutdown now cancels all outstanding reads. Improved some debug output. 2018-10-31 02:14:55 -07:00
A.J. Beamon 776b289bfe Move AsyncFileBlobStore and related files to fdbclient. 2018-10-26 13:49:42 -07:00
A.J. Beamon 58a0e22d3c Remove sim2 dependency on fdbclient:
* Remove unused 'exclusionSet' that used a type from fdbclient.
* Replace usages of describe(x) with x.toString().

Also removed some using statements.
2018-10-26 09:23:12 -07:00
Alex Miller 6bb1f4093d
Merge pull request #856 from dropbox/pr/include-fix
Adjust all includes to be relative to the root.
2018-10-22 09:51:55 -07:00
Alex Miller e2fc1c9b95 Remove specifying non-root directory as a path to search for includes. 2018-10-19 18:56:45 -07:00
Evan Tschannen 1ef29cbf0d more windows build fixes 2018-10-19 17:00:24 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen db71b60d72
Merge pull request #819 from satherton/feature-redwood
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen 0217aed74c Merge branch 'release-6.0'
# Conflicts:
#	bindings/go/README.md
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2018-10-15 18:38:51 -07:00
A.J. Beamon a963ff7a64 Fix line endings 2018-10-08 09:30:09 -07:00
Stephen Atherton 22f8a4efa9 Normalized all unit test names to begin with "/" if they should be included in random unit testing. 2018-10-05 22:09:58 -07:00
A.J. Beamon 664f64881c Port truncate optimization from Snowflake PR in order to make quick changes for a patch release. 2018-10-05 15:05:26 -07:00
Stephen Atherton 7c1dc305cb Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood 2018-10-05 10:15:10 -07:00
Evan Tschannen 3922e477a5 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/LogSystemDiskQueueAdapter.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
A.J. Beamon 92990d6aef Merge release-6.0 into master 2018-09-21 16:14:39 -07:00
Evan Tschannen 77e2fb787e Merge branch 'release-6.0' into feature-fix-forced-recovery 2018-09-21 14:55:37 -07:00
Stephen Atherton 2fc86c5ff3 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbrpc/AsyncFileCached.actor.h
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen 42a67efb0c the cluster controller should prefer to be located on a transaction class machine over a storage server class machine 2018-09-19 18:04:59 -07:00
Evan Tschannen 200e65fe61 added a workload which tests killing an entire region, and recovering from the failure with data loss.
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen 4dd2dda0a3 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen 90301f497f Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/TLSConnection.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon 2de0b5d6d7 Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations. 2018-09-05 15:06:14 -07:00
Evan Tschannen dcdbb3ec4d Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-movekey-fixes 2018-09-05 10:29:13 -07:00
Evan Tschannen 21f5cf9ce9 suppress spammy trace events 2018-09-04 17:12:26 -07:00
Steve Atherton 89dd9cc4a3 Cherry-pick pull request #717 to release-6.0
Which contains:
* Improve TLS cert refresh logging.
* Loading a mismatching cert shouldn't prevent TLS connections.
* Initialize the cached copy of ca/cert/key data.
* Open certificates as uncached, which means they can be write-protected.
2018-08-23 16:53:40 -07:00
Steve Atherton 365fe992b4
Merge pull request #717 from alexmiller-apple/tls-refresh-fixes
Fix certificate reloading issues
2018-08-22 15:09:12 -07:00
Evan Tschannen 717c43a69f merge 6.0 into master 2018-08-22 00:28:04 -07:00
Alex Miller d2da969412 Improve TLS cert refresh logging.
Explicitly call out failure/success, and surface repeated cert
mismatches.
2018-08-21 15:05:41 -07:00
Alex Miller 4113b36df7 Loading a mismatching cert shouldn't prevent TLS connections.
set_{cert,key,ca}_data return pass/fail and do not throw.  The existing
code wrongly assumed that they threw.
2018-08-21 15:02:54 -07:00
Evan Tschannen 26ec6ebac8 fixed line endings 2018-08-21 14:58:26 -07:00
Evan Tschannen 712aa00261 a better fix to the windows build issue 2018-08-21 14:54:38 -07:00
Alex Miller 4caacaaf4e I would like to atone for my sins. But later.
This fixes the windows build.  For some reason, MSVC believes that the
actor-compiled version of networkSender actually exists, but the
non-actor-compiled version doesn't exist.

This is a hackish workaround, as the largest reason to not include a
.g.h file is because it defines a POST_ACTOR_COMPILER define that messes
with actorcompiler.h's #defines.  We can just undefine that after
including the file.   ...but carefully.
2018-08-20 20:33:38 -07:00
Alex Miller 3ece3cf301 Initialize the cached copy of ca/cert/key data.
This was just purely an accidental oversight from before.  The variables
were there and handled like they were actually initialized with the
contents of the various certificate files at start-up, but never
actually were.

And add a few trace events to make it easy to see when the system
noticed and tried to reload certificate data.
2018-08-20 19:09:34 -07:00
Alex Miller fd866a3b47 Open certificates as uncached, which means they can be write-protected.
OPEN_READONLY still opens the file as read-write.  To actually be
read-only, one needs to open the file as READONLY and UNCACHED.
2018-08-20 19:07:58 -07:00
Alex Miller 63b1e85338 Ban `Void _ = wait(...)` constructions, and require just `wait(...)`.
There's never any reason to save the value of a Void return, and it's
the easiest source of redefined variable bugs that will creep back in
over time.  So just `wait(...)`, it's cleaner that way.
2018-08-14 15:50:26 -07:00
Alex Miller 86dbe1f0e9 Fix more instances of actorcompiler.h being in the wrong place. 2018-08-14 15:50:26 -07:00
Alex Miller 7feb5d8209 Remove including flow.h in actorcompiler.h, and fix resulting breakage.
For files that required flow.h, and only got it through actorcompiler.h,
their version of flow.h would have the actorcompiler #defines defined.
Then, if it included a STL/boost file, the same breakage would result.

This needs to not happen, so the include of flow.h in actorcompiler.h
was removed.
2018-08-14 15:50:26 -07:00
Alex Miller bca324eaa6 More actorcompiler.h fixes and additions. 2018-08-14 15:50:26 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 07e5281142 Restrict actor keyword #defines to actor files.
This introduces a new rule in our codebase, that any file that #includes
actorcompiler.h needs to do it as the last #include, and it needs to
then #include unactorcompiler.h at the end of the file.

The point of this is that it prevents our actorcompiler.h #defines from
leaking into boost or the c++ standard library.  Both of these start
throwing errors if you s/state// their code, which `#define state `
effectively does.
2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen cdcf056aef Merge branch 'release-6.0' 2018-08-14 09:43:51 -07:00
A.J. Beamon 168dce94cb Remove some trace event suppressions that were happening off the network thread. Downgrade some trace events related to trace logging problems from SevError to SevWarnAlways. 2018-08-14 09:00:43 -07:00
Evan Tschannen 3186fac397 Make sure we still accept some connections even if we are CPU bound by high priority work 2018-08-10 17:47:21 -07:00
A.J. Beamon 574c5576a2 Merge branch 'release-6.0' of github.com:apple/foundationdb
# Conflicts:
#	fdbrpc/TLSConnection.actor.cpp
#	versions.target
2018-08-10 14:31:58 -07:00
A.J. Beamon 3535ddad80
Merge pull request #674 from alexmiller-apple/glibcxx-debug-fixes
Fix bugs uncovered by -D_GLIBCXX_DEBUG
2018-08-09 08:18:51 -07:00
A.J. Beamon 24dec1529b
Merge pull request #673 from etschannen/release-6.0
A variety of bug fixes and performance improvements
2018-08-07 10:55:46 -07:00
Alex Miller ff0e14d5a7 Fix a compilation error on windows. 2018-08-06 18:36:01 -07:00
Evan Tschannen b5a133865d Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
# Conflicts:
#	fdbrpc/TLSConnection.actor.cpp
2018-08-06 18:26:54 -07:00
Evan Tschannen 22f2a1fedd Merge pull request #676 from etschannen/master
fix: we should not free statdata ourselves, it will be deleted by libeio itself
2018-08-06 18:08:45 -07:00
Steve Atherton fb46385a39 Merge pull request #628 from alexmiller-apple/reloadcertificates
Reload certificates if changed.

This is a cherry-pick of #628 back to release-6.0
2018-08-06 18:04:04 -07:00
Evan Tschannen 56e0b729c8 fix: we should not free statdata ourselves, it will be deleted by libeio itself 2018-08-06 17:46:53 -07:00
Alex Miller d99592f8bd Fix an out-of-bounds vector access. 2018-08-06 12:50:34 -07:00
Evan Tschannen 6f328d41ac suppressed spammy trace events 2018-08-06 12:12:55 -07:00
Evan Tschannen 538e684f1c Merge branch 'release-6.0'
# Conflicts:
#	versions.target
2018-08-03 11:41:46 -07:00
Evan Tschannen 2619234477 Merge branch 'release-5.2' into release-6.0
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen 21fe6adac4 fix: give time to do other work between accepting connections. It is expensive to accept TLS connections, so we have a slow task (which can kill other connections) if we accept too many connections in a row 2018-08-03 11:37:10 -07:00
Alex Miller 1a7cda4149 Stop performing self-moves. (e.g. a = std::move(a))
Self-moves are frowned upon in C++, and in our code this generally happens from
calls to swap as part of trying to implement an "unordered erase" function via
swap-to-the-end-and-pop_back.  For convenience, a swapAndPop() function is now
offered that performs this, while disallowing self-moves.
2018-08-01 18:09:54 -07:00
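A minimal sketch of an unordered erase that avoids self-moves, in the spirit of the swapAndPop() helper this commit mentions. The signature and body are assumptions for illustration, not the helper's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: removes vec[index] in O(1) without preserving order.
template <class T>
void swapAndPop(std::vector<T>& vec, std::size_t index) {
    assert(index < vec.size());
    // When the element is already last, skip the move so that we never
    // execute the equivalent of a = std::move(a).
    if (index != vec.size() - 1) {
        vec[index] = std::move(vec.back());
    }
    vec.pop_back();
}
```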
Evan Tschannen 1c29275672 call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details. 2018-08-01 14:30:57 -07:00
Alex Miller f70f204d55 Fix a compilation error on windows. 2018-07-30 17:13:37 -07:00
Evan Tschannen 28a26d54f2 Merge commit 'ccf4384c79d026edbf76152e95e7410ebe621c1f' into release-6.0
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbrpc/FlowTransport.actor.cpp
2018-07-28 09:11:31 -07:00
Evan Tschannen fa3b61508c fix: do not increase numIncompatibleConnections if the connect was already incompatible 2018-07-28 08:50:54 -07:00
Stephen Atherton 4379a58bbe Suppress potentially spammy event and don't log cancellation errors. 2018-07-27 21:03:10 -07:00
Stephen Atherton 59e005485d Fixed bug where incompatible connection count was sometimes decremented twice for the same peer. 2018-07-27 20:48:14 -07:00
Stephen Atherton 6a3834c3f8 Fixed memory leak when destroying a FlowTransport. 2018-07-27 20:46:54 -07:00
Stephen Atherton c593d1c6a2 Fix a bug causing clients to sometimes (rarely) not reconnect to upgraded clusters. Reliable packets were being dropped to incompatible peers intentionally, but now this is only done if the peer is newer since successful communication with a newer peer will never be possible. 2018-07-27 20:42:06 -07:00
Steve Atherton d1a877039d
Merge pull request #628 from alexmiller-apple/reloadcertificates
Reload certificates if changed.
2018-07-26 17:21:23 -07:00
Stephen Atherton 40762d9f9b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-25 17:58:52 -07:00
Evan Tschannen 95bc695f0e Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0 2018-07-25 13:06:54 -07:00
Evan Tschannen 89a3e2e1b4 Backed out connection closing changes because of upgrade problems 2018-07-25 13:06:13 -07:00
Alex Miller 262af775eb Implement overly simple file write timestamps for simulation, and clean up code. 2018-07-24 17:20:31 -07:00
Alex Miller 168496f819 Poll the certificate files if TLS is enabled and reload them if changed.
This allows certificates to be changed/updated without having to restart fdbserver.
2018-07-20 19:00:32 -07:00
Alex Miller 2d26e98d07 Add a cross-platform getLastWrite() to get a file's mtime. 2018-07-20 19:00:32 -07:00
A.J. Beamon a7a1124c11 Fix incompatible connection accounting that was incorrectly decrementing the incompatible count in some cases. 2018-07-17 11:36:05 -07:00
A.J. Beamon 8879954254
Merge pull request #609 from etschannen/release-6.0
Improved simulation strength by only removing datacenters that have been killed
2018-07-16 15:59:28 -07:00
Evan Tschannen e0caa28758 code cleanup 2018-07-16 15:56:43 -07:00
AlvinMooreSr aafb3c5c00
Merge pull request #593 from AlvinMooreSr/release-6.0-tls-funct
Replaced separate TLS Log function with FDB TraceEvent logger
2018-07-16 12:01:02 -07:00
Evan Tschannen f72a9f60c0 only disable fearless if a datacenter has actually been killed
fix: we must prevent recovery into the dead datacenter while reducing usable_regions
2018-07-16 10:06:57 -07:00
Alvin Moore a034acf3bd Replaced separate TLS Log function with FDB TraceEvent logger 2018-07-11 18:41:46 -07:00
Stephen Atherton 96389c74cd Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Alec Grieser d5a23642a1
Merge pull request #587 from etschannen/feature-remote-logs
close unneeded connections
2018-07-10 13:27:15 -07:00
Evan Tschannen a35d5e30d9 Added a SevError trace event in case peer references becomes negative 2018-07-10 13:26:28 -07:00
Evan Tschannen c25be5699a close unneeded connections 2018-07-10 13:10:29 -07:00
Alec Grieser be9c34c6f8
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2 2018-07-10 10:04:48 -07:00
Alec Grieser ad37b1693d
Merge pull request #585 from etschannen/feature-remote-logs
A variety of cleanup and test strengthening commits
2018-07-10 09:58:44 -07:00
AlvinMooreSr b3916a9b71
Merge pull request #409 from joelarmstrong/tlsconnection-clang-ub-warning
Fix compilation with clang from Apple LLVM 9.1.0
2018-07-10 09:32:24 -07:00
Stephen Atherton 1bc95862b7 Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen 82cc30be62 added testing for two_satellite_fast and two_satellite_safe 2018-07-09 22:01:46 -07:00
Stephen Atherton fddb3e87e2 Differentiate between a timeout in attempting to connect vs a timeout on an active connection by converting timeouts during connection attempts to connection_failed errors. 2018-07-09 19:40:01 -07:00
Stephen Atherton 3ce7c78d36 If an HTTP request fails due to a connection failure or a timeout, do not convert the error to the more generic http_request_failed. 2018-07-09 18:58:33 -07:00
Evan Tschannen e503dc975c fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-09 13:37:06 -07:00
Evan Tschannen 5a2cb3037b merge 5.2 into 6.0 2018-07-08 20:14:06 -07:00
Evan Tschannen 0e97ce79b4 fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-08 10:26:41 -07:00
Stephen Atherton a2f16e217e Memory waste fix: when a Peer disconnects, an extra packet buffer block was allocated to copy unsent reliable bytes to, even if there weren't any. 2018-07-06 19:44:30 -07:00
Evan Tschannen 6d7172ef7e fix: canKillProcesses did not take into account the remoteTLogPolicy when checking notEnoughLeft 2018-07-05 21:36:09 -07:00
Evan Tschannen 6f4ca2eba2 fix: get all processes did not include rebooting processes 2018-07-05 21:13:56 -07:00
Evan Tschannen cd4fb9285a waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region 2018-07-05 14:04:42 -07:00
Stephen Atherton 9d85a05372 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-05 12:52:06 -07:00
Stephen Atherton 2cb0362102 AsyncFileCached now allows writing and truncation of whole pages previously read using readZeroCopy and not yet released without prior readers seeing the effects of the write. 2018-07-05 02:59:13 -07:00
Evan Tschannen 7315e5da55 fix: isExcluded and isCleared were exactly wrong
fix: isCleared should mean the process is dead
2018-07-05 02:22:22 -07:00
Evan Tschannen e17dfea3b6 fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1.
canKillProcess logic was wrong.
We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.
2018-07-04 16:22:32 -04:00
Stephen Atherton 2925b9b984 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-03 23:03:56 -07:00
Alvin Moore c3f88dbfe1 Merge branch 'master' of github.com:apple/foundationdb into tls-static 2018-07-01 23:13:57 -07:00
Alvin Moore 132e2d9267 Defined TLS build flags for projects
Updated TLS documentation
2018-07-01 22:49:39 -07:00
Stephen Atherton b95a2bd6c1 Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
# Conflicts:
#	flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen 899f880ce0 fix: log router class did not have the proper fitness for becoming the cluster controller 2018-06-28 23:20:01 -07:00
Alvin Moore 45849d1f95 Added support for no-op legacy TLS options 2018-06-27 09:25:05 -07:00
Alvin Moore 65d8b38ae9 Changed generic plugin code to work as expected plugin code except for TLS use case
Defined TLS plugin name constant
Changed TLS plugin name to get_tls_plugin
Fixed link script
Removed compilation flags from info make target
2018-06-26 16:01:25 -07:00
Alvin Moore ef8de426d3 Changed the TLS_DISABLED macro
Disable TLS within Windows until working
2018-06-26 12:08:32 -07:00
Evan Tschannen 0123627d67 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen 5fc8199abc Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic
wait_for_good_recruitment now requires that you have the desired count of each role
remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered
2018-06-22 10:15:24 -07:00
Evan Tschannen 1dce97f28c Merge branch 'release-5.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/SimulatedCluster.actor.cpp
#	packaging/msi/FDBInstaller.wxs
#	versions.target
2018-06-21 17:05:11 -07:00
Balachandar Namasivayam d7dba11366 Throw tls_error instead of internal_error when not able to create a TLS connection. 2018-06-21 15:33:00 -07:00
Stephen Atherton e9e1e194f0 Added operation-specific rate controls to blob store interface. 2018-06-20 20:34:34 -07:00
Richard Low fff6a47c43 Validate certificates by default 2018-06-20 14:04:03 -07:00
Alvin Moore f8ce1de601 Added support for compiling TLS into binaries 2018-06-20 09:21:23 -07:00
Stephen Atherton e5c48d453a Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-18 22:45:27 -07:00
Evan Tschannen 0913368651 added usable_regions to specify if we will replicate into a remote region
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Stephen Atherton 1eae9d621b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-13 15:58:21 -07:00
Stephen Atherton 2878f30f29 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller 6c2cb25c53 Rename BestOtherFit -> OkayFit.
The previous order of fitness was

  BestFit > GoodFit > BestOtherFit > ...

which is baffling.  It's now:

  BestFit > GoodFit > OkayFit > ...

which won't break anyone's expectations.
2018-06-12 16:50:25 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 48fbc407fd fix: we cannot kill all of the remote tlogs, because we still need their data to copy to the next generation in the same data center 2018-06-08 15:28:44 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
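The first two naming rules in the trace event normalization commit above can be sketched as a small checker. This is only an illustration of the rules as the commit message states them, not the check that was added to simulation; the function names `is_valid_detail_name` and `is_valid_type_name` are hypothetical.

```python
def is_valid_detail_name(name: str) -> bool:
    # Rule 1: starts with an uppercase character and contains no underscores.
    return bool(name) and name[0].isupper() and "_" not in name


def is_valid_type_name(name: str) -> bool:
    # Rule 2: same as rule 1, except one underscore is allowed
    # (the Context_Type pattern), and the character right after
    # the underscore must also be uppercase.
    parts = name.split("_")
    if len(parts) > 2:
        return False
    return all(is_valid_detail_name(part) for part in parts)


if __name__ == "__main__":
    for candidate in ["Net2_LargePacket", "net2_LargePacket", "A_B_C"]:
        print(candidate, is_valid_type_name(candidate))
```

Treating each side of the underscore as a detail-style name keeps the two rules in one place, so a later change to the detail rule would apply to type names as well.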
Balachandar Namasivayam 529d0497f1 Proxy going OOM when applying high volumes of writes to a proxy, particularly in a sudden fashion before ratekeeper can control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
A.J. Beamon d9c702a9e3 Merge release-5.1 into release-5.2 2018-05-30 09:09:55 -07:00
Joel Armstrong 7c35ea6ba1 Fix use of bool in va_start causing undefined behavior
The version of clang included in Apple LLVM 9.1.0 complains about
passing the bool parameter `is_error` to va_start, which causes make
to fail:

fdbrpc/TLSConnection.actor.cpp:370:16: error: passing an object that undergoes
      default argument promotion to 'va_start' has undefined behavior
      [-Werror,-Wvarargs]
        va_start( ap, is_error );
                      ^
This just switches is_error back to the type it gets promoted to (int).
2018-05-24 16:37:11 -07:00
A.J. Beamon 026458baf3 Merge release-5.2 into master 2018-05-23 15:32:56 -07:00
Richard Low 84ed35b01f Only log TLS verify failures if all verification fails; log failures at SevInfo 2018-05-21 10:58:59 -07:00
Richard Low 086700aeb1 Plumb through TLS key password to CLI and from environment 2018-05-21 10:56:10 -07:00
Evan Tschannen 520aaf731d merge release 5.2 into master 2018-05-10 14:33:08 -07:00
Evan Tschannen b5b8c5d587 fix: white space issue in getKnobDescription 2018-05-10 14:27:10 -07:00
Balachandar Namasivayam b2c32ea4f2 Add secure_connection param to BlobStore to configure security.
Default is https. Setting secure_connection=0 makes it http.
2018-05-10 13:53:46 -07:00
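The default described in the secure_connection commit above (https unless secure_connection=0) can be sketched as follows. This assumes the parameter is passed as a query parameter on the blob store URL; the function name `blobstore_scheme` and the example URLs are hypothetical, and the real BlobStore parsing code may differ.

```python
from urllib.parse import parse_qs, urlsplit


def blobstore_scheme(url: str) -> str:
    # Read secure_connection from the query string, defaulting to "1"
    # (secure) when the parameter is absent.
    query = parse_qs(urlsplit(url).query)
    value = query.get("secure_connection", ["1"])[0]
    # Only an explicit 0 selects plain http.
    return "http" if value == "0" else "https"


if __name__ == "__main__":
    print(blobstore_scheme("blobstore://key:secret@host:443/backup"))
    print(blobstore_scheme("blobstore://key:secret@host:80/backup?secure_connection=0"))
```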
Evan Tschannen 7bca7b80e6 fixed merge conflicts 2018-05-10 09:13:41 -07:00
Evan Tschannen 8f984cb2c9 Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.h
2018-05-10 09:13:22 -07:00
Evan Tschannen d3450ce5b0
Merge pull request #343 from bnamasivayam/tls-plugin
Tls plugin
2018-05-09 16:35:53 -07:00
Balachandar Namasivayam 479dbf4c04 Addressed review comments.
Remove redundant FDBLibTLS/ITLSPlugin.h.
2018-05-09 16:16:09 -07:00
Balachandar Namasivayam 0c2960a221 Use smart pointer instead of naked ones in set_peer_verify() method. 2018-05-09 14:53:01 -07:00
Balachandar Namasivayam 7591931a09 Revert "Make tls_verify_peers as a comma separated string of constraints."
This reverts commit 2033847e4b.
2018-05-09 14:40:36 -07:00
Balachandar Namasivayam 2033847e4b Make tls_verify_peers as a comma separated string of constraints. 2018-05-09 14:37:39 -07:00
Balachandar Namasivayam e8b7f4b190 Add password support for tls. 2018-05-08 20:46:31 -07:00
Balachandar Namasivayam 49af5d685b Restore previous behavior of not specifying peer_verify option means disable checking. 2018-05-08 18:54:44 -07:00
Balachandar Namasivayam d3b5cfb93c Support latest TLS plugin.
Add support for https in backup.
2018-05-08 16:28:13 -07:00
Evan Tschannen 7acdc314e4 Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.actor.cpp
2018-05-08 13:22:53 -07:00
Evan Tschannen 1f6c6a886b Merge branch 'release-5.1' into release-5.2 2018-05-08 13:08:11 -07:00
Alvin Moore 9aa94e87a3 Renamed the default TLS search plugin 2018-05-07 17:01:14 -07:00
Alex Miller bc8e6acbe8 Fix the other half of simulation requiring a TLS Plugin.
This commit:
1. Restores --tls_plugin as a way to provide the path to the TLS plugin when running in simulation.
2. Removes the TLS Plugin as being required for 5% of tests.
3. Standardizes on 'sslEnabled' as a variable name.

And is a fix/improvement upon commit f7733d1b.

(1) previously didn't work, because we would create multiple new TLSOptions
instances and run init_plugin multiple times.  Only the first call would use
the argument specified on the command line.  To fix this, the TLSOptions
derived from the command line is threaded through all the simulation code that
needs it.

(2) was an oversight in f7733d1b, which didn't actually make "should we be TLS"
dependent on whether the TLS plugin was available or not.

(3) is just nice for trying to grep around in the codebase.
2018-04-30 18:26:29 -07:00
Stephen Atherton af61d3596d Merge branch 'public-master' into feature-redwood
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Alex Miller f7733d1bd0 Do not require the TLS Plugin for simulation.
It appears that explicit calls to TLS-related things had snuck in over time,
which meant that simulation runs that weren't even configured to use SSL still
wanted and required the TLS plugin.

This commit instead threads through the understanding of if any TLS-related
options were provided, and if not, then don't call anything TLS-related so that
we don't require the TLS plugin.

Hopefully this makes life easier for the opensource folk. :)
2018-04-24 16:53:30 -07:00
Dennis Schafroth 290122637b Using ASSERT_ABORT in destructors 2018-04-23 14:05:10 +02:00
Evan Tschannen c1ccc8522c Merge branch 'release-5.2' 2018-04-17 18:38:12 -07:00
Evan Tschannen db98c1b9b6 Merge branch 'release-5.1' into release-5.2
# Conflicts:
#	versions.target
2018-04-17 18:36:19 -07:00
Stephen Atherton 0169384636 Fixed rare infinite loop in blob list and delete operations. 2018-04-12 17:22:34 -07:00
Alex Miller 20082e3228 Clang fixes. 2018-04-12 11:10:53 -07:00
Alec Grieser 42c8527f43
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2 2018-04-11 18:35:32 -07:00
Yichi Chiang d8175471bc Merge release-5.1.5 2018-04-11 17:55:10 -07:00
A.J. Beamon ee4c966137 TransportData::numIncompatibleConnections was uninitialized. 2018-04-11 11:15:12 -07:00
Stephen Atherton 2752a28611 Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood 2018-04-06 16:29:37 -07:00
Alec Grieser 551ea9c7f8
Merge remote-tracking branch 'upstream/release-5.2' into master-release-5.2-merge 2018-03-19 12:34:50 -07:00
yichic ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang ec02e54f64 Refactor EraseLogData() 2018-03-19 11:56:01 -07:00
Yichi Chiang d6559b144f Share log mutations between backups and DRs which have the same backup range 2018-03-19 11:32:50 -07:00
Yichi Chiang 26b93ff920 Share log mutations between backups and DRs which have the same backup range 2018-03-16 18:09:23 -07:00
Alec Grieser 0853fcb052
switch to using zu for some size_t variables in printf 2018-03-14 18:07:05 -07:00
Evan Tschannen 3abf4d7fdf Merge branch 'master' into feature-remote-logs 2018-03-09 14:50:04 -08:00
Evan Tschannen 91bb8faa45 Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Evan Tschannen 28ea983487 Merge branch 'release-5.1' into release-5.2
# Conflicts:
#	flow/Trace.cpp
#	versions.target
2018-03-09 14:40:31 -08:00
Evan Tschannen cf6dd1437b suppress spammy trace events 2018-03-09 10:16:34 -08:00
Evan Tschannen 5390af8be4 suppress spammy logs 2018-03-09 09:40:36 -08:00
Evan Tschannen 68606c7984 fix: sim2 logic for when a kill is safe was incorrect 2018-03-06 18:38:05 -08:00
A.J. Beamon f2c804e14f Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit. 2018-03-06 10:15:04 -08:00
satherton a82d0e95be
Merge pull request #25 from apple/release-5.1
Merge release-5.1 to master
2018-03-04 23:20:31 -08:00
Stephen Atherton d0e122fdbe Blob client send and receive speed limits were being initialized using opposite knobs. 2018-03-04 23:05:55 -08:00
Evan Tschannen e3c6b66240 fix: do not commit more data after being stopped
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alvin Moore de1551c20d Merge branch 'release-5.1' 2018-02-23 08:24:06 -08:00
Alvin Moore a1382895a6 Fixed headers and some whitespace 2018-02-23 04:50:23 -08:00
Alec Grieser e1162e9238 Merge remote-tracking branch 'upstream/release-5.1' 2018-02-22 11:16:12 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Alec Grieser aadc06de99 Merge remote-tracking branch 'upstream/release-5.1' 2018-02-20 14:28:29 -08:00
Alec Grieser 1c1ae7d70e Merge remote-tracking branch 'upstream/release-5.1' into bindings-format 2018-02-19 12:37:06 -08:00
Evan Tschannen 31b89a638f added satellite_none and remote_none options to unconfigure from a fearless setup
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen dc93759e15 suppressed trace events that are spammy 2018-02-16 16:01:19 -08:00
Evan Tschannen cb25564d38 simulated cluster supports fearless configurations
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
A.J. Beamon 814ae16016 Add destination tokens to Net2_LargePacket trace events. Add backtrace when a sent packet is too large. 2018-02-15 14:54:35 -08:00
Balachandar Namasivayam f320b1b347 Change ConnectionClosed TraceEvent severity from SevError to SevWarnAlways. 2018-02-14 12:25:54 -08:00
Stephen Atherton 0a35f167e4 Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/IDiskQueue.h
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen 42405c78a5 Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Evan Tschannen fbadcc6eea changing a storage server’s tag must be the first mutations applied in a version, because privatized mutations applied earlier in the same version will use the old tag 2018-02-09 18:21:29 -08:00
Evan Tschannen c7b3be5b19 re-enabled better master exists
the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited
2018-02-09 16:48:55 -08:00
Stephen Atherton 69425a303b Improved error handling for cases where blob account credentials are either not found in the provided credentials sources and/or some of the credentials sources provided are not readable or parseable. 2018-02-07 21:50:43 -08:00
Stephen Atherton f8522248cb Blob credentials files were being opened in read-write mode despite the read-only option being specified because the underlying caching layer always opens files for read/write access. For now, disabled caching for this file. 2018-02-07 16:25:16 -08:00
Stephen Atherton d8879dc3f3 HTTP::doRequest() now reads responses in parallel with sending requests, so if the server responds before receiving all of the request the client can stop sending the remainder of the request. For PUT requests which upload files, this prevents sending potentially several megabytes of unnecessary bytes if the server responds with an error (such as 429) before the request is completely sent. Updated the backup container unit test to use more parallelism in order to test this new behavior. 2018-02-07 10:38:31 -08:00
Stephen Atherton 0792d5e3dd Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators.  Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
Evan Tschannen ebd94bb654 removed a separately configurable storage team size for the remote data center, because it did not make sense
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen 2e3b1d7ab8 Merge commit 'dd6ea70051aef215315e9eb3dea3b67a24778e32' into feature-remote-logs
# Conflicts:
#	flow/Net2.actor.cpp
2018-01-29 17:11:03 -08:00
Stephen Atherton 2f291d8955 Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket() - Although it is technically correct as written it is a dangerous style because it is not obvious that the addition of a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only 1 line. 2018-01-29 00:32:41 -08:00
Alec Grieser 51781bb7a8 Merge branch 'release-5.1' into bindings-format 2018-01-26 12:28:29 -08:00
Evan Tschannen 79d94214a4 Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs 2018-01-25 10:12:06 -08:00
Stephen Atherton 9fd2a8df3d Tweaked a trace event suppression time. 2018-01-24 19:08:24 -08:00
Alec Grieser 57986cfe00 format python files to be roughly pep8 compliant 2018-01-24 19:06:58 -08:00
A.J. Beamon 19ed388c0e Merge branch 'release-5.0' into release-5.1
# Conflicts:
#	documentation/sphinx/source/downloads.rst
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2018-01-24 14:43:41 -08:00
Stephen Atherton 7f18d59dfe Bug fix, the blob request attempt count is now incremented for all errors except response code 429. 2018-01-24 01:15:01 -08:00
Stephen Atherton a2481343ec Bug fix, HTTP error code 429 was not being considered retryable in blob client (this was previously fixed but apparently reintroduced). 2018-01-24 00:22:11 -08:00
Stephen Atherton 66de9d392b New error code, http_auth_failed, which is used when blob authentication fails instead of the previous generic http_request_failed. 2018-01-22 14:58:56 -08:00
Evan Tschannen 698ef4117e Merge branch 'master' into feature-remote-logs 2018-01-20 10:34:30 -08:00
Stephen Atherton 307e04c0ad Updated backup container unit test to match new safer behavior of expireData(). Rewrote BackupContainerLocalDirectory::deleteContainer() to actually delete the whole directory but only if it appears to be a backup with either log or snapshot data. 2018-01-18 00:36:28 -08:00
Stephen Atherton 93b34a945f Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup. 2018-01-17 04:09:43 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Alvin Moore 2e6ce03224 Merge pull request #232 from cie/build-dont-compile-hpp
Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files)…
2018-01-12 14:09:25 -08:00
Evan Tschannen 02bd83ff76 changed incompatibleDataRead to an asyncTrigger 2018-01-11 13:35:56 -08:00
A.J. Beamon 80b84c23ac Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files). Add xml2json.hpp to our fdbrpc project. 2018-01-10 13:51:57 -08:00
A.J. Beamon ce93d98b50 Temporarily remove xml2json.hpp from fdbrpc vcxproj 2018-01-10 10:18:44 -08:00
A.J. Beamon 2f5073d00f Some visual studio project cleanup. 2018-01-10 10:07:18 -08:00
Stephen Atherton 0e7d538c94 Bug fix, in recursive blob folder listings the recent removal of common prefixes from the result stream caused the list marker to not be set correctly when a folder level requires multiple requests due to folder size. 2018-01-06 20:58:48 -08:00
Evan Tschannen 3ec45d38a0 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton 96cb06cbc7 Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations. 2018-01-05 23:06:39 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Stephen Atherton 78430425e8 Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided. 2018-01-02 23:17:52 -08:00
Stephen Atherton 07fde9dfb4 Bug fix, error code 429 was not being treated as retryable in the recent refactor. 2018-01-02 23:15:25 -08:00
Stephen Atherton f324afc13f Bug fix in blob store listing when it requires multiple serial requests. Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events. 2017-12-22 17:08:25 -08:00
Stephen Atherton f2524ffd33 AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description. 2017-12-21 21:15:26 -08:00
Stephen Atherton e0ef5a9a20 Whitespace normalization. 2017-12-21 12:07:29 -08:00
Stephen Atherton e3aee45a74 Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings. 2017-12-21 01:58:15 -08:00
Stephen Atherton e0d9cea008 Merge branch 'master' into continuous-backup
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Alex Miller 9a0df6d76d Deallocate aligned_alloc with aligned_free.
This probably fixes a windows-only crash, as only windows cares about this distinction.
2017-12-14 15:12:05 -08:00
Stephen Atherton b6cfe010a1 Bug fix in URL encoding of delimiter. 2017-12-12 17:31:19 -08:00
Stephen Atherton 872edd7540 Merge branch 'release-5.0'
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2017-12-06 16:27:04 -08:00
Stephen Atherton 41f80bf7ed Renamed an error, changed blob request failure to Warn severity. 2017-12-06 15:58:54 -08:00
Stephen Atherton 4bc7d0b86a Updated error names and severities. 2017-12-06 15:42:44 -08:00
Stephen Atherton abb2dd1ebc Merge pull request #214 from cie/alexmiller/fallocate
Use fallocate to zero ranges instead of writing zeroes
2017-12-06 13:47:40 -08:00
Alex Miller 064670a95b Maintain a reference to the IAsyncFile in zeroRange.
And also add some notes about the reference semantics to the IAsyncFile header
for future readers.
2017-12-06 13:41:21 -08:00
Balachandar Namasivayam 1f949240f5 Make fdbbackup s3 compatible.
s3 sends response in XML.  FDB backup expects json response. Added a new library xml2json to convert xml to json.
2017-12-05 17:13:15 -08:00
Stephen Atherton 86ae6c09c7 Bug fixes, take(1) is incorrect usage of FlowLock. 2017-12-04 10:20:50 -08:00
Evan Tschannen 482ac38ca6 added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs 2017-12-01 13:04:32 -08:00
Alex Miller 7bab3a4ece AsyncFileKAIO will prefer using fallocate's ZERO_RANGE for AsyncFile::zero().
For situations in which we have support for FALLOC_FL_ZERO_RANGE, it's much
faster to use fallocate than manually overwrite the file with zero bytes.  Note
that this support depends on having a kernel from late 2014 or newer, and being
on ext4 or xfs.  If these conditions aren't met, we'll fall back to writing
zeros in 1MB chunks as normal.
2017-11-30 17:57:55 -08:00
Alex Miller 196258080b Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile.
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it.  It also makes testing easier/more obvious.

Also, because it's needed for correctness, fix the aligned_alloc for OSX, which
wasn't aligned, and use an actually aligned allocation function.
2017-11-30 17:57:55 -08:00
Alex Miller c7a120c59d Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile.
`deleteFile` existed in IAsyncFileSystem, so an incremental delete function
seems to belong more as a virtual method on IAsyncFileSystem than a static
method on IAsyncFile, and the naming should match.

As long as we're here, change IAsyncFile to declare a virtual destructor, so
that it has good and proper C++ behavior.  I presume this is what was vaguely
intended by the default constructor definition that previously existed?
2017-11-30 17:19:10 -08:00
Stephen Atherton 1e643239f9 Improvement in blob connection reuse, oldest connections in pool are now used first. 2017-11-30 12:57:29 -08:00
Stephen Atherton 1b1c8e985a Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Alex Miller f19cb3bbbd Merge pull request #208 from cie/alexmiller/grvtfix
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
Alex Miller e9412bbb11 Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that.  The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap.  Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.

The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.

This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
Stephen Atherton a77162b53d Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/BackupAgent.h
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton e07dcb9ada Fixed header paths. 2017-11-15 00:05:20 -08:00
Stephen Atherton 3dfaf13b67 IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored is now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup.  The addition of IBackupFile and its finish() method simplified the log and range writer tasks.  Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes.  Added KeyBackedSet<T> type.  Moved JSONDoc to its own header.  Added platform::findFilesRecursively().

Still to do:  update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Balachandar Namasivayam 987379d790 Changed naming of num_incompatible_connections to numIncompatibleConnections 2017-11-14 18:37:29 -08:00
Balachandar Namasivayam 27b67cffbe The earlier implementation of tracking the number of incompatible connections had a bug where the counter would be incorrectly decremented for incoming connections under certain conditions.
Now the counter increment and decrement happen in the same ACTOR (ConnectionReader), which makes it easy to verify correctness.
2017-11-13 15:07:39 -08:00
Balachandar Namasivayam 9809e84806 Added a counter to keep track of active outgoing incompatible connections.
This counter is used to print a warning in fdbcli if there are incompatible peers.

Example Output:

./fdbcli
Using cluster file `fdb.cluster'.

WARNING: Incompatible peers exist.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status

WARNING: Incompatible peers exist.

Using cluster file `fdb.cluster'.

Could not communicate with a quorum of coordination servers:
  127.0.0.1:4000  (unreachable)
2017-11-09 11:20:35 -08:00
Evan Tschannen 57aba0b3bc fix: excluded servers had the same fitness as storage servers for the master role
fix: better master exists did not consider exclusion for master fitness
2017-11-03 17:09:14 -07:00
John Brownlee d46e240de2 Merge branch 'release-5.0'
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
#	versions.target
2017-11-02 10:42:30 -07:00
Stephen Atherton f050105243 Added HTTP 502 to the list of retryable errors. 2017-11-01 11:41:32 -07:00
Alex Miller 3b61b76876 Fix a massive amount of valgrind errors and make them easier to debug in the future.
std::is_pod<> being less restrictive than is_binary_serializable<> meant that
structs that both were POD and had a serialize method defined would be binary
serialized instead of using the defined serialize().  This means that it would
also serialize any padding that the struct contained, which would cause mass
waves of valgrind failures from uninitialized memory.

Included in this change are additional uses of valgrind client requests so that
attempts to send uninitialized memory are reported at the sending site, versus
as part of checksum calculation in sending the packet.
2017-10-27 16:54:44 -07:00
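The dispatch problem described in the commit above can be sketched in isolation. This is a minimal illustration, not FoundationDB's actual traits: `ProbeArchive`, `has_serialize`, `use_raw_bytes`, `Padded`, and `Plain` are hypothetical names, and `std::is_trivially_copyable` stands in for `std::is_pod` (which is deprecated in C++20). The idea is that a raw byte copy should be chosen only when a type has no `serialize()` of its own, so padding bytes are never sent.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Small helper so the detection idiom also works before C++17's std::void_t.
template <class...> struct make_void { using type = void; };

// Hypothetical archive type, used only to probe for a serialize() member.
struct ProbeArchive {};

// Primary template: assume T has no serialize(Archive&).
template <class T, class = void>
struct has_serialize : std::false_type {};

// Chosen when the expression t.serialize(archive) is well formed.
template <class T>
struct has_serialize<
    T, typename make_void<decltype(std::declval<T&>().serialize(
           std::declval<ProbeArchive&>()))>::type> : std::true_type {};

// Raw byte copy only for trivially copyable types WITHOUT their own serialize().
template <class T>
struct use_raw_bytes
    : std::integral_constant<bool, std::is_trivially_copyable<T>::value &&
                                       !has_serialize<T>::value> {};

// Trivially copyable, but alignment typically puts padding after `flag`.
// Copying its raw bytes would also copy that uninitialized padding.
struct Padded {
    std::uint8_t flag;
    std::uint64_t value;
    template <class Ar> void serialize(Ar&) {}
};

// No serialize() member, so a raw byte copy is the only option.
struct Plain {
    std::uint32_t a;
    std::uint32_t b;
};

static_assert(std::is_trivially_copyable<Padded>::value,
              "Padded would pass a triviality-only check");
static_assert(!use_raw_bytes<Padded>::value,
              "but its serialize() must take priority");
static_assert(use_raw_bytes<Plain>::value,
              "Plain falls back to a raw byte copy");
```

A check based on triviality alone would pick the byte copy for `Padded`; ordering the checks so that a user-defined `serialize()` wins avoids that.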
Evan Tschannen df74e2a373 re-added support for non-copying tlog recovery 2017-10-24 15:09:31 -07:00
Stephen Atherton 45fa3680fa Restore logging of remote address (if connected) or host (if connection fails) for blob errors. 2017-10-20 21:47:23 -07:00
Stephen Atherton 3afc85881e Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Stephen Atherton 42955012e9 Merge branch 'release-5.0'
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
#	flow/error_definitions.h
2017-10-20 21:16:55 -07:00
Stephen Atherton 9f151314b3 Changed some trace event severities. Also fixed a weird casing of “retryable”. 2017-10-19 17:47:42 -07:00
Evan Tschannen e2c1e87df6 made a large number of fixes to make fearless DR correctness clean. 2017-10-19 15:36:32 -07:00
Stephen Atherton caad691ae2 Added comments for how to handle HTTP 400 errors gracefully in certain instances should the need arise. 2017-10-18 23:47:59 -07:00
Stephen Atherton ef84e52127 Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish. 2017-10-18 05:51:30 -07:00
Stephen Atherton ebd0234514 Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs but rate limits all errors, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop. 2017-10-18 02:52:09 -07:00
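The shape of a request loop where the caller names the successful codes might look roughly like the sketch below. It is an assumption-laden illustration, not the real `BlobStoreEndpoint::doRequest()`: `doRequestSketch`, `isRetryable`, the `send` callback, and `maxTries` are all hypothetical, and logging and backoff are reduced to a comment.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>

// Hypothetical retry policy: only HTTP 503 is retryable, as in the commit message.
inline bool isRetryable(int httpCode) { return httpCode == 503; }

// Calls `send` until it returns a code listed in `successCodes`.
// Retryable codes are attempted up to `maxTries` times in total;
// any other code throws immediately.
inline int doRequestSketch(const std::function<int()>& send,
                           const std::set<int>& successCodes,
                           int maxTries) {
    for (int attempt = 1; attempt <= maxTries; ++attempt) {
        const int code = send();
        if (successCodes.count(code) != 0) return code;
        if (!isRetryable(code)) throw std::runtime_error("request failed");
        // A real loop would emit a rate-limited log event and back off here.
    }
    throw std::runtime_error("retries exhausted");
}
```

With this shape, a caller that treats "not found" as an acceptable answer can pass `{200, 404}` and handle both codes itself, while a caller passing only `{200}` gets an exception for 404.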
Alex Miller 7b9bc1d715 Merge pull request #170 from cie/alexmiller/flowprofile
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller cf646d4a99 Address review comments.
* Fixed fdbcli to be more idiomatic.
* Removed is_binary_serializable in favor of std::is_pod<>
* Removed custom enable_if<> in favor of std::enable_if<>
* Removed HEY REVIEWER comments
* Removed print from prof.py
* Added FLOW_PROFILER_ENABLED=yes to circus components that wished to enable the flow profiler.
2017-10-16 16:46:52 -07:00
Yichi Chiang a6ae89af1a Merge pull request #176 from cie/add-cluster-controller-process-class
Add cluster controller process class
2017-10-16 16:27:54 -07:00
Yichi Chiang af2aa41136 Downgrade Transaction process class for cluster controller 2017-10-16 16:27:01 -07:00
Yichi Chiang 76c5488421 Add cluster controller process class 2017-10-16 16:21:25 -07:00
Stephen Atherton e934604f67 Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
Evan Tschannen ff1b49be2e Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
2017-10-10 16:07:59 -07:00
Evan Tschannen 15962cf079 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbrpc/Locality.cpp
#	fdbrpc/Locality.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/ClusterRecruitmentInterface.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/masterserver.actor.cpp
#	fdbserver/worker.actor.cpp
#	flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Alvin Moore de8f875038 Fixed call to IsClear
Changed killMachine and killDataCenter interface to return final killtype
Updated TESTs for DataCenter to ensure that DataCenter was killed
Added assertion to ensure that failed DC kills were not downgrades
2017-10-05 03:07:20 -07:00
Stephen Atherton fd5fe3a000 Add slightly better handling of HTTP 503 in blob client. Previously it would end the blob request loop and the task doing the blob action would see a failure, but now the blob request attempt loop will continue to back off and retry. This is better because previously the task that saw the failure would be re-run quickly. 2017-10-03 15:25:49 -07:00
Stephen Atherton 03c4cea511 Added rate-controlled TraceEvents for blob http connection attempts and failures. 2017-10-03 15:21:40 -07:00
Yichi Chiang 284e35204a Fix connection count 2017-10-03 10:54:20 -07:00
Alvin Moore 5257b99d3f Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration 2017-10-03 10:48:16 -07:00
Alvin Moore d099656557 Merge branch 'release-5.0' 2017-10-02 12:05:24 -07:00
Alvin Moore 25513d8e2c Added tests for DataCenter kills 2017-10-02 12:04:28 -07:00
Evan Tschannen 6ea9903c82 Merge branch 'release-5.0'
# Conflicts:
#	fdbbackup/backup.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	versions.target
2017-10-01 18:46:44 -07:00
Stephen Atherton 058300be16 Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down. 2017-10-01 16:17:38 -07:00
Stephen Atherton a95107417f Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in. 2017-10-01 16:01:24 -07:00
Stephen Atherton a098919b20 Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed. 2017-10-01 11:25:50 -07:00
Stephen Atherton af87ac301d Removed a never-completing wait, used for debugging, that was accidentally included in the bug fix. 2017-10-01 11:19:38 -07:00
Stephen Atherton 6000cafde1 Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser. 2017-10-01 10:46:55 -07:00
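The bug class described in the commit above, and the RAII-style fix, can be sketched with plain C++. This is only an illustration under assumptions: `CountingLock`, `Releaser`, `buggyUse`, and `fixedUse` are hypothetical stand-ins, not the flow types used in the actual change.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counting lock; take() throws when no permit is available.
class CountingLock {
public:
    explicit CountingLock(int permits) : available_(permits) {}
    void take() {
        if (available_ == 0) throw std::runtime_error("no permits");
        --available_;
    }
    void release() { ++available_; }
    int available() const { return available_; }

private:
    int available_;
};

// Gives the permit back on destruction. It is constructed only after
// take() has succeeded, so a failed take() never leads to a release().
class Releaser {
public:
    explicit Releaser(CountingLock& lock) : lock_(lock) {}
    ~Releaser() { lock_.release(); }
    Releaser(const Releaser&) = delete;
    Releaser& operator=(const Releaser&) = delete;

private:
    CountingLock& lock_;
};

// Buggy shape: release() also runs when take() threw.
inline void buggyUse(CountingLock& lock) {
    try {
        lock.take();
        // ... work that needs the permit ...
    } catch (const std::runtime_error&) {
        // error swallowed for the purposes of this sketch
    }
    lock.release();  // over-releases if take() failed
}

// Fixed shape: the Releaser exists only once the permit is held.
inline void fixedUse(CountingLock& lock) {
    try {
        lock.take();
        Releaser releaser(lock);
        // ... work that needs the permit ...
    } catch (const std::runtime_error&) {
        // nothing to release: take() never succeeded
    }
}
```

Starting from a lock with zero permits, `buggyUse` leaves it with one permit it never had, while `fixedUse` leaves the count unchanged.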
Evan Tschannen f84e7252e8 fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead 2017-09-29 19:13:08 -07:00
A.J. Beamon 38616424f6 Report a couple error cases in blobstore URL parsing when dealing with numbers. 2017-09-29 17:58:49 -07:00
Alex Miller c40c1bb5fe Add a new workload: BackupToDBAbort, which does an ACI switchover.
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Evan Tschannen a1f8b546e6 fix: ensure connections to blob store are evenly distributed across network addresses
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
A.J. Beamon d30c730f75 Add the ability to access name and description in Error. Update error descriptions. 2017-09-28 12:35:03 -07:00
Alvin Moore 298b54104e Merge branch 'release-5.0' 2017-09-26 11:16:14 -07:00