A.J. Beamon
9272a41e5f
Merge pull request #1146 from atn34/fix-actor-warning
...
Fix actor warning for cmake build
2019-02-13 11:01:37 -08:00
Andrew Noyes
3a38bff8ee
Use DISABLE_ACTOR_WITHOUT_WAIT_WARNING consistently
2019-02-13 10:30:35 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
Andrew Noyes
874a58cb4f
Suppress actor without wait for tests in cmake
2019-02-12 11:01:17 -08:00
mpilman
8a94d80deb
fdbservice and fdbrpc now compiling
2019-02-07 15:37:04 -08:00
Evan Tschannen
486e0e13c3
Merge pull request #1116 from alexmiller-apple/tstlog
...
Random cleanups that prepare for Spill-By-Reference TLog
2019-02-05 18:09:06 -08:00
A.J. Beamon
882f8d70b7
Merge pull request #1066 from etschannen/master
...
fix: coordinators auto could put two coordinators in the same zone
2019-02-05 11:52:04 -08:00
Alex Miller
6668b7c544
Make simulation enforce what KAIO requires.
2019-02-04 18:04:22 -08:00
Evan Tschannen
e9ddd94e27
The failure monitor is given a list of all IP addresses associated with a process
...
The connect packet includes the correct remote address
Did a lot of code cleanup
Simulation test mixed TLS and non-TLS listeners on the same process
2019-01-31 18:20:14 -08:00
Meng Xu
550f2e2682
Merge with master to use the latest backup container
...
In fdb 6.0.15, the backup container changed how it organizes the backup data.
A backup made by fdb > 6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15.
2019-01-30 12:05:15 -08:00
Balachandar Namasivayam
9cf2b4e1e7
Improve TLS logging on error scenarios.
2019-01-29 17:04:09 -08:00
Meng Xu
76e1ba2934
add blob_credential_file option
2019-01-29 16:00:52 -08:00
A.J. Beamon
05b38167d0
Update fdbrpc/sim2.actor.cpp
...
Co-Authored-By: etschannen <36455792+etschannen@users.noreply.github.com>
2019-01-29 11:35:02 -08:00
Trevor Clinkenbeard
2e0b3a7f1d
Added ProcessClass::CoordinatorClass, which can be used by coordinators, so that coordinators do not have to take on other roles if desired
2019-01-25 11:03:13 -08:00
Evan Tschannen
1d7fec3074
Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
...
# Conflicts:
# .gitignore
2019-01-24 17:43:06 -08:00
Evan Tschannen
9cf77d70bc
fix: getFirstLocalAddress has to be the same as primary address, because it is what we put in the connect packet, and we always connect from the primary address
2019-01-24 17:28:26 -08:00
Evan Tschannen
699f8dd617
fix: coordinators auto could put two coordinators in the same zone
...
simulation now tests two machines in the same zone
2019-01-18 15:42:48 -08:00
Evan Tschannen
4eb11d74af
Merge pull request #1029 from bnamasivayam/reenable-check_desired_classes
...
Re-enable CheckDesiredClasses after making necessary changes for mult…
2019-01-11 17:15:05 -08:00
Balachandar Namasivayam
a8e2e75cd5
Re-enable CheckDesiredClasses after making necessary changes for multi-region setup.
...
Fixed a couple of bugs
1) A rare race condition where a worker is being assigned roles even after it died.
2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.
2019-01-10 10:28:32 -08:00
Evan Tschannen
684a22a52b
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/BackupContainer.actor.cpp
# fdbclient/HTTP.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/BackupCorrectness.actor.cpp
# versions.target
2019-01-09 16:14:46 -08:00
Vishesh Yadav
31c4ac07ac
WIP: FailureMonitoring use endpointAddressList (create individual endpoints for each address) WIP: g_currentDeliveryPeerAddress WIP: FlowTransport endpoint map WIP: Add peerReference to addressToEndpointMap
2019-01-09 07:46:01 -08:00
Vishesh Yadav
51b89ae083
WIP
2019-01-09 07:41:02 -08:00
Alex Miller
cebdb83def
Revert "Merge pull request #977 from alexmiller-apple/abspath"
...
This reverts commit 9881b1d074, reversing changes made to 6d278e466b.
2019-01-08 16:52:09 -08:00
Evan Tschannen
57293a2db0
byte sample recovery did not use limits for its range reads, leading to slow tasks
2019-01-04 10:32:31 -08:00
Andrew Noyes
d5430d7bf8
Remove ignore "-Wreturn-local-addr" pragma
...
This seems to still build on gcc 8
2019-01-03 13:55:17 -08:00
Markus Pilman
dbe9baff1f
Several small compilation fixes for new versions of gcc
...
There are several missing includes for cmath in the code, I added those.
Next, Coro returns a reference to a stack variable and this causes a
warning. As this is probably ok for Coro, I disabled the warning in
that file for GCC. I want to have this warning in the build system as
it is generally a very useful warning to have.
Another change is that major and minor have been deprecated for a while now.
I replaced those with gnu_dev_major and gnu_dev_minor.
ErrorOr currently implements operators ==, !=, and <. These do not
compile because Error does not implement ==. This compiles on older
versions of gcc and clang because ErrorOr<T>::operator== is not used
anywhere. It is still wrong though and newer gcc versions complain.
I simply removed these methods.
The most interesting fix is that TraceEvent::~TraceEvent is currently
throwing exceptions. This is illegal behavior in C++11 and a bad idea in
older versions of C++. For now I simply removed the throw, but this
might need some more thought.
2019-01-03 12:44:19 -08:00
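The destructor point in the message above can be made concrete. Since C++11, a destructor is implicitly `noexcept` unless declared otherwise, so an exception escaping it calls `std::terminate`. The sketch below only checks that property at the type level; `QuietEvent`, `LoudEvent`, and `destructorMayThrow` are hypothetical names for illustration and are not `TraceEvent` or anything from the codebase.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type with an ordinary destructor: implicitly noexcept since C++11.
struct QuietEvent {
    ~QuietEvent() {}
};

// Hypothetical type that explicitly opts out, which permits a throwing destructor.
struct LoudEvent {
    ~LoudEvent() noexcept(false) {}
};

// True when T's destructor is allowed to let an exception escape.
template <class T>
constexpr bool destructorMayThrow() {
    return !std::is_nothrow_destructible<T>::value;
}

static_assert(!destructorMayThrow<QuietEvent>(), "implicitly noexcept");
static_assert(destructorMayThrow<LoudEvent>(), "opted out of noexcept");
```

A throw inside `~QuietEvent()` would terminate the program rather than propagate, which is why removing the throw is the conservative fix.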
Bhaskar Muppana
aa2a76ef4c
Merge pull request #981 from alexmiller-apple/cmake
...
Add a CMake build system
2019-01-02 18:50:15 -08:00
A.J. Beamon
d8f33a2419
Add parentheses to bitwise ops (turned up by clang after recent change)
2019-01-02 10:15:59 -08:00
anoyes
6a4d87802b
Replace & operator with variadic function
2018-12-28 11:33:42 -08:00
Steve Atherton
9881b1d074
Merge pull request #977 from alexmiller-apple/abspath
...
Use abspath when dealing with the simulator file-cache
2018-12-20 14:56:38 -08:00
Vishesh Yadav
209ecd09ee
Keep local addresses in a vector
2018-12-17 11:25:44 -08:00
Meng Xu
486a7b04fa
TeamCollection: Fix build in osX
...
In osX, we cannot add an unsigned long to a string to append it to the string.
2018-12-14 13:44:11 -08:00
Markus Pilman
4ae701d8a9
minor bugfix to look up correct filename in cache
...
(manually cherry-picked from flat-buffers branch)
2018-12-13 22:21:25 -08:00
Markus Pilman
0207831fd6
Use abspath when dealing with the simulator file-cache
...
The simulator uses a hash table to cache all open files to make sure
that several simulated processes don't open the file more than once.
This currently doesn't work properly and deleted files are often kept
open forever. As a result, we often ran out of file descriptors.
The problem is luckily quite simple: files are often opened with an
absolute path but later a relative path is passed for deletion. This
is not working because the map that is used to store the file
descriptors is not aware of paths - so deleted files are often not
removed from this map. The fix that works for us is to just always
work with absolute paths when adding and removing files from this map.
2018-12-13 22:21:06 -08:00
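The fix described in the message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator's code: `FileCache`, `open`, `remove`, and the integer handle are hypothetical, and `std::filesystem` (C++17) stands in for whatever path helper the simulator actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <map>
#include <string>

// Hypothetical stand-in for the simulator's open-file cache.
class FileCache {
public:
    // Record an open file under its normalized absolute path.
    void open(const std::string& path, int handle) { files_[key(path)] = handle; }

    // Remove a file; returns true if an entry was actually erased.
    bool remove(const std::string& path) { return files_.erase(key(path)) > 0; }

    std::size_t size() const { return files_.size(); }

private:
    // Normalize every path the same way so that relative and absolute
    // spellings of the same file map to the same key.
    static std::string key(const std::string& path) {
        namespace fs = std::filesystem;
        return fs::absolute(path).lexically_normal().string();
    }

    std::map<std::string, int> files_;
};
```

With this normalization, a file opened by absolute path and later deleted by relative path hits the same map entry, so the entry is erased instead of lingering.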
Alex Miller
a982b9da72
Additional changes from a merge commit.
2018-12-13 17:13:41 -08:00
Alex Miller
e70e59a895
Change some file locations.
2018-12-13 14:53:19 -08:00
Markus Pilman
dce290909d
fdbserver now compiling
2018-12-13 14:13:47 -08:00
mpilman
51beb8b48c
fdbrpc compiling with cmake
2018-12-13 14:02:16 -08:00
Vishesh Yadav
e04abf25f7
simulator: Support multiple listeners on single process
...
Sim2Listener can now take the network address to listen on. This is
used to listen to multiple ports in simulator and test the patch
which added multiple network addresses to single endpoint.
2018-12-13 13:36:52 -08:00
Vishesh Yadav
3eb9b23024
Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
...
- This patch makes FDB listen to multiple addresses given via the
command line. Although we'll still use the first address in most places,
this patch starts using vector<NetworkAddress> in Endpoint in some basic
places.
- When sending packets to an endpoint, pick a random network address in
endpoints
- Renames Endpoint::address to Endpoint::addresses since it
now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
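The "pick a random network address" step in the message above could look roughly like this. It is a sketch only: the fields of `NetworkAddress`, the `pickAddress` helper, and the use of `std::mt19937` are assumptions made for illustration, not the actual FlowTransport code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical, simplified address type.
struct NetworkAddress {
    std::string ip;
    std::uint16_t port;
};

// Simplified endpoint holding several addresses, as the commit describes.
struct Endpoint {
    std::vector<NetworkAddress> addresses;
};

// Pick one of the endpoint's addresses uniformly at random.
// The endpoint must have at least one address.
inline const NetworkAddress& pickAddress(const Endpoint& ep, std::mt19937& rng) {
    assert(!ep.addresses.empty());
    std::uniform_int_distribution<std::size_t> pick(0, ep.addresses.size() - 1);
    return ep.addresses[pick(rng)];
}
```

The explicit emptiness check matters because nothing in this simplified `Endpoint` guarantees a non-empty address list.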
Vishesh Yadav
43e5a46f9b
Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
...
Extend `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll
have multiple IP:PORT pairs.
This patch simply adds the field and makes changes to compile the
codebase. The first element of the `address` field is used everywhere.
Hence the way we talk to an endpoint remains the same with this patch.
NOTE:
Directly accessing the first member of Endpoint::address is unsafe,
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are replacing all those
unsafe accesses with ones that consider the whole vector anyway, this patch
does not attempt to access them in a safe way.
2018-12-13 13:36:52 -08:00
Vishesh Yadav
e8e01b2406
Remove unused localAddress parameter from newNet2 and Net2 classes
2018-12-13 13:36:52 -08:00
Evan Tschannen
d9626895b1
Merge pull request #964 from xumengpanda/mengxu/teamcollection-release
...
TeamCollection: Use machine teams to create server teams to increase availability at scale when a machine has multiple servers
2018-12-13 13:18:54 -08:00
Meng Xu
e069b5c31c
TeamCollection: Use clang format
...
No functional change.
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-06 11:39:35 -08:00
Evan Tschannen
d2d68aa171
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# versions.target
2018-12-03 18:26:52 -08:00
Evan Tschannen
55a9c4a0f0
Merge pull request #955 from ajbeamon/fix-bad-error-creation-and-whitespace
...
throw platform_error; -> throw platform_error();. Convert some spaces to tabs.
2018-12-03 15:12:37 -08:00
A.J. Beamon
50c9dfdd01
Errors that occur in platform that are the result of IO issues are now raised as io_error rather than platform_error.
2018-11-30 10:55:19 -08:00
A.J. Beamon
97847f517b
throw platform_error; -> throw platform_error();. Convert some spaces to tabs.
2018-11-28 12:56:57 -08:00
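Why the missing parentheses matter can be shown with a small sketch. It assumes that named errors are factory functions returning an error object; the `Error` struct, the error code, and the helper names below are hypothetical simplifications and not Flow's actual definitions.

```cpp
#include <cassert>

// Hypothetical, simplified error type.
struct Error {
    int code;
};

// Hypothetical factory function; the code value is arbitrary.
inline Error platform_error() { return Error{1}; }

// Buggy form: throws the function itself, which decays to a function pointer.
inline void throwWithoutCall() { throw platform_error; }

// Fixed form: calls the factory and throws the resulting Error.
inline void throwWithCall() { throw platform_error(); }

// Returns true if whatever `thrower` throws is caught as an Error.
template <class Thrower>
bool caughtAsError(Thrower thrower) {
    try {
        thrower();
    } catch (const Error&) {
        return true;
    } catch (...) {
        return false; // something other than an Error was thrown
    }
    return false; // nothing was thrown
}
```

Under these assumptions, `caughtAsError(throwWithoutCall)` is false because a function pointer never matches a `catch (const Error&)` handler, while `caughtAsError(throwWithCall)` is true.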
Meng Xu
8de031f9a6
TeamCollection: clang-format
...
Format the changes with git clang-format.
No functional changes.
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-21 11:18:26 -08:00
Meng Xu
f7a7e069f0
TeamCollection: Remove unnecessary comments
...
Pass 41806 tests with no failure
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:56:35 -08:00
Meng Xu
73c58852f0
TeamCollection: Resolve code review comments
...
Resolve code review comments:
1) Improve the code efficiency by avoiding unnecessary map search
and avoiding unnecessary checking
2) Remove or comment out trace events when they can be spammy
3) Improve coding style
Tested for 1 hour and no error was found.
KillRegionCycle.txt test was excluded from the test because
existing code cannot pass that test either
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:55:33 -08:00
Meng Xu
5051b35c61
TeamCollection: Use machine team to create server team
...
Current server team collection logic does not consider
the fact that multiple storage servers can run on the same machine.
When multiple machines fail, all servers on the machines will fail, and
the possibility of having one process team fail and lose data is very high.
To reduce the possibility of losing data when multiple machines fail,
we first create machine teams which span across different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:53:22 -08:00
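The second half of the approach in the message above, building a server team from an already chosen machine team, can be sketched like this. The types and the `buildServerTeam` helper are hypothetical simplifications; how machine teams are formed across fault zones and how one of them is selected is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical, simplified model: a machine hosts one or more storage servers.
struct Machine {
    std::string id;
    std::vector<std::string> servers;
};

// A machine team is a group of machines, assumed to span different fault zones.
using MachineTeam = std::vector<Machine>;

// Build a server team by picking one server from each machine in the team,
// so no two servers in the result share a machine.
inline std::vector<std::string> buildServerTeam(const MachineTeam& team, std::mt19937& rng) {
    std::vector<std::string> serverTeam;
    for (const Machine& m : team) {
        assert(!m.servers.empty());
        std::uniform_int_distribution<std::size_t> pick(0, m.servers.size() - 1);
        serverTeam.push_back(m.servers[pick(rng)]);
    }
    return serverTeam;
}
```

Because each member comes from a different machine, losing one machine removes at most one server from any team built this way.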
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
...
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
Evan Tschannen
b8381b3cea
Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
2018-11-10 09:51:49 -08:00
A.J. Beamon
67a152ae9f
Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes.
2018-11-09 14:19:18 -08:00
Evan Tschannen
56c51c1bb3
fix: usableRegions was uninitialized
2018-11-09 10:17:35 -08:00
Stephen Atherton
9d73166b3b
Many bug fixes related to concurrent page operations and pager shutdown.
2018-11-06 19:31:16 -08:00
Evan Tschannen
87295cc263
suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created
2018-11-04 23:07:56 -08:00
Evan Tschannen
bf6545a9cf
clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
...
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Stephen Atherton
df3bdde50b
Many bug fixes. AsyncFileCached write() on a page with a zero-copy read in progress would orphan the old page before the read was finished. Pager file operations were not converting page id to int64 for byte offset calculation. Pager was not calling releaseZeroCopy() after readZeroCopy() if there was an error or cancellation. Pager reads were using some variables that could go out of scope. BusyPage's mechanism for notifying when a physical page is no longer in use is itself no longer in use and therefore removed. Pager shutdown now cancels all outstanding reads. Improved some debug output.
2018-10-31 02:14:55 -07:00
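One of the fixes listed above, converting the page id to int64 before computing a byte offset, guards against 32-bit overflow. A minimal sketch, assuming 32-bit page ids and page sizes (the real types are not stated in the message) and a hypothetical `pageOffset` helper:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte offset of a page within the pager file.
// Widening to a 64-bit type before multiplying keeps the product from
// wrapping around at 2^32.
inline std::int64_t pageOffset(std::uint32_t pageID, std::uint32_t pageSize) {
    return static_cast<std::int64_t>(pageID) * pageSize;
}
```

For example, page 1048576 with 4096-byte pages sits at byte 4294967296, a value that does not fit in 32 bits.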
A.J. Beamon
776b289bfe
Move AsyncFileBlobStore and related files to fdbclient.
2018-10-26 13:49:42 -07:00
A.J. Beamon
58a0e22d3c
Remove sim2 dependency on fdbclient:
...
* Remove unused 'exclusionSet' that used a type from fdbclient.
* Replace usages of describe(x) with x.toString().
Also removed some using statements.
2018-10-26 09:23:12 -07:00
Alex Miller
6bb1f4093d
Merge pull request #856 from dropbox/pr/include-fix
...
Adjust all includes to be relative to the root.
2018-10-22 09:51:55 -07:00
Alex Miller
e2fc1c9b95
Remove specifying non-root directory as a path to search for includes.
2018-10-19 18:56:45 -07:00
Evan Tschannen
1ef29cbf0d
more windows build fixes
2018-10-19 17:00:24 -07:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
db71b60d72
Merge pull request #819 from satherton/feature-redwood
...
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen
0217aed74c
Merge branch 'release-6.0'
...
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2018-10-15 18:38:51 -07:00
A.J. Beamon
a963ff7a64
Fix line endings
2018-10-08 09:30:09 -07:00
Stephen Atherton
22f8a4efa9
Normalized all unit test names to begin with "/" if they should be included in random unit testing.
2018-10-05 22:09:58 -07:00
A.J. Beamon
664f64881c
Port truncate optimization from Snowflake PR in order to make quick changes for a patch release.
2018-10-05 15:05:26 -07:00
Stephen Atherton
7c1dc305cb
Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood
2018-10-05 10:15:10 -07:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
A.J. Beamon
92990d6aef
Merge release-6.0 into master
2018-09-21 16:14:39 -07:00
Evan Tschannen
77e2fb787e
Merge branch 'release-6.0' into feature-fix-forced-recovery
2018-09-21 14:55:37 -07:00
Stephen Atherton
2fc86c5ff3
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbrpc/AsyncFileCached.actor.h
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen
42a67efb0c
the cluster controller should prefer to be located on a transaction class machine over a storage server class machine
2018-09-19 18:04:59 -07:00
Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen
4dd2dda0a3
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen
df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
...
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon
2de0b5d6d7
Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations.
2018-09-05 15:06:14 -07:00
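The field format described above, a comma delimited string of role abbreviations, can be sketched as a simple join. The `rolesString` helper and the abbreviations used in the usage note are hypothetical; the actual abbreviations and storage are not given in the message.

```cpp
#include <cassert>
#include <set>
#include <string>

// Join a set of role abbreviations into one comma delimited string.
// std::set iterates in sorted order, so the output is deterministic.
inline std::string rolesString(const std::set<std::string>& roles) {
    std::string result;
    for (const std::string& role : roles) {
        if (!result.empty()) result += ",";
        result += role;
    }
    return result;
}
```

With made-up abbreviations `"TL"` and `"SS"`, the helper yields `"SS,TL"`; an empty set yields an empty string.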
Evan Tschannen
dcdbb3ec4d
Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-movekey-fixes
2018-09-05 10:29:13 -07:00
Evan Tschannen
21f5cf9ce9
suppress spammy trace events
2018-09-04 17:12:26 -07:00
Steve Atherton
89dd9cc4a3
Cherry-pick pull request #717 to release-6.0
...
Which contains:
* Improve TLS cert refresh logging.
* Loading a mismatching cert shouldn't prevent TLS connections.
* Initialize the cached copy of ca/cert/key data.
* Open certificates as uncached, which means they can be write-protected.
2018-08-23 16:53:40 -07:00
Steve Atherton
365fe992b4
Merge pull request #717 from alexmiller-apple/tls-refresh-fixes
...
Fix certificate reloading issues
2018-08-22 15:09:12 -07:00
Evan Tschannen
717c43a69f
merge 6.0 into master
2018-08-22 00:28:04 -07:00
Alex Miller
d2da969412
Improve TLS cert refresh logging.
...
Explicitly call out failure/success, and surface repeated cert
mismatches.
2018-08-21 15:05:41 -07:00
Alex Miller
4113b36df7
Loading a mismatching cert shouldn't prevent TLS connections.
...
set_{cert,key,ca}_data return pass/fail and do not throw. The existing
code wrongly assumed that they threw.
2018-08-21 15:02:54 -07:00
Evan Tschannen
26ec6ebac8
fixed line endings
2018-08-21 14:58:26 -07:00
Evan Tschannen
712aa00261
a better fix to the windows build issue
2018-08-21 14:54:38 -07:00
Alex Miller
4caacaaf4e
I would like to atone for my sins. But later.
...
This fixes the windows build. For some reason, MSVC believes that the
actor-compiled version of networkSender actually exists, but the
non-actor-compiled version doesn't exist.
This is a hackish workaround, as the largest reason to not include a
.g.h file is because it defines a POST_ACTOR_COMPILER define that messes
with actorcompiler.h's #defines. We can just undefine that after
including the file. ...but carefully.
2018-08-20 20:33:38 -07:00
Alex Miller
3ece3cf301
Initialize the cached copy of ca/cert/key data.
...
This was just purely an accidental oversight from before. The variables
were there and handled like they were actually initialized with the
contents of the various certificate files at start-up, but never
actually were.
And add a few trace events to make it easy to see when the system
noticed and tried to reload certificate data.
2018-08-20 19:09:34 -07:00
Alex Miller
fd866a3b47
Open certificates as uncached, which means they can be write-protected.
...
OPEN_READONLY still opens the file as read-write. To actually be
read-only, one needs to open the file as READONLY and UNCACHED.
2018-08-20 19:07:58 -07:00
Alex Miller
63b1e85338
Ban `Void _ = wait(...)` constructions, and require just `wait(...)`.
...
There's never any reason to save the value of a Void return, and it's
the easiest source of redefined variable bugs that will creep back in
over time. So just `wait(...)`, it's cleaner that way.
2018-08-14 15:50:26 -07:00
Alex Miller
86dbe1f0e9
Fix more instances of actorcompiler.h being in the wrong place.
2018-08-14 15:50:26 -07:00
Alex Miller
7feb5d8209
Remove including flow.h in actorcompiler.h, and fix resulting breakage.
...
For files that required flow.h, and only got it through actorcompiler.h,
their version of flow.h would have the actorcompiler #defines defined.
Then, if it included a STL/boost file, the same breakage would result.
This needs to not happen, so the include of flow.h in actorcompiler.h
was removed.
2018-08-14 15:50:26 -07:00
Alex Miller
bca324eaa6
More actorcompiler.h fixes and additions.
2018-08-14 15:50:26 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
07e5281142
Restrict actor keyword #defines to actor files.
...
This introduces a new rule in our codebase, that any file that #includes
actorcompiler.h needs to do it as the last #include, and it needs to
then #include unactorcompiler.h at the end of the file.
The point of this is that it prevents our actorcompiler.h #defines from
leaking into boost or the c++ standard library. Both of these start
throwing errors if you s/state// their code, which `#define state `
effectively does.
2018-08-14 15:50:26 -07:00
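The include-order rule from the message above can be illustrated in a single file. This sketch uses a plain `#define state` and `#undef state` as stand-ins for including actorcompiler.h last and unactorcompiler.h at the end; what those headers really contain is an assumption here, and `sumTo` and `useStateAsName` are hypothetical functions.

```cpp
// Ordinary headers come first, before any actor keyword is defined.
#include <cassert>
#include <vector>

// Stand-in for including actorcompiler.h as the last include.
#define state

inline int sumTo(int n) {
    assert(n >= 0);
    state int total = 0; // `state` expands to nothing here
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Stand-in for including unactorcompiler.h at the end of the file.
#undef state

// After the #undef, `state` is an ordinary identifier again, so code in
// headers included later is unaffected.
inline int useStateAsName() {
    int state = 7;
    return state;
}
```

Had `<vector>` been included after the `#define`, any use of the identifier `state` inside the library header would have been erased by the preprocessor, which is the breakage the rule prevents.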
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
cdcf056aef
Merge branch 'release-6.0'
2018-08-14 09:43:51 -07:00
A.J. Beamon
168dce94cb
Remove some trace event suppressions that were happening off the network thread. Downgrade some trace events related to trace logging problems from SevError to SevWarnAlways.
2018-08-14 09:00:43 -07:00
Evan Tschannen
3186fac397
Make sure we still accept some connections even if we are CPU bound by high priority work
2018-08-10 17:47:21 -07:00
A.J. Beamon
574c5576a2
Merge branch 'release-6.0' of github.com:apple/foundationdb
...
# Conflicts:
# fdbrpc/TLSConnection.actor.cpp
# versions.target
2018-08-10 14:31:58 -07:00
A.J. Beamon
3535ddad80
Merge pull request #674 from alexmiller-apple/glibcxx-debug-fixes
...
Fix bugs uncovered by -D_GLIBCXX_DEBUG
2018-08-09 08:18:51 -07:00
A.J. Beamon
24dec1529b
Merge pull request #673 from etschannen/release-6.0
...
A variety of bug fixes and performance improvements
2018-08-07 10:55:46 -07:00
Alex Miller
ff0e14d5a7
Fix a compilation error on windows.
2018-08-06 18:36:01 -07:00
Evan Tschannen
b5a133865d
Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
...
# Conflicts:
# fdbrpc/TLSConnection.actor.cpp
2018-08-06 18:26:54 -07:00
Evan Tschannen
22f2a1fedd
Merge pull request #676 from etschannen/master
...
fix: we should not free statdata ourselves, it will be deleted by libeio itself
2018-08-06 18:08:45 -07:00
Steve Atherton
fb46385a39
Merge pull request #628 from alexmiller-apple/reloadcertificates
...
Reload certificates if changed.
This is a cherry-pick of #628 back to release-6.0
2018-08-06 18:04:04 -07:00
Evan Tschannen
56e0b729c8
fix: we should not free statdata ourselves, it will be deleted by libeio itself
2018-08-06 17:46:53 -07:00
Alex Miller
d99592f8bd
Fix an out-of-bounds vector access.
2018-08-06 12:50:34 -07:00
Evan Tschannen
6f328d41ac
suppressed spammy trace events
2018-08-06 12:12:55 -07:00
Evan Tschannen
538e684f1c
Merge branch 'release-6.0'
...
# Conflicts:
# versions.target
2018-08-03 11:41:46 -07:00
Evan Tschannen
2619234477
Merge branch 'release-5.2' into release-6.0
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen
21fe6adac4
fix: give time to do other work between accepting connections. It is expensive to accept TLS connections, so we have a slow task (which can kill other connections) if we accept too many connections in a row
2018-08-03 11:37:10 -07:00
Alex Miller
1a7cda4149
Stop performing self-moves. (e.g. a = std::move(a))
...
self-moves are frowned upon in C++, and in our code this generally happens from
calls to swap as part of trying to implement a "unordered erase" function via
swap-to-the-end-and-pop_back. For convenience, a swapAndPop() function is now
offered that performs this, while disallowing self-moves.
2018-08-01 18:09:54 -07:00
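A helper of the kind described above might look like the following. Only the name `swapAndPop` comes from the message; the signature and body are a sketch under that assumption, not the code that was committed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Unordered erase: removes vec[index] in O(1) by moving the last element
// into its slot. Element order is not preserved.
template <class T>
void swapAndPop(std::vector<T>& vec, std::size_t index) {
    assert(index < vec.size());
    // When the target is already the last element, skip the move so that
    // an element is never move-assigned to itself.
    if (index != vec.size() - 1) {
        vec[index] = std::move(vec.back());
    }
    vec.pop_back();
}
```

Calling `swapAndPop(v, 1)` on `{10, 20, 30, 40}` leaves `{10, 40, 30}`; removing the last index afterwards simply pops it.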
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Alex Miller
f70f204d55
Fix a compilation error on windows.
2018-07-30 17:13:37 -07:00
Evan Tschannen
28a26d54f2
Merge commit 'ccf4384c79d026edbf76152e95e7410ebe621c1f' into release-6.0
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbrpc/FlowTransport.actor.cpp
2018-07-28 09:11:31 -07:00
Evan Tschannen
fa3b61508c
fix: do not increase numIncompatibleConnections if the connect was already incompatible
2018-07-28 08:50:54 -07:00
Stephen Atherton
4379a58bbe
Suppress potentially spammy event and don't log cancellation errors.
2018-07-27 21:03:10 -07:00
Stephen Atherton
59e005485d
Fixed bug where incompatible connection count was sometimes decremented twice for the same peer.
2018-07-27 20:48:14 -07:00
Stephen Atherton
6a3834c3f8
Fixed memory leak when destroying a FlowTransport.
2018-07-27 20:46:54 -07:00
Stephen Atherton
c593d1c6a2
Bug fix causing clients to sometimes (rarely) not reconnect to upgraded clusters. Reliable packets were being dropped to incompatible peers intentionally, but now this is only done if the peer is newer since successful communication with a newer peer will never be possible.
2018-07-27 20:42:06 -07:00
Steve Atherton
d1a877039d
Merge pull request #628 from alexmiller-apple/reloadcertificates
...
Reload certificates if changed.
2018-07-26 17:21:23 -07:00
Stephen Atherton
40762d9f9b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-07-25 17:58:52 -07:00
Evan Tschannen
95bc695f0e
Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
2018-07-25 13:06:54 -07:00
Evan Tschannen
89a3e2e1b4
Backed out connection closing changes because of upgrade problems
2018-07-25 13:06:13 -07:00
Alex Miller
262af775eb
Implement overly simple file write timestamps for simulation, and clean up code.
2018-07-24 17:20:31 -07:00
Alex Miller
168496f819
Poll the certificate files if TLS is enabled and reload them if changed.
...
This allows certificates to be changed/updated without having to restart fdbserver.
2018-07-20 19:00:32 -07:00
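The polling idea in the message above can be sketched with the standard library. This is an illustration under stated assumptions: `CertificateWatcher` and its callback are hypothetical, and `std::filesystem::last_write_time` (C++17) stands in for the project's own modification-time helper.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <utility>

// Hypothetical poller: remembers a file's last write time and invokes a
// callback when a later poll observes a different one.
class CertificateWatcher {
public:
    CertificateWatcher(std::string path, std::function<void()> reload)
      : path_(std::move(path)), reload_(std::move(reload)),
        lastWrite_(std::filesystem::last_write_time(path_)) {}

    // Intended to be called on a timer. Returns true if a reload was triggered.
    // last_write_time throws std::filesystem::filesystem_error if the file
    // cannot be examined.
    bool poll() {
        const auto current = std::filesystem::last_write_time(path_);
        if (current == lastWrite_) return false;
        lastWrite_ = current;
        reload_();
        return true;
    }

private:
    std::string path_;
    std::function<void()> reload_;
    std::filesystem::file_time_type lastWrite_;
};
```

Comparing modification times keeps each poll cheap; the file contents are only re-read, via the callback, when the timestamp actually changes.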
Alex Miller
2d26e98d07
Add a cross-platform getLastWrite() to get a file's mtime.
2018-07-20 19:00:32 -07:00
A.J. Beamon
a7a1124c11
Fix incompatible connection accounting that was incorrectly decrementing the incompatible count in some cases.
2018-07-17 11:36:05 -07:00
A.J. Beamon
8879954254
Merge pull request #609 from etschannen/release-6.0
...
Improved simulation strength by only removing datacenters that have been killed
2018-07-16 15:59:28 -07:00
Evan Tschannen
e0caa28758
code cleanup
2018-07-16 15:56:43 -07:00
AlvinMooreSr
aafb3c5c00
Merge pull request #593 from AlvinMooreSr/release-6.0-tls-funct
...
Replaced separate TLS Log function with FDB TraceEvent logger
2018-07-16 12:01:02 -07:00
Evan Tschannen
f72a9f60c0
only disable fearless if a datacenter has actually been killed
...
fix: we must prevent recovery into the dead datacenter while reducing usable_regions
2018-07-16 10:06:57 -07:00
Alvin Moore
a034acf3bd
Replaced separate TLS Log function with FDB TraceEvent logger
2018-07-11 18:41:46 -07:00
Stephen Atherton
96389c74cd
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Alec Grieser
d5a23642a1
Merge pull request #587 from etschannen/feature-remote-logs
...
close unneeded connections
2018-07-10 13:27:15 -07:00
Evan Tschannen
a35d5e30d9
Added a SevError trace event in case peer references becomes negative
2018-07-10 13:26:28 -07:00
Evan Tschannen
c25be5699a
close unneeded connections
2018-07-10 13:10:29 -07:00
Alec Grieser
be9c34c6f8
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2
2018-07-10 10:04:48 -07:00
Alec Grieser
ad37b1693d
Merge pull request #585 from etschannen/feature-remote-logs
...
A variety of cleanup and test strengthening commits
2018-07-10 09:58:44 -07:00
AlvinMooreSr
b3916a9b71
Merge pull request #409 from joelarmstrong/tlsconnection-clang-ub-warning
...
Fix compilation with clang from Apple LLVM 9.1.0
2018-07-10 09:32:24 -07:00
Stephen Atherton
1bc95862b7
Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen
82cc30be62
added testing for two_satellite_fast and two_satellite_safe
2018-07-09 22:01:46 -07:00
Stephen Atherton
fddb3e87e2
Differentiate between a timeout in attempting to connect vs a timeout on an active connection by converting timeouts during connection attempts to connection_failed errors.
2018-07-09 19:40:01 -07:00
Stephen Atherton
3ce7c78d36
If an HTTP request fails due to a connection failure or a timeout, do not convert the error to the more generic http_request_failed.
2018-07-09 18:58:33 -07:00
Evan Tschannen
e503dc975c
fix: destroy peers that are inactive
...
do not open new connections to send replies
2018-07-09 13:37:06 -07:00
Evan Tschannen
5a2cb3037b
merge 5.2 into 6.0
2018-07-08 20:14:06 -07:00
Evan Tschannen
0e97ce79b4
fix: destroy peers that are inactive
...
do not open new connections to send replies
2018-07-08 10:26:41 -07:00
Stephen Atherton
a2f16e217e
Memory waste fix, when a Peer disconnects an extra packet buffer block is allocated to copy unsent reliable bytes to even if there aren't any.
2018-07-06 19:44:30 -07:00
Evan Tschannen
6d7172ef7e
fix: canKillProcesses did not take into account the remoteTLogPolicy when checking notEnoughLeft
2018-07-05 21:36:09 -07:00
Evan Tschannen
6f4ca2eba2
fix: get all processes did not include rebooting processes
2018-07-05 21:13:56 -07:00
Evan Tschannen
cd4fb9285a
waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region
2018-07-05 14:04:42 -07:00
Stephen Atherton
9d85a05372
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-05 12:52:06 -07:00
Stephen Atherton
2cb0362102
AsyncFileCached now allows writing and truncation of whole pages previously read using readZeroCopy and not yet released without prior readers seeing the effects of the write.
2018-07-05 02:59:13 -07:00
Evan Tschannen
7315e5da55
fix: isExcluded and isCleared were exactly wrong
...
fix: isCleared should mean the process is dead
2018-07-05 02:22:22 -07:00
Evan Tschannen
e17dfea3b6
fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1.
...
canKillProcess logic was wrong.
We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.
2018-07-04 16:22:32 -04:00
Stephen Atherton
2925b9b984
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-07-03 23:03:56 -07:00
Alvin Moore
c3f88dbfe1
Merge branch 'master' of github.com:apple/foundationdb into tls-static
2018-07-01 23:13:57 -07:00
Alvin Moore
132e2d9267
Defined TLS build flags for projects
...
Updated TLS documentation
2018-07-01 22:49:39 -07:00
Stephen Atherton
b95a2bd6c1
Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
...
# Conflicts:
# flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen
899f880ce0
fix: log router class did not have the proper fitness for becoming the cluster controller
2018-06-28 23:20:01 -07:00
Alvin Moore
45849d1f95
Added support for no-op legacy TLS options
2018-06-27 09:25:05 -07:00
Alvin Moore
65d8b38ae9
Changed generic plugin code to work as expected plugin code except for TLS use case
...
Defined TLS plugin name constant
Changed TLS plugin name to get_tls_plugin
Fixed link script
Removed compilation flags from info make target
2018-06-26 16:01:25 -07:00
Alvin Moore
ef8de426d3
Changed the TLS_DISABLED macro
...
Disable TLS within Windows until working
2018-06-26 12:08:32 -07:00
Evan Tschannen
0123627d67
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-06-22 10:43:07 -07:00
Evan Tschannen
5fc8199abc
Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic
...
wait_for_good_recruitment now requires that you have the desired count of each role
remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered
2018-06-22 10:15:24 -07:00
Evan Tschannen
1dce97f28c
Merge branch 'release-5.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/SimulatedCluster.actor.cpp
# packaging/msi/FDBInstaller.wxs
# versions.target
2018-06-21 17:05:11 -07:00
Balachandar Namasivayam
d7dba11366
Throw tls_error instead of internal_error when not able to create a TLS connection.
2018-06-21 15:33:00 -07:00
Stephen Atherton
e9e1e194f0
Added operation-specific rate controls to blob store interface.
2018-06-20 20:34:34 -07:00
Richard Low
fff6a47c43
Validate certificates by default
2018-06-20 14:04:03 -07:00
Alvin Moore
f8ce1de601
Added support for compiling TLS into binaries
2018-06-20 09:21:23 -07:00
Stephen Atherton
e5c48d453a
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-18 22:45:27 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Stephen Atherton
1eae9d621b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-13 15:58:21 -07:00
Stephen Atherton
2878f30f29
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller
6c2cb25c53
Rename BestOtherFit -> OkayFit.
...
The previous order of fitness was
BestFit > GoodFit > BestOtherFit > ...
which is baffling. It's now:
BestFit > GoodFit > OkayFit > ...
which won't break anyone's expectations.
2018-06-12 16:50:25 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen
48fbc407fd
fix: we cannot kill all of the remote tlogs, because we still need their data to copy to the next generation in the same data center
2018-06-08 15:28:44 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Balachandar Namasivayam
529d0497f1
Proxy going OOM when applying high volumes of writes to a proxy, particularly in a sudden fashion before ratekeeper can control the workload.
...
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
A.J. Beamon
d9c702a9e3
Merge release-5.1 into release-5.2
2018-05-30 09:09:55 -07:00
Joel Armstrong
7c35ea6ba1
Fix use of bool in va_start causing undefined behavior
...
The version of clang included in Apple LLVM 9.1.0 complains about
passing the bool parameter `is_error` to va_start, which causes make
to fail:
fdbrpc/TLSConnection.actor.cpp:370:16: error: passing an object that undergoes
default argument promotion to 'va_start' has undefined behavior
[-Werror,-Wvarargs]
va_start( ap, is_error );
^
This just switches is_error back to the type it gets promoted to (int).
2018-05-24 16:37:11 -07:00
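The va_start problem described in the commit above can be illustrated with a minimal sketch. The function `formatMessage` and its parameters are hypothetical and are not the code in fdbrpc/TLSConnection.actor.cpp; the only point shown is that the last named parameter before the ellipsis is declared as `int`, the type a `bool` would be promoted to, so that passing it to `va_start` is well defined.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper, for illustration only.
// The last named parameter (is_error) is an int. Declaring it as bool would
// make it a type that undergoes default argument promotion, and passing such
// a parameter to va_start is undefined behavior (clang: -Wvarargs).
std::string formatMessage(const char* fmt, int is_error, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, is_error);  // well defined: int is not changed by promotion
    std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    return std::string(is_error ? "error: " : "info: ") + buf;
}
```

Callers can still pass `true` or `false`; the value is converted to `int` at the call site, which matches what the commit message describes as switching the parameter back to the type it gets promoted to.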
A.J. Beamon
026458baf3
Merge release-5.2 into master
2018-05-23 15:32:56 -07:00
Richard Low
84ed35b01f
Only log TLS verify failures if all verification fails; log failures at SevInfo
2018-05-21 10:58:59 -07:00
Richard Low
086700aeb1
Plumb through TLS key password to CLI and from environment
2018-05-21 10:56:10 -07:00
Evan Tschannen
520aaf731d
merge release 5.2 into master
2018-05-10 14:33:08 -07:00
Evan Tschannen
b5b8c5d587
fix: white space issue in getKnobDescription
2018-05-10 14:27:10 -07:00
Balachandar Namasivayam
b2c32ea4f2
Add secure_connection param to BlobStore to configure security.
...
Default is https. Setting secure_connection=0 makes it http.
2018-05-10 13:53:46 -07:00
Evan Tschannen
7bca7b80e6
fixed merge conflicts
2018-05-10 09:13:41 -07:00
Evan Tschannen
8f984cb2c9
Merge branch 'release-5.2'
...
# Conflicts:
# fdbrpc/TLSConnection.h
2018-05-10 09:13:22 -07:00
Evan Tschannen
d3450ce5b0
Merge pull request #343 from bnamasivayam/tls-plugin
...
Tls plugin
2018-05-09 16:35:53 -07:00
Balachandar Namasivayam
479dbf4c04
Addressed review comments.
...
Remove redundant FDBLibTLS/ITLSPlugin.h.
2018-05-09 16:16:09 -07:00
Balachandar Namasivayam
0c2960a221
Use smart pointer instead of naked ones in set_peer_verify() method.
2018-05-09 14:53:01 -07:00
Balachandar Namasivayam
7591931a09
Revert "Make tls_verify_peers as a comma separated string of constraints."
...
This reverts commit 2033847e4b.
2018-05-09 14:40:36 -07:00
Balachandar Namasivayam
2033847e4b
Make tls_verify_peers as a comma separated string of constraints.
2018-05-09 14:37:39 -07:00
Balachandar Namasivayam
e8b7f4b190
Add password support for tls.
2018-05-08 20:46:31 -07:00
Balachandar Namasivayam
49af5d685b
Restore previous behavior of not specifying peer_verify option means disable checking.
2018-05-08 18:54:44 -07:00
Balachandar Namasivayam
d3b5cfb93c
Support latest TLS plugin.
...
Add support for https in backup.
2018-05-08 16:28:13 -07:00
Evan Tschannen
7acdc314e4
Merge branch 'release-5.2'
...
# Conflicts:
# fdbrpc/TLSConnection.actor.cpp
2018-05-08 13:22:53 -07:00
Evan Tschannen
1f6c6a886b
Merge branch 'release-5.1' into release-5.2
2018-05-08 13:08:11 -07:00
Alvin Moore
9aa94e87a3
Renamed the default TLS search plugin
2018-05-07 17:01:14 -07:00
Alex Miller
bc8e6acbe8
Fix the other half of simulation requiring a TLS Plugin.
...
This commit:
1. Restores --tls_plugin as a way to provide the path to the TLS plugin when running in simulation.
2. Removes the TLS Plugin as being required for 5% of tests.
3. Standardizes on 'sslEnabled' as a variable name.
And is a fix/improvement upon commit f7733d1b.
(1) previously didn't work, because we would create multiple new TLSOptions
instances and run init_plugin multiple times. Only the first call would use
the argument specified on the command line. To fix this, the TLSOptions
derived from the command line is threaded through all the simulation code that
needs it.
(2) was an oversight in f7733d1b, which didn't actually make "should we be TLS"
dependent on if the TLS plugin was available or not.
(3) is just nice for trying to grep around in the codebase.
2018-04-30 18:26:29 -07:00
Stephen Atherton
af61d3596d
Merge branch 'public-master' into feature-redwood
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Alex Miller
f7733d1bd0
Do not require the TLS Plugin for simulation.
...
It appears that explicit calls to TLS-related things had snuck in over time,
which meant that simulation runs that weren't even configured to use SSL still
wanted and required the TLS plugin.
This commit instead threads through the understanding of if any TLS-related
options were provided, and if not, then don't call anything TLS-related so that
we don't require the TLS plugin.
Hopefully this makes life easier for the opensource folk. :)
2018-04-24 16:53:30 -07:00
Dennis Schafroth
290122637b
Using ASSERT_ABORT in destructors
2018-04-23 14:05:10 +02:00
Evan Tschannen
c1ccc8522c
Merge branch 'release-5.2'
2018-04-17 18:38:12 -07:00
Evan Tschannen
db98c1b9b6
Merge branch 'release-5.1' into release-5.2
...
# Conflicts:
# versions.target
2018-04-17 18:36:19 -07:00
Stephen Atherton
0169384636
Fixed rare infinite loop in blob list and delete operations.
2018-04-12 17:22:34 -07:00
Alex Miller
20082e3228
Clang fixes.
2018-04-12 11:10:53 -07:00
Alec Grieser
42c8527f43
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2
2018-04-11 18:35:32 -07:00
Yichi Chiang
d8175471bc
Merge release-5.1.5
2018-04-11 17:55:10 -07:00
A.J. Beamon
ee4c966137
TransportData::numIncompatibleConnections was uninitialized.
2018-04-11 11:15:12 -07:00
Stephen Atherton
2752a28611
Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood
2018-04-06 16:29:37 -07:00
Alec Grieser
551ea9c7f8
Merge remote-tracking branch 'upstream/release-5.2' into master-release-5.2-merge
2018-03-19 12:34:50 -07:00
yichic
ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
...
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang
ec02e54f64
Refactor EraseLogData()
2018-03-19 11:56:01 -07:00
Yichi Chiang
d6559b144f
Share log mutations between backups and DRs which have the same backup range
2018-03-19 11:32:50 -07:00
Yichi Chiang
26b93ff920
Share log mutations between backups and DRs which have the same backup range
2018-03-16 18:09:23 -07:00
Alec Grieser
0853fcb052
switch to using zu for some size_t variables in printf
2018-03-14 18:07:05 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Evan Tschannen
28ea983487
Merge branch 'release-5.1' into release-5.2
...
# Conflicts:
# flow/Trace.cpp
# versions.target
2018-03-09 14:40:31 -08:00
Evan Tschannen
cf6dd1437b
suppress spammy trace events
2018-03-09 10:16:34 -08:00
Evan Tschannen
5390af8be4
suppress spammy logs
2018-03-09 09:40:36 -08:00
Evan Tschannen
68606c7984
fix: sim2 logic for when a kill is safe was incorrect
2018-03-06 18:38:05 -08:00
A.J. Beamon
f2c804e14f
Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.
2018-03-06 10:15:04 -08:00
satherton
a82d0e95be
Merge pull request #25 from apple/release-5.1
...
Merge release-5.1 to master
2018-03-04 23:20:31 -08:00
Stephen Atherton
d0e122fdbe
Blob client send and receive speed limits were being initialized using opposite knobs.
2018-03-04 23:05:55 -08:00
Evan Tschannen
e3c6b66240
fix: do not commit more data after being stopped
...
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alvin Moore
de1551c20d
Merge branch 'release-5.1'
2018-02-23 08:24:06 -08:00
Alvin Moore
a1382895a6
Fixed headers and some whitespace
2018-02-23 04:50:23 -08:00
Alec Grieser
e1162e9238
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-22 11:16:12 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Alec Grieser
aadc06de99
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-20 14:28:29 -08:00
Alec Grieser
1c1ae7d70e
Merge remote-tracking branch 'upstream/release-5.1' into bindings-format
2018-02-19 12:37:06 -08:00
Evan Tschannen
31b89a638f
added satellite_none and remote_none options to unconfigure from a fearless setup
...
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen
dc93759e15
suppressed trace events that are spammy
2018-02-16 16:01:19 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
A.J. Beamon
814ae16016
Add destination tokens to Net2_LargePacket trace events. Add backtrace when a sent packet is too large.
2018-02-15 14:54:35 -08:00
Balachandar Namasivayam
f320b1b347
Change ConnectionClosed TraceEvent severity from SevError to SevWarnAlways.
2018-02-14 12:25:54 -08:00
Stephen Atherton
0a35f167e4
Merge branch 'master' into feature-redwood
...
# Conflicts:
# fdbserver/DiskQueue.actor.cpp
# fdbserver/IDiskQueue.h
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen
42405c78a5
Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Evan Tschannen
fbadcc6eea
changing a storage server’s tag must be the first mutations applied in a version, because privatized mutations applied earlier in the same version will use the old tag
2018-02-09 18:21:29 -08:00
Evan Tschannen
c7b3be5b19
re-enabled better master exists
...
the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited
2018-02-09 16:48:55 -08:00
Stephen Atherton
69425a303b
Improved error handling for cases where blob account credentials are either not found in the provided credentials sources and/or some of the credentials sources provided are not readable or parseable.
2018-02-07 21:50:43 -08:00
Stephen Atherton
f8522248cb
Blob credentials files were being opened in read-write mode despite the read-only option being specified because the underlying caching layer always opens files for read/write access. For now, disabled caching for this file.
2018-02-07 16:25:16 -08:00
Stephen Atherton
d8879dc3f3
HTTP::doRequest() now reads responses in parallel with sending requests, so if the server responds before receiving all of the request the client can stop sending the remainder of the request. For PUT requests which upload files, this prevents sending potentially several megabytes of unnecessary bytes if the server responds with an error (such as 429) before the request is completely sent. Updated the backup container unit test to use more parallelism in order to test this new behavior.
2018-02-07 10:38:31 -08:00
Stephen Atherton
0792d5e3dd
Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
...
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
Evan Tschannen
ebd94bb654
removed a separately configurable storage team size for the remote data center, because it did not make sense
...
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen
2e3b1d7ab8
Merge commit 'dd6ea70051aef215315e9eb3dea3b67a24778e32' into feature-remote-logs
...
# Conflicts:
# flow/Net2.actor.cpp
2018-01-29 17:11:03 -08:00
Stephen Atherton
2f291d8955
Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket() - Although it is technically correct as written it is a dangerous style because it is not obvious that the addition of a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only 1 line.
2018-01-29 00:32:41 -08:00
Alec Grieser
51781bb7a8
Merge branch 'release-5.1' into bindings-format
2018-01-26 12:28:29 -08:00
Evan Tschannen
79d94214a4
Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs
2018-01-25 10:12:06 -08:00
Stephen Atherton
9fd2a8df3d
Tweaked a trace event suppression time.
2018-01-24 19:08:24 -08:00
Alec Grieser
57986cfe00
format python files to be roughly pep8 compliant
2018-01-24 19:06:58 -08:00
A.J. Beamon
19ed388c0e
Merge branch 'release-5.0' into release-5.1
...
# Conflicts:
# documentation/sphinx/source/downloads.rst
# documentation/sphinx/source/release-notes.rst
# versions.target
2018-01-24 14:43:41 -08:00
Stephen Atherton
7f18d59dfe
Bug fix, the blob request attempt count is now incremented for all errors except response code 429.
2018-01-24 01:15:01 -08:00
Stephen Atherton
a2481343ec
Bug fix, HTTP error code 429 was not being considered retryable in blob client (this was previously fixed but apparently reintroduced).
2018-01-24 00:22:11 -08:00
Stephen Atherton
66de9d392b
New error code, http_auth_failed, which is used when blob authentication fails instead of the previous generic http_request_failed.
2018-01-22 14:58:56 -08:00
Evan Tschannen
698ef4117e
Merge branch 'master' into feature-remote-logs
2018-01-20 10:34:30 -08:00
Stephen Atherton
307e04c0ad
Updated backup container unit test to match new safer behavior of expireData(). Rewrote BackupContainerLocalDirectory::deleteContainer() to actually delete the whole directory but only if it appears to be a backup with either log or snapshot data.
2018-01-18 00:36:28 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Alvin Moore
2e6ce03224
Merge pull request #232 from cie/build-dont-compile-hpp
...
Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files)…
2018-01-12 14:09:25 -08:00
Evan Tschannen
02bd83ff76
changed incompatibleDataRead to an asyncTrigger
2018-01-11 13:35:56 -08:00
A.J. Beamon
80b84c23ac
Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files). Add xml2json.hpp to our fdbrpc project.
2018-01-10 13:51:57 -08:00
A.J. Beamon
ce93d98b50
Temporarily remove xml2json.hpp from fdbrpc vcxproj
2018-01-10 10:18:44 -08:00
A.J. Beamon
2f5073d00f
Some visual studio project cleanup.
2018-01-10 10:07:18 -08:00
Stephen Atherton
0e7d538c94
Bug fix, in recursive blob folder listings the recent removal of common prefixes from the result stream caused the list marker to not be set correctly when a folder level requires multiple requests due to folder size.
2018-01-06 20:58:48 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton
96cb06cbc7
Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations.
2018-01-05 23:06:39 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Stephen Atherton
78430425e8
Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided.
2018-01-02 23:17:52 -08:00
Stephen Atherton
07fde9dfb4
Bug fix, error code 429 was not being treated as retryable in the recent refactor.
2018-01-02 23:15:25 -08:00
Stephen Atherton
f324afc13f
Bug fix in blob store listing when it requires multiple serial requests Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events.
2017-12-22 17:08:25 -08:00
Stephen Atherton
f2524ffd33
AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description.
2017-12-21 21:15:26 -08:00
Stephen Atherton
e0ef5a9a20
Whitespace normalization.
2017-12-21 12:07:29 -08:00
Stephen Atherton
e3aee45a74
Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings.
2017-12-21 01:58:15 -08:00
Stephen Atherton
e0d9cea008
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Alex Miller
9a0df6d76d
Deallocate aligned_alloc with aligned_free.
...
This probably fixes a windows-only crash, as only windows cares about this distinction.
2017-12-14 15:12:05 -08:00
Stephen Atherton
b6cfe010a1
Bug fix in URL encoding of delimiter.
2017-12-12 17:31:19 -08:00
Stephen Atherton
872edd7540
Merge branch 'release-5.0'
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-12-06 16:27:04 -08:00
Stephen Atherton
41f80bf7ed
Renamed an error, changed blob request failure to Warn severity.
2017-12-06 15:58:54 -08:00
Stephen Atherton
4bc7d0b86a
Updated error names and severities.
2017-12-06 15:42:44 -08:00
Stephen Atherton
abb2dd1ebc
Merge pull request #214 from cie/alexmiller/fallocate
...
Use fallocate to zero ranges instead of writing zeroes
2017-12-06 13:47:40 -08:00
Alex Miller
064670a95b
Maintain a reference to the IAsyncFile in zeroRange.
...
And also add some notes about the reference semantics to the IAsyncFile header
for future readers.
2017-12-06 13:41:21 -08:00
Balachandar Namasivayam
1f949240f5
Make fdbbackup s3 compatible.
...
s3 sends response in XML. FDB backup expects json response. Added a new library xml2json to convert xml to json.
2017-12-05 17:13:15 -08:00
Stephen Atherton
86ae6c09c7
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:20:50 -08:00
Evan Tschannen
482ac38ca6
added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs
2017-12-01 13:04:32 -08:00
Alex Miller
7bab3a4ece
AsyncFileKAIO will prefer using fallocate's ZERO_RANGE for AsyncFile::zero().
...
For situations in which we have support for FALLOC_FL_ZERO_RANGE, it's much
faster to use fallocate than manually overwrite the file with zero bytes. Note
that this support depends on having a kernel from late 2014 or newer, and being
on ext4 or xfs. If these conditions aren't met, we'll fall back to writing
zeros in 1MB chunks as normal.
2017-11-30 17:57:55 -08:00
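The fallback behavior described in the commit above can be sketched as follows. This is a hedged illustration, not the AsyncFileKAIO implementation: the function name `zeroRange`, its signature, and the synchronous `pwrite` loop are assumptions made for the example. It tries `fallocate` with `FALLOC_FL_ZERO_RANGE` where the platform headers provide it, and otherwise writes zeros in 1MB chunks.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>
#ifdef __linux__
#include <linux/falloc.h>
#endif

// Hypothetical helper: zero `length` bytes of `fd` starting at `offset`.
// Returns true on success.
bool zeroRange(int fd, off_t offset, off_t length) {
#if defined(__linux__) && defined(FALLOC_FL_ZERO_RANGE)
    // Fast path: ask the filesystem to zero the range directly.
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0)
        return true;
    // Any failure (for example an unsupported filesystem or an old kernel)
    // falls through to the manual path below.
#endif
    // Slow path: overwrite the range with zero bytes in 1MB chunks.
    const off_t chunk = 1 << 20;
    std::vector<char> zeros(static_cast<size_t>(chunk), 0);
    while (length > 0) {
        size_t n = static_cast<size_t>(std::min(length, chunk));
        ssize_t written = pwrite(fd, zeros.data(), n, offset);
        if (written <= 0)
            return false;  // simplified: no retry on EINTR
        offset += written;
        length -= written;
    }
    return true;
}
```

A caller would pass an open, writable file descriptor and a byte range, for example `zeroRange(fd, 1024, 2048)`. Whether the fast path is taken depends on the kernel and filesystem, as the commit message notes.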
Alex Miller
196258080b
Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile.
...
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it. It also makes testing easier/more obvious.
Also, because it's needed for correctness, fix the aligned_alloc for OSX, which
wasn't aligned, and use an actually aligned allocation function.
2017-11-30 17:57:55 -08:00
Alex Miller
c7a120c59d
Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile.
...
`deleteFile` existed in IAsyncFileSystem, so an incremental delete function
seems to belong more as a virtual method on IAsyncFileSystem than a static
method on IAsyncFile, and the naming should match.
As long as we're here, change IAsyncFile to declare a virtual destructor, so
that it has good and proper C++ behavior. I presume this is what was vaguely
intended by the default constructor definition that previously existed?
2017-11-30 17:19:10 -08:00
Stephen Atherton
1e643239f9
Improvement in blob connection reuse, oldest connections in pool are now used first.
2017-11-30 12:57:29 -08:00
Stephen Atherton
1b1c8e985a
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Alex Miller
f19cb3bbbd
Merge pull request #208 from cie/alexmiller/grvtfix
...
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
Alex Miller
e9412bbb11
Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
...
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that. The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap. Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.
The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.
This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
e07dcb9ada
Fixed header paths.
2017-11-15 00:05:20 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored is now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
...
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Balachandar Namasivayam
987379d790
Changed naming of num_incompatible_connections to numIncompatibleConnections
2017-11-14 18:37:29 -08:00
Balachandar Namasivayam
27b67cffbe
The earlier implementation of tracking the number of incompatible connections had a bug where the counter would be incorrectly decremented for incoming connections under certain conditions.
...
Now the counter increment and decrement happen in the same ACTOR (ConnectionReader), which makes it easy to verify its correctness.
2017-11-13 15:07:39 -08:00
Balachandar Namasivayam
9809e84806
Added a counter to keep track of active outgoing incompatible connections.
...
This counter is used to print a warning in fdbcli if there are incompatible peers.
Example Output:
./fdbcli
Using cluster file `fdb.cluster'.
WARNING: Incompatible peers exist.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb> status
WARNING: Incompatible peers exist.
Using cluster file `fdb.cluster'.
Could not communicate with a quorum of coordination servers:
127.0.0.1:4000 (unreachable)
2017-11-09 11:20:35 -08:00
Evan Tschannen
57aba0b3bc
fix: excluded servers were the same fitness as storage servers for the master role
...
fix: better master exists did not consider exclusion for master fitness
2017-11-03 17:09:14 -07:00
John Brownlee
d46e240de2
Merge branch 'release-5.0'
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# versions.target
2017-11-02 10:42:30 -07:00
Stephen Atherton
f050105243
Added HTTP 502 to the list of retryable errors.
2017-11-01 11:41:32 -07:00
Alex Miller
3b61b76876
Fix a massive amount of valgrind errors and make them easier to debug in the future.
...
std::is_pod<> being less restrictive than is_binary_serializable<> meant that
structs that both were POD and had a serialize method defined would be binary
serialized instead of using the defined serialize(). This means that it would
also serialize any padding that the struct contained, which would cause mass
waves of valgrind failures from uninitialized memory.
Included in this change are additional uses of valgrind client requests so that
attempts to send uninitialized memory are reported at the sending site, versus
as part of checksum calculation in sending the packet.
2017-10-27 16:54:44 -07:00
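The padding problem described in commit 3b61b76876 can be illustrated with a minimal sketch. The struct and both helper functions below are hypothetical and are not taken from the FoundationDB source; they only show why copying a POD struct byte-for-byte also copies its padding, while writing each member separately does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical struct for illustration only. On most ABIs the compiler
// inserts padding after `flag` so that `value` is 8-byte aligned.
struct Sample {
    uint8_t flag;
    uint64_t value;
};

// Raw copy: emits sizeof(Sample) bytes, which includes any padding bytes.
// If the padding was never initialized, valgrind flags those bytes when
// they are later sent or checksummed.
inline void serializeRaw(const Sample& s, std::vector<uint8_t>& out) {
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&s);
    out.insert(out.end(), p, p + sizeof(Sample));
}

// Field-wise copy: emits only the bytes that belong to each member,
// so padding never reaches the output buffer.
inline void serializeFields(const Sample& s, std::vector<uint8_t>& out) {
    out.push_back(s.flag);
    uint8_t buf[sizeof(s.value)];
    std::memcpy(buf, &s.value, sizeof(buf));
    out.insert(out.end(), buf, buf + sizeof(buf));
}
```

The field-wise version always writes 9 bytes for this struct, while the raw version writes `sizeof(Sample)` bytes, which is typically larger because of alignment padding.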
Evan Tschannen
df74e2a373
re-added support for non-copying tlog recovery
2017-10-24 15:09:31 -07:00
Stephen Atherton
45fa3680fa
Restore logging of remote address (if connected) or host (if connection fails) for blob errors.
2017-10-20 21:47:23 -07:00
Stephen Atherton
3afc85881e
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Stephen Atherton
42955012e9
Merge branch 'release-5.0'
...
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
# flow/error_definitions.h
2017-10-20 21:16:55 -07:00
Stephen Atherton
9f151314b3
Changed some trace event severities. Also fixed a weird casing of “retryable”.
2017-10-19 17:47:42 -07:00
Evan Tschannen
e2c1e87df6
made a large number of fixes to make fearless DR correctness clean.
2017-10-19 15:36:32 -07:00
Stephen Atherton
caad691ae2
Added comments for how to handle HTTP 400 errors gracefully in certain instances should the need arise.
2017-10-18 23:47:59 -07:00
Stephen Atherton
ef84e52127
Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish.
2017-10-18 05:51:30 -07:00
Stephen Atherton
ebd0234514
Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs but rate limits all errors, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop.
2017-10-18 02:52:09 -07:00
Alex Miller
7b9bc1d715
Merge pull request #170 from cie/alexmiller/flowprofile
...
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller
cf646d4a99
Address review comments.
...
* Fixed fdbcli to be more idiomatic.
* Removed is_binary_serializable in favor of std::is_pod<>
* Removed custom enable_if<> in favor of std::enable_if<>
* Removed HEY REVIEWER comments
* Removed print from prof.py
* Added FLOW_PROFILER_ENABLED=yes to circus components that wished to enable the flow profiler.
2017-10-16 16:46:52 -07:00
Yichi Chiang
a6ae89af1a
Merge pull request #176 from cie/add-cluster-controller-process-class
...
Add cluster controller process class
2017-10-16 16:27:54 -07:00
Yichi Chiang
af2aa41136
Downgrade Transaction process class for cluster controller
2017-10-16 16:27:01 -07:00
Yichi Chiang
76c5488421
Add cluster controller process class
2017-10-16 16:21:25 -07:00
Stephen Atherton
e934604f67
Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
...
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
Evan Tschannen
ff1b49be2e
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
2017-10-10 16:07:59 -07:00
Evan Tschannen
15962cf079
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbrpc/Locality.cpp
# fdbrpc/Locality.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/ClusterRecruitmentInterface.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/masterserver.actor.cpp
# fdbserver/worker.actor.cpp
# flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Alvin Moore
de8f875038
Fixed call to IsClear
...
Changed killMachine and killDataCenter interface to return final killtype
Updated TESTs for DataCenter to ensure that DataCenter was killed
Added assertion to ensure that failed DC kills were not downgrades
2017-10-05 03:07:20 -07:00
Stephen Atherton
fd5fe3a000
Add slightly better handling of HTTP 503 in blob client. Previously it would end the blob request loop and the task doing the blob action would see a failure, but now the blob request attempt loop will continue to back off and retry. This is better because previously the task that saw the failure would be re-run quickly.
2017-10-03 15:25:49 -07:00
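The behavior change in commit fd5fe3a000 (keep retrying with backoff on HTTP 503 instead of ending the request loop) might look roughly like the sketch below. This is plain C++ rather than Flow, and the function name, parameters, and doubling backoff policy are assumptions made for illustration, not the actual blob client code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical result type: records what the loop did.
struct RetryResult {
    int status;                  // last HTTP status returned
    int attempts;                // number of requests issued
    std::vector<double> delays;  // delays that would be waited between attempts
};

// Issues requests until a non-503 status is returned or maxAttempts
// (assumed >= 1) is reached. The delay doubles after each 503, capped
// at maxDelay. A real client would sleep for each delay; this sketch
// only records it.
inline RetryResult requestWithBackoff(const std::function<int()>& sendRequest,
                                      int maxAttempts,
                                      double initialDelay,
                                      double maxDelay) {
    RetryResult r{0, 0, {}};
    double delay = initialDelay;
    while (true) {
        r.status = sendRequest();
        ++r.attempts;
        if (r.status != 503 || r.attempts >= maxAttempts) {
            return r;
        }
        r.delays.push_back(delay);
        delay = std::min(delay * 2, maxDelay);
    }
}
```

Keeping the retry inside the request loop means the backoff delay keeps growing across attempts, whereas surfacing the failure to the calling task would restart it from the initial delay.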
Stephen Atherton
03c4cea511
Added rate-controlled TraceEvents for blob http connection attempts and failures.
2017-10-03 15:21:40 -07:00
Yichi Chiang
284e35204a
Fix connection count
2017-10-03 10:54:20 -07:00
Alvin Moore
5257b99d3f
Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration
2017-10-03 10:48:16 -07:00
Alvin Moore
d099656557
Merge branch 'release-5.0'
2017-10-02 12:05:24 -07:00
Alvin Moore
25513d8e2c
Added tests for DataCenter kills
2017-10-02 12:04:28 -07:00
Evan Tschannen
6ea9903c82
Merge branch 'release-5.0'
...
# Conflicts:
# fdbbackup/backup.actor.cpp
# fdbserver/ClusterController.actor.cpp
# versions.target
2017-10-01 18:46:44 -07:00
Stephen Atherton
058300be16
Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down.
2017-10-01 16:17:38 -07:00
Stephen Atherton
a95107417f
Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in.
2017-10-01 16:01:24 -07:00
Stephen Atherton
a098919b20
Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed.
2017-10-01 11:25:50 -07:00
Stephen Atherton
af87ac301d
Removed a never-completing wait used for debugging, which was accidentally included in the bug fix.
2017-10-01 11:19:38 -07:00
Stephen Atherton
6000cafde1
Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser.
2017-10-01 10:46:55 -07:00
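The bug fixed in commit 6000cafde1 (a release running even though the take threw) and the Releaser approach can be sketched in plain C++ as follows. `PermitPool`, this `Releaser` class, and both usage functions are hypothetical stand-ins written for illustration; they are not Flow's lock types or the actual backup code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counting lock; take() throws when no permit is free.
class PermitPool {
public:
    explicit PermitPool(int permits) : available_(permits) {}
    void take() {
        if (available_ == 0) throw std::runtime_error("no permits available");
        --available_;
    }
    void release() { ++available_; }
    int available() const { return available_; }
private:
    int available_;
};

// Takes a permit on construction and returns it on destruction.
// If take() throws, the constructor never completes, so the destructor
// never runs and no release happens.
class Releaser {
public:
    explicit Releaser(PermitPool& pool) : pool_(pool) { pool_.take(); }
    ~Releaser() { pool_.release(); }
    Releaser(const Releaser&) = delete;
    Releaser& operator=(const Releaser&) = delete;
private:
    PermitPool& pool_;
};

// Buggy shape: the take is inside the try block, so the handler
// releases a permit even when the take itself failed.
inline void buggyUse(PermitPool& pool) {
    try {
        pool.take();
        // ... work that may throw ...
        pool.release();
    } catch (...) {
        pool.release();  // wrong when take() threw: nothing was held
    }
}

// Fixed shape: the release is tied to a successful take.
inline bool fixedUse(PermitPool& pool) {
    try {
        Releaser r(pool);
        // ... work that may throw ...
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

With an exhausted pool, `buggyUse` leaves the pool with one more permit than it started with, while `fixedUse` leaves the count unchanged.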
Evan Tschannen
f84e7252e8
fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead
2017-09-29 19:13:08 -07:00
A.J. Beamon
38616424f6
Report a couple error cases in blobstore URL parsing when dealing with numbers.
2017-09-29 17:58:49 -07:00
Alex Miller
c40c1bb5fe
Add a new workload: BackupToDBAbort, which does an ACI switchover.
...
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Evan Tschannen
a1f8b546e6
fix: ensure connections to blob store are evenly distributed across network addresses
...
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Alvin Moore
298b54104e
Merge branch 'release-5.0'
2017-09-26 11:16:14 -07:00