Evan Tschannen
e3c6b66240
fix: do not commit more data after being stopped
...
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alvin Moore
de1551c20d
Merge branch 'release-5.1'
2018-02-23 08:24:06 -08:00
Alvin Moore
a1382895a6
Fixed headers and some whitespace
2018-02-23 04:50:23 -08:00
Alec Grieser
e1162e9238
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-22 11:16:12 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Alec Grieser
aadc06de99
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-20 14:28:29 -08:00
Alec Grieser
1c1ae7d70e
Merge remote-tracking branch 'upstream/release-5.1' into bindings-format
2018-02-19 12:37:06 -08:00
Evan Tschannen
31b89a638f
added satellite_none and remote_none options to unconfigure from a fearless setup
...
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Evan Tschannen
dc93759e15
suppressed trace events that are spammy
2018-02-16 16:01:19 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
A.J. Beamon
814ae16016
Add destination tokens to Net2_LargePacket trace events. Add backtrace when a sent packet is too large.
2018-02-15 14:54:35 -08:00
Balachandar Namasivayam
f320b1b347
Change ConnectionClosed TraceEvent severity from SevError to SevWarnAlways.
2018-02-14 12:25:54 -08:00
Stephen Atherton
0a35f167e4
Merge branch 'master' into feature-redwood
...
# Conflicts:
# fdbserver/DiskQueue.actor.cpp
# fdbserver/IDiskQueue.h
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen
42405c78a5
Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Evan Tschannen
fbadcc6eea
changing a storage server’s tag must be the first mutation applied in a version, because privatized mutations applied earlier in the same version will use the old tag
2018-02-09 18:21:29 -08:00
Evan Tschannen
c7b3be5b19
re-enabled better master exists
...
the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited
2018-02-09 16:48:55 -08:00
Stephen Atherton
69425a303b
Improved error handling for cases where blob account credentials are not found in the provided credentials sources and/or some of the credentials sources provided are not readable or parseable.
2018-02-07 21:50:43 -08:00
Stephen Atherton
f8522248cb
Blob credentials files were being opened in read-write mode despite the read-only option being specified because the underlying caching layer always opens files for read/write access. For now, disabled caching for this file.
2018-02-07 16:25:16 -08:00
Stephen Atherton
d8879dc3f3
HTTP::doRequest() now reads responses in parallel with sending requests, so if the server responds before receiving all of the request the client can stop sending the remainder of the request. For PUT requests which upload files, this prevents sending potentially several megabytes of unnecessary bytes if the server responds with an error (such as 429) before the request is completely sent. Updated the backup container unit test to use more parallelism in order to test this new behavior.
2018-02-07 10:38:31 -08:00
Stephen Atherton
0792d5e3dd
Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
...
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
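The mixed numeric type fix above can be illustrated with a small Python sketch. It is not the JSON merge code from this commit: `merge_sum` is a hypothetical helper, and the `{"$sum": n}` object shape is assumed for illustration. It shows a `$sum` merge that accepts both integers and floating point values and raises on bad operands instead of squashing the error.

```python
def merge_sum(a, b):
    """Merge two {"$sum": number} objects by adding their values.

    Hypothetical helper: int and float operands may be mixed, and any
    other operand type raises instead of being silently dropped.
    """
    for doc in (a, b):
        value = doc.get("$sum")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"$sum operand must be numeric, got {value!r}")
    return {"$sum": a["$sum"] + b["$sum"]}
```

For example, `merge_sum({"$sum": 1}, {"$sum": 2.5})` adds an integer to a float, which is the kind of input the fix describes as previously producing an error.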
Evan Tschannen
ebd94bb654
removed a separately configurable storage team size for the remote data center, because it did not make sense
...
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen
2e3b1d7ab8
Merge commit 'dd6ea70051aef215315e9eb3dea3b67a24778e32' into feature-remote-logs
...
# Conflicts:
# flow/Net2.actor.cpp
2018-01-29 17:11:03 -08:00
Stephen Atherton
2f291d8955
Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket() - Although it is technically correct as written it is a dangerous style because it is not obvious that the addition of a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only 1 line.
2018-01-29 00:32:41 -08:00
Alec Grieser
51781bb7a8
Merge branch 'release-5.1' into bindings-format
2018-01-26 12:28:29 -08:00
Evan Tschannen
79d94214a4
Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs
2018-01-25 10:12:06 -08:00
Stephen Atherton
9fd2a8df3d
Tweaked a trace event suppression time.
2018-01-24 19:08:24 -08:00
Alec Grieser
57986cfe00
format python files to be roughly pep8 compliant
2018-01-24 19:06:58 -08:00
A.J. Beamon
19ed388c0e
Merge branch 'release-5.0' into release-5.1
...
# Conflicts:
# documentation/sphinx/source/downloads.rst
# documentation/sphinx/source/release-notes.rst
# versions.target
2018-01-24 14:43:41 -08:00
Stephen Atherton
7f18d59dfe
Bug fix, the blob request attempt count is now incremented for all errors except response code 429.
2018-01-24 01:15:01 -08:00
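The retry accounting described in this fix can be sketched roughly as follows. This is an illustrative Python sketch, not the blob client code: `do_request`, `send`, and `max_attempts` are hypothetical names, and the set of retryable codes is assembled from the entries in this log that mention 429, 502, and 503.

```python
RETRYABLE_CODES = {429, 502, 503}  # codes this log describes as retryable


def do_request(send, max_attempts=3):
    """Call send() until it succeeds or the attempt budget is spent.

    send is a hypothetical callable that returns an HTTP status code.
    Returns (status_code, attempts_counted).
    """
    attempts = 0
    while True:
        code = send()
        if 200 <= code < 300:
            return code, attempts
        if code not in RETRYABLE_CODES:
            raise RuntimeError(f"HTTP {code} is not retryable")
        if code != 429:
            # Every retryable error except 429 uses up one attempt.
            attempts += 1
            if attempts >= max_attempts:
                raise RuntimeError(f"giving up after {attempts} attempts")
        # A real client would sleep with backoff here before retrying.
```

In this sketch a server that keeps answering 429 is retried indefinitely, so a real client would still need backoff or an overall timeout.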
Stephen Atherton
a2481343ec
Bug fix, HTTP error code 429 was not being considered retryable in blob client (this was previously fixed but apparently reintroduced).
2018-01-24 00:22:11 -08:00
Stephen Atherton
66de9d392b
New error code, http_auth_failed, which is used when blob authentication fails instead of the previous generic http_request_failed.
2018-01-22 14:58:56 -08:00
Evan Tschannen
698ef4117e
Merge branch 'master' into feature-remote-logs
2018-01-20 10:34:30 -08:00
Stephen Atherton
307e04c0ad
Updated backup container unit test to match new safer behavior of expireData(). Rewrote BackupContainerLocalDirectory::deleteContainer() to actually delete the whole directory but only if it appears to be a backup with either log or snapshot data.
2018-01-18 00:36:28 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Alvin Moore
2e6ce03224
Merge pull request #232 from cie/build-dont-compile-hpp
...
Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files)…
2018-01-12 14:09:25 -08:00
Evan Tschannen
02bd83ff76
changed incompatibleDataRead to an asyncTrigger
2018-01-11 13:35:56 -08:00
A.J. Beamon
80b84c23ac
Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files). Add xml2json.hpp to our fdbrpc project.
2018-01-10 13:51:57 -08:00
A.J. Beamon
ce93d98b50
Temporarily remove xml2json.hpp from fdbrpc vcxproj
2018-01-10 10:18:44 -08:00
A.J. Beamon
2f5073d00f
Some visual studio project cleanup.
2018-01-10 10:07:18 -08:00
Stephen Atherton
0e7d538c94
Bug fix, in recursive blob folder listings the recent removal of common prefixes from the result stream caused the list marker to not be set correctly when a folder level requires multiple requests due to folder size.
2018-01-06 20:58:48 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton
96cb06cbc7
Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations.
2018-01-05 23:06:39 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Stephen Atherton
78430425e8
Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided.
2018-01-02 23:17:52 -08:00
Stephen Atherton
07fde9dfb4
Bug fix, error code 429 was not being treated as retryable in the recent refactor.
2018-01-02 23:15:25 -08:00
Stephen Atherton
f324afc13f
Bug fix in blob store listing when it requires multiple serial requests. Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events.
2017-12-22 17:08:25 -08:00
Stephen Atherton
f2524ffd33
AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description.
2017-12-21 21:15:26 -08:00
Stephen Atherton
e0ef5a9a20
Whitespace normalization.
2017-12-21 12:07:29 -08:00
Stephen Atherton
e3aee45a74
Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings.
2017-12-21 01:58:15 -08:00
Stephen Atherton
e0d9cea008
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Alex Miller
9a0df6d76d
Deallocate aligned_alloc with aligned_free.
...
This probably fixes a windows-only crash, as only windows cares about this distinction.
2017-12-14 15:12:05 -08:00
Stephen Atherton
b6cfe010a1
Bug fix in URL encoding of delimiter.
2017-12-12 17:31:19 -08:00
Stephen Atherton
872edd7540
Merge branch 'release-5.0'
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-12-06 16:27:04 -08:00
Stephen Atherton
41f80bf7ed
Renamed an error, changed blob request failure to Warn severity.
2017-12-06 15:58:54 -08:00
Stephen Atherton
4bc7d0b86a
Updated error names and severities.
2017-12-06 15:42:44 -08:00
Stephen Atherton
abb2dd1ebc
Merge pull request #214 from cie/alexmiller/fallocate
...
Use fallocate to zero ranges instead of writing zeroes
2017-12-06 13:47:40 -08:00
Alex Miller
064670a95b
Maintain a reference to the IAsyncFile in zeroRange.
...
And also add some notes about the reference semantics to the IAsyncFile header
for future readers.
2017-12-06 13:41:21 -08:00
Balachandar Namasivayam
1f949240f5
Make fdbbackup s3 compatible.
...
S3 sends responses in XML. FDB backup expects a JSON response. Added a new library, xml2json, to convert XML to JSON.
2017-12-05 17:13:15 -08:00
Stephen Atherton
86ae6c09c7
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:20:50 -08:00
Evan Tschannen
482ac38ca6
added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs
2017-12-01 13:04:32 -08:00
Alex Miller
7bab3a4ece
AsyncFileKAIO will prefer using fallocate's ZERO_RANGE for AsyncFile::zero().
...
For situations in which we have support for FALLOC_FL_ZERO_RANGE, it's much
faster to use fallocate than manually overwrite the file with zero bytes. Note
that this support depends on having a kernel from late 2014 or newer, and being
on ext4 or xfs. If these conditions aren't met, we'll fall back to writing
zeros in 1MB chunks as normal.
2017-11-30 17:57:55 -08:00
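The fallback behavior described in this commit can be sketched in Python. This is not the AsyncFileKAIO code: `zero_range` and `fast_zero` are hypothetical names, and the fast path is passed in as a callable because the sketch does not assume any particular binding for fallocate. If the fast path is missing or raises `OSError`, the range is overwritten with zeros in chunks of at most 1 MB.

```python
import os
import tempfile

CHUNK = 1 << 20  # 1 MB, the chunk size named in the commit message


def zero_range(fd, offset, length, fast_zero=None):
    """Zero length bytes of fd starting at offset.

    fast_zero is a hypothetical callable standing in for a fallocate
    ZERO_RANGE call; it should raise OSError when unsupported.
    Returns "fast" or "fallback" to show which path ran.
    """
    if fast_zero is not None:
        try:
            fast_zero(fd, offset, length)
            return "fast"
        except OSError:
            pass  # unsupported kernel or filesystem: fall through

    zeros = bytes(CHUNK)
    pos, end = offset, offset + length
    os.lseek(fd, pos, os.SEEK_SET)
    while pos < end:
        n = min(CHUNK, end - pos)
        # os.write may write fewer bytes than asked, so advance by its result.
        pos += os.write(fd, zeros[:n])
    return "fallback"
```

Returning which path ran is a choice made for this sketch so that the fallback can be checked easily; the commit message does not say the real function reports it.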
Alex Miller
196258080b
Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile.
...
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it. It also makes testing easier/more obvious.
Also, because it's needed for correctness, fix the aligned_alloc for OSX, which
wasn't aligned, and use an actually aligned allocation function.
2017-11-30 17:57:55 -08:00
Alex Miller
c7a120c59d
Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile.
...
`deleteFile` existed in IAsyncFileSystem, so an incremental delete function
seems to belong more as a virtual method on IAsyncFileSystem than a static
method on IAsyncFile, and the naming should match.
As long as we're here, change IAsyncFile to declare a virtual destructor, so
that it has good and proper C++ behavior. I presume this is what was vaguely
intended by the default constructor definition that previously existed?
2017-11-30 17:19:10 -08:00
Stephen Atherton
1e643239f9
Improvement in blob connection reuse, oldest connections in pool are now used first.
2017-11-30 12:57:29 -08:00
Stephen Atherton
1b1c8e985a
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Alex Miller
f19cb3bbbd
Merge pull request #208 from cie/alexmiller/grvtfix
...
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
Alex Miller
e9412bbb11
Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
...
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that. The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap. Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.
The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.
This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
e07dcb9ada
Fixed header paths.
2017-11-15 00:05:20 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored is now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
...
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Balachandar Namasivayam
987379d790
Changed naming of num_incompatible_connections to numIncompatibleConnections
2017-11-14 18:37:29 -08:00
Balachandar Namasivayam
27b67cffbe
The earlier implementation of tracking the number of incompatible connections had a bug where the counter would be incorrectly decremented for incoming connections under certain conditions.
...
Now the counter increment and decrement happen in the same ACTOR (ConnectionReader), which makes it easy to verify its correctness.
2017-11-13 15:07:39 -08:00
Balachandar Namasivayam
9809e84806
Added a counter to keep track of active outgoing incompatible connections.
...
This counter is used to print a warning in fdbcli if there are incompatible peers.
Example Output:
./fdbcli
Using cluster file `fdb.cluster'.
WARNING: Incompatible peers exist.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb> status
WARNING: Incompatible peers exist.
Using cluster file `fdb.cluster'.
Could not communicate with a quorum of coordination servers:
127.0.0.1:4000 (unreachable)
2017-11-09 11:20:35 -08:00
Evan Tschannen
57aba0b3bc
fix: excluded servers were the same fitness as storage servers for the master role
...
fix: better master exists did not consider exclusion for master fitness
2017-11-03 17:09:14 -07:00
John Brownlee
d46e240de2
Merge branch 'release-5.0'
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# versions.target
2017-11-02 10:42:30 -07:00
Stephen Atherton
f050105243
Added HTTP 502 to the list of retryable errors.
2017-11-01 11:41:32 -07:00
Alex Miller
3b61b76876
Fix a massive amount of valgrind errors and make them easier to debug in the future.
...
std::is_pod<> being less restrictive than is_binary_serializable<> meant that
structs that both were POD and had a serialize method defined would be binary
serialized instead of using the defined serialize(). This means that it would
also serialize any padding that the struct contained, which would cause mass
waves of valgrind failures from uninitialized memory.
Included in this change is additional uses of valgrind client requests so that
attempts to send uninitialized memory are reported at the sending site, versus
as part of checksum calculation in sending the packet.
2017-10-27 16:54:44 -07:00
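The padding problem described in this commit can be illustrated outside C++ with Python's ctypes, which lays structures out with native alignment. This is only an analogy under stated assumptions: `Sample`, `raw_bytes`, and `field_bytes` are hypothetical names, and the sketch assumes a mainstream ABI where `int` is 4 bytes and 4-byte aligned. Copying the struct's memory carries padding bytes along, while serializing field by field does not.

```python
import ctypes
import struct


class Sample(ctypes.Structure):
    # One char followed by an int: the compiler-style layout inserts
    # padding after tag so that value is aligned.
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int)]


def raw_bytes(s):
    """Copy the struct's memory as-is, padding included."""
    return ctypes.string_at(ctypes.addressof(s), ctypes.sizeof(s))


def field_bytes(s):
    """Serialize each field explicitly; '<' disables padding in struct.pack."""
    return struct.pack("<ci", s.tag, s.value)
```

With `s = Sample(b"A", 7)`, `field_bytes(s)` holds only the bytes of the two fields, while `raw_bytes(s)` is as long as `ctypes.sizeof(Sample)` and includes whatever sits in the padding.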
Evan Tschannen
df74e2a373
re-added support for non-copying tlog recovery
2017-10-24 15:09:31 -07:00
Stephen Atherton
45fa3680fa
Restore logging of remote address (if connected) or host (if connection fails) for blob errors.
2017-10-20 21:47:23 -07:00
Stephen Atherton
3afc85881e
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Stephen Atherton
42955012e9
Merge branch 'release-5.0'
...
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
# flow/error_definitions.h
2017-10-20 21:16:55 -07:00
Stephen Atherton
9f151314b3
Changed some trace event severities. Also fixed a weird casing of “retryable”.
2017-10-19 17:47:42 -07:00
Evan Tschannen
e2c1e87df6
made a large number of fixes to make fearless DR correctness clean.
2017-10-19 15:36:32 -07:00
Stephen Atherton
caad691ae2
Added comments for how to handle HTTP 400 errors gracefully in certain instances should the need arise.
2017-10-18 23:47:59 -07:00
Stephen Atherton
ef84e52127
Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish.
2017-10-18 05:51:30 -07:00
Stephen Atherton
ebd0234514
Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs but rate limits all errors, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop.
2017-10-18 02:52:09 -07:00
Alex Miller
7b9bc1d715
Merge pull request #170 from cie/alexmiller/flowprofile
...
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller
cf646d4a99
Address review comments.
...
* Fixed fdbcli to be more idiomatic.
* Removed is_binary_serializable in favor of std::is_pod<>
* Removed custom enable_if<> in favor of std::enable_if<>
* Removed HEY REVIEWER comments
* Removed print from prof.py
* Added FLOW_PROFILER_ENABLED=yes to circus components that wished to enable the flow profiler.
2017-10-16 16:46:52 -07:00
Yichi Chiang
a6ae89af1a
Merge pull request #176 from cie/add-cluster-controller-process-class
...
Add cluster controller process class
2017-10-16 16:27:54 -07:00
Yichi Chiang
af2aa41136
Downgrade Transaction process class for cluster controller
2017-10-16 16:27:01 -07:00
Yichi Chiang
76c5488421
Add cluster controller process class
2017-10-16 16:21:25 -07:00
Stephen Atherton
e934604f67
Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
...
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
Evan Tschannen
ff1b49be2e
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
2017-10-10 16:07:59 -07:00
Evan Tschannen
15962cf079
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbrpc/Locality.cpp
# fdbrpc/Locality.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/ClusterRecruitmentInterface.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/masterserver.actor.cpp
# fdbserver/worker.actor.cpp
# flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Alvin Moore
de8f875038
Fixed call to IsClear
...
Changed killMachine and killDataCenter interface to return final killtype
Updated TESTs for DataCenter to ensure that DataCenter was killed
Added assertion to ensure that failed DC kills were not downgrades
2017-10-05 03:07:20 -07:00
Stephen Atherton
fd5fe3a000
Add slightly better handling of HTTP 503 in blob client. Previously it would end the blob request loop and the task doing the blob action would see a failure, but now the blob request attempt loop will continue to back off and retry. This is better because previously the task that saw the failure would be re-run quickly.
2017-10-03 15:25:49 -07:00
Stephen Atherton
03c4cea511
Added rate-controlled TraceEvents for blob http connection attempts and failures.
2017-10-03 15:21:40 -07:00
Yichi Chiang
284e35204a
Fix connection count
2017-10-03 10:54:20 -07:00
Alvin Moore
5257b99d3f
Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration
2017-10-03 10:48:16 -07:00
Alvin Moore
d099656557
Merge branch 'release-5.0'
2017-10-02 12:05:24 -07:00
Alvin Moore
25513d8e2c
Added tests for DataCenter kills
2017-10-02 12:04:28 -07:00
Evan Tschannen
6ea9903c82
Merge branch 'release-5.0'
...
# Conflicts:
# fdbbackup/backup.actor.cpp
# fdbserver/ClusterController.actor.cpp
# versions.target
2017-10-01 18:46:44 -07:00
Stephen Atherton
058300be16
Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down.
2017-10-01 16:17:38 -07:00
Stephen Atherton
a95107417f
Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in.
2017-10-01 16:01:24 -07:00
Stephen Atherton
a098919b20
Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed.
2017-10-01 11:25:50 -07:00
Stephen Atherton
af87ac301d
Removed wait never used for debugging which was accidentally included in bug fix.
2017-10-01 11:19:38 -07:00
Stephen Atherton
6000cafde1
Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser.
2017-10-01 10:46:55 -07:00
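The bug pattern and the fix in this commit can be sketched in Python. This is not the Flow code: `CountingLock`, `Releaser`, `buggy`, and `fixed` are hypothetical names written for illustration. When the take happens inside the try block, the cleanup path releases the lock even if the take itself threw. A releaser object that only exists after a successful take avoids that.

```python
class CountingLock:
    """Toy lock that records calls; fail_on_take simulates a take that throws."""

    def __init__(self, fail_on_take=False):
        self.fail_on_take = fail_on_take
        self.held = 0
        self.releases = 0

    def take(self):
        if self.fail_on_take:
            raise RuntimeError("take failed")
        self.held += 1

    def release(self):
        self.releases += 1
        self.held -= 1


class Releaser:
    """Context manager that releases only what it successfully took."""

    def __init__(self, lock):
        self.lock = lock

    def __enter__(self):
        self.lock.take()  # if this raises, __exit__ is never called
        return self

    def __exit__(self, exc_type, exc, tb):
        self.lock.release()
        return False  # do not swallow exceptions from the body


def buggy(lock):
    try:
        lock.take()
    except RuntimeError:
        pass
    finally:
        lock.release()  # runs even when take() threw


def fixed(lock):
    try:
        with Releaser(lock):
            pass  # work that needs the lock would go here
    except RuntimeError:
        pass
```

With a lock whose take fails, `buggy` leaves `held` at -1 after an unmatched release, while `fixed` performs no release at all.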
Evan Tschannen
f84e7252e8
fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead
2017-09-29 19:13:08 -07:00
A.J. Beamon
38616424f6
Report a couple error cases in blobstore URL parsing when dealing with numbers.
2017-09-29 17:58:49 -07:00
Alex Miller
c40c1bb5fe
Add a new workload: BackupToDBAbort, which does an ACI switchover.
...
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Evan Tschannen
a1f8b546e6
fix: ensure connections to blob store are evenly distributed across network addresses
...
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Alvin Moore
298b54104e
Merge branch 'release-5.0'
2017-09-26 11:16:14 -07:00
Alvin Moore
02525d7b14
Added TESTs to ensure that all of the different kills are performed during simulation
2017-09-26 11:15:39 -07:00
Stephen Atherton
1ca9814879
Bug (arguable, perhaps) fix in AsyncFileCached. Order was not being enforced between writes and truncates such that calling and waiting on a truncate to X and then writing to X + 1 could end up writing first and then truncating the written page off of the file.
2017-09-20 17:58:56 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
8cb53fd608
Merge pull request #149 from cie/choose-leader-on-stateless-processes
...
choose leader on the preferred process class
2017-09-13 13:58:49 -07:00
Alvin Moore
b1dd2ac6fe
Merge branch 'release-5.0'
2017-09-12 13:34:28 -07:00
Alvin Moore
4a6fb10a42
Added TraceEvents for remaining and killed workers when killing DataCenter
...
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen
ea26bc1c43
passed first tests which kill entire datacenters
...
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen
6f6dbe4b33
fix: load balance will still use second requests when client locality is present
2017-09-01 11:14:18 -07:00
Alvin Moore
0994587573
Fixed OS X compilation build warnings due to printf field specifiers
2017-09-01 09:35:56 -07:00
Alvin Moore
fd439e9d1c
Fixed OS X compilation build warnings due to printf field type specifiers
2017-09-01 09:34:53 -07:00
Stephen Atherton
6e9de8f35a
Bug fix. eraseDirectoryRecursive() on MacOS used to do nothing at all, but now it erases directories recursively. The Linux version was modified to be simpler and use a version of the FTW API that also works on MacOS.
2017-08-31 00:11:18 -07:00
A.J. Beamon
9a0a3b6329
Merge commit '66528becb82d826e81fa644bb378212584ab580e'
2017-08-28 16:47:59 -07:00
Yichi Chiang
9fe927127f
choose leader on the preferred process class
2017-08-28 14:41:04 -07:00
Alvin Moore
44e0df78c5
Added support for tracking roles for simulation workers
...
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shut down before rebooting a machine
2017-08-25 10:12:58 -07:00
A.J. Beamon
45c0585891
Merge branch 'release-5.0'
2017-08-24 14:48:47 -07:00
Alvin Moore
0c1be7537c
Fixed OSX compilation warning about printf field value specification
2017-08-24 12:30:38 -07:00
Alec Grieser
2b678f6e91
Merge remote-tracking branch 'origin/release-5.0'
2017-08-23 10:24:23 -07:00
Alvin Moore
17c6392295
Added support for printing out information on the current simulation workers
2017-08-22 16:56:33 -07:00
A.J. Beamon
41c90bcdea
Merge commit '89ac94853c70d08289e7fb58055bc5d0cd4e494d'
2017-07-26 15:35:36 -07:00
A.J. Beamon
311d0e3815
Remove outdated comment from incrementalDelete function.
2017-07-26 15:27:37 -07:00
A.J. Beamon
d8acb11200
Remove the change that waits only for unlinking; call delete on the file even if it doesn't exist.
2017-07-26 15:25:49 -07:00
A.J. Beamon
d8e308c18f
Enable use of incremental delete when deleting disk queue and sqlite KVS files.
2017-07-26 14:11:11 -07:00
Evan Tschannen
64e9560599
Merge pull request #128 from cie/maintain-incompatible-connections
...
Maintain incompatible connections
2017-07-17 16:28:22 -07:00
A.J. Beamon
2113d47db6
Update protocol version for incompatible connection change
2017-07-17 16:16:05 -07:00
A.J. Beamon
23c2946fa3
Rename some trace events surrounding connections
2017-07-17 16:15:18 -07:00
A.J. Beamon
591d98f711
Update the incompatible version behavior change protocol version check and add a note that we'll need to appropriately set the version at merge time.
2017-07-17 11:00:45 -07:00
A.J. Beamon
650c6ff399
Merge branch 'release-5.0' into maintain-incompatible-connections
2017-07-17 10:40:36 -07:00
A.J. Beamon
9493f8f78c
Merge branch 'release-5.0'
2017-07-17 09:34:37 -07:00
A.J. Beamon
a7fbc56a8e
Checksums computed on pages with partially undefined contents are still valid, so mark them as such for valgrind purposes.
2017-07-17 09:34:04 -07:00
Alec Grieser
f75b6f333b
Merge branch 'release-5.0'
2017-07-13 11:21:18 -07:00
Stephen Atherton
39ff1b3c52
Bug fix: when io_timeouts are enabled in warn-only mode, they weren’t being logged at all.
2017-07-05 14:43:10 -07:00
Stephen Atherton
1b1a0d27e2
Merge branch 'release-5.0'
...
# Conflicts:
# versions.target
2017-06-29 15:58:04 -07:00
Stephen Atherton
028fb75f88
Added last write timestamp to lost write detector class. Renamed TraceEvent for lost writes detected since it is no longer part of the KAIO class specifically.
2017-06-29 15:11:11 -07:00
Alec Grieser
9bcdfe4ddb
removed undefined behavior surrounding TLS logging
2017-06-28 14:23:53 -07:00
Alec Grieser
94bce335e7
Merge branch 'release-5.0'
2017-06-19 17:51:10 -07:00
Alvin Moore
6d19580789
Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0
...
# Conflicts:
# fdbrpc/simulator.h
2017-06-19 17:39:37 -07:00
Alvin Moore
9553458b78
Updated simulation to support managing exclusion and inclusion address
...
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Stephen Atherton
5d13d845a7
Merge branch 'release-5.0'
2017-06-18 23:25:29 -07:00
Stephen Atherton
0e638e7ea2
Merge branch 'release-4.6' into release-5.0
2017-06-18 23:25:17 -07:00
Stephen Atherton
6d9e302487
Merge branch 'release-5.0'
2017-06-16 02:14:34 -07:00
Stephen Atherton
430bb6224e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/Net2FileSystem.cpp
# fdbrpc/sim2.actor.cpp
2017-06-16 02:14:19 -07:00
Stephen Atherton
1c94e30e64
Merge branch 'release-5.0'
2017-06-15 17:40:40 -07:00
Stephen Atherton
f405c8d88e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/optimisttest.actor.cpp
# versions.target
2017-06-15 17:40:19 -07:00
Evan Tschannen
cdd64ebc15
fix: asyncFileNonDurable could never complete deleting a file in rare situations
2017-06-15 13:30:15 -07:00
Evan Tschannen
afdc125db9
Merge branch 'release-5.0'
2017-06-14 16:44:23 -07:00
Evan Tschannen
4bdcd8fc12
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# bindings/bindingtester/run_binding_tester.sh
# fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Yichi Chiang
02ee6d8cd1
Change checksum enabled condition
2017-06-13 11:03:25 -07:00
Stephen Atherton
e318aabe55
Merge branch 'release-5.0'
2017-05-31 17:10:48 -07:00
Stephen Atherton
fa4fdb1f1d
Merge branch 'fix-io-timeout-handling' into release-5.0
...
# Conflicts:
# fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Yichi Chiang
41d9bce2d7
Merge pull request #115 from cie/checksum-off-with-tls
...
Disable checksum when TLS is enabled
2017-05-30 11:43:53 -07:00
Stephen Atherton
98604d33a0
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Stephen Atherton
7260e38545
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 17:43:28 -07:00
Yichi Chiang
d2ad46680c
Disable checksum when TLS is enabled
2017-05-26 15:34:40 -07:00
Alvin Moore
b28ed397a2
Fixed printf field width specifier to reduce compilation warnings within OS X
2017-05-26 14:51:34 -07:00
Alvin Moore
0b9ed67e12
Fixed support for RemoveServers Workload
...
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore
16cc0821b1
Removed dead machine option from simulation
2017-05-25 16:29:02 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00