Commit Graph

49 Commits

Author SHA1 Message Date
Evan Tschannen 963f4f3354 failure emergency delay could be triggered more easily than originally thought, so reduce the knob values so it waits 30 seconds after 10 recoveries 2018-10-05 13:25:54 -07:00
Evan Tschannen dcdbb3ec4d Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-movekey-fixes 2018-09-05 10:29:13 -07:00
Evan Tschannen e60c668853 The cluster controller will increase its failure monitoring delay after there have been many unfinishedRecoveries 2018-08-31 10:51:55 -07:00
A.J. Beamon ac336c632a WATCH_TIMEOUT knob was always buggified 2018-08-22 10:40:55 -07:00
Evan Tschannen 4c7001571b increased the default list request rate to speed up backup expire 2018-08-09 18:43:02 -07:00
Evan Tschannen dd72379363 reduced the failure detection times 2018-06-27 20:41:18 -07:00
Evan Tschannen 1dce97f28c Merge branch 'release-5.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/SimulatedCluster.actor.cpp
#	packaging/msi/FDBInstaller.wxs
#	versions.target
2018-06-21 17:05:11 -07:00
Stephen Atherton d9f3eb05a2 Change default delete operations per second. Updated release notes. 2018-06-21 11:13:31 -07:00
Stephen Atherton e9e1e194f0 Added operation-specific rate controls to blob store interface. 2018-06-20 20:34:34 -07:00
Steve Atherton 731c3b38e8
Merge pull request #481 from satherton/release-5.2
Reduce backup parallel tasks to decrease memory usage.
2018-06-12 13:16:24 -07:00
Steve Atherton 0481928bd8
Reduce backup parallel tasks to decrease memory usage. 2018-06-12 13:15:24 -07:00
Balachandar Namasivayam 529d0497f1 Proxy going OOM when applying high volumes of writes to a proxy, particular in a sudden fashion before ratekeeper can control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
Evan Tschannen 20f61f5cf8 fix: increased the buggified restore batch size to prevent tests from taking too long 2018-05-11 13:26:32 -07:00
yichic ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang d6559b144f Share log mutations between backups and DRs which have the same backup range 2018-03-19 11:32:50 -07:00
Yichi Chiang 26b93ff920 Share log mutations between backups and DRs which have the same backup range 2018-03-16 18:09:23 -07:00
Evan Tschannen 91bb8faa45 Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Stephen Atherton cb68885328 If backup expiration determines that force is required but the force parameter is not set, it will no longer throw an error unless the backup contains data from prior to the expire_before_version. 2018-03-08 11:27:15 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen 79d94214a4 Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs 2018-01-25 10:12:06 -08:00
Stephen Atherton 4dec5423f7 Optimization in backup snapshot dispatching. If the next dispatch version is not ahead of the recently known current version then do not set a scheduled time for new range tasks in order to avoid the overhead and delay of task timeout handling. Adjusted task timeout knobs to avoid large transaction warnings. Removed backtrace() from LargeTransaction trace event. Tweaked suppression in backup trace events. 2018-01-24 17:40:02 -08:00
Stephen Atherton 1f59f9ee5b Reduce restore dispatch transaction size. 2018-01-22 15:04:14 -08:00
Evan Tschannen 698ef4117e Merge branch 'master' into feature-remote-logs 2018-01-20 10:34:30 -08:00
Stephen Atherton 897ff6f676 Added new knob for how many tasks to add per transaction in backup dispatch, instead of using the value for restore which has much lower overhead per task. 2018-01-16 10:45:21 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen 721a891d1f fix: never request more than 100 shards from the proxy at a time to avoid large packets 2018-01-12 10:51:53 -08:00
Evan Tschannen 022df3b91b backup and restore sometimes took too long in simulation 2018-01-09 17:26:42 -08:00
Evan Tschannen 3ec45d38a0 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton 96cb06cbc7 Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations. 2018-01-05 23:06:39 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Stephen Atherton fd3f3aa647 Increased system key size limit to fix some rare backup use cases. 2018-01-03 12:05:12 -08:00
Stephen Atherton 7caa012fbf Added snapshot interval option to "fdbbackup start" which defaults to a new knob's value. Added snapshot info to backup status text. Improvements to fdbbackup help. 2017-12-20 00:49:08 -08:00
Stephen Atherton e0d9cea008 Merge branch 'master' into continuous-backup
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Stephen Atherton 33f9f1a95c Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration. 2017-12-14 01:44:38 -08:00
Evan Tschannen 482ac38ca6 added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs 2017-12-01 13:04:32 -08:00
Stephen Atherton a77162b53d Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/BackupAgent.h
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton 3dfaf13b67 IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored is now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup.  The addition of IBackupFile and its finish() method simplified the log and range writer tasks.  Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes.  Added KeyBackedSet<T> type.  Moved JSONDoc to its own header.  Added platform::findFilesRecursively().

Still to do:  update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
A.J. Beamon cd085764f1 Do not automatically change a cluster file that does not match what you expect. 2017-11-10 14:12:45 -08:00
Stephen Atherton 3afc85881e Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Stephen Atherton 42955012e9 Merge branch 'release-5.0'
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
#	flow/error_definitions.h
2017-10-20 21:16:55 -07:00
Evan Tschannen e2c1e87df6 made a large number of fixes to make fearless DR correctness clean. 2017-10-19 15:36:32 -07:00
Stephen Atherton ef84e52127 Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish. 2017-10-18 05:51:30 -07:00
Stephen Atherton e934604f67 Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
Evan Tschannen 6ea9903c82 Merge branch 'release-5.0'
# Conflicts:
#	fdbbackup/backup.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	versions.target
2017-10-01 18:46:44 -07:00
Stephen Atherton ad9de674ac Knob change, blob requests should be allowed more time. 2017-10-01 16:45:47 -07:00
Evan Tschannen a1f8b546e6 fix: ensure connections to blob store are evenly distributed across network addresses
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
Evan Tschannen d67e017bcc reduced reply_byte_limit to 80k 2017-09-15 11:01:56 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00