Stephen Atherton
a9a9590058
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-02-16 19:44:36 -08:00
Stephen Atherton
54fc81b260
Improved backup error reporting in backup status. The most recent error for each error type is reported along with how long ago the error occurred, and errors are divided into two categories based on whether or not they occurred since the most recent backup progress.
2018-02-16 19:38:31 -08:00
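A minimal sketch of the reporting rule described in this commit, assuming errors arrive as simple records with a type, message, and timestamp. The names `BackupError`, `summarize_errors`, and the two bucket labels are hypothetical and are not taken from the FoundationDB source.

```python
from dataclasses import dataclass


@dataclass
class BackupError:
    kind: str         # error type, e.g. "io_error" (hypothetical label)
    message: str
    timestamp: float  # seconds since epoch


def summarize_errors(errors, last_progress_time, now):
    """Keep the most recent error per type and split by last progress time."""
    latest = {}
    for err in errors:
        prev = latest.get(err.kind)
        if prev is None or err.timestamp > prev.timestamp:
            latest[err.kind] = err

    report = {"since_last_progress": [], "before_last_progress": []}
    # Newest errors first within each category.
    for err in sorted(latest.values(), key=lambda e: e.timestamp, reverse=True):
        entry = {
            "type": err.kind,
            "message": err.message,
            "seconds_ago": now - err.timestamp,
        }
        if err.timestamp > last_progress_time:
            report["since_last_progress"].append(entry)
        else:
            report["before_last_progress"].append(entry)
    return report
```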
Evan Tschannen
8c53483838
fix: log ranges were not being cleared correctly
2018-02-16 10:27:10 -08:00
Stephen Atherton
d8879dc3f3
HTTP::doRequest() now reads responses in parallel with sending requests, so if the server responds before receiving all of the request the client can stop sending the remainder. For PUT requests which upload files, this avoids sending potentially several megabytes of unnecessary data if the server responds with an error (such as 429) before the request is completely sent. Updated the backup container unit test to use more parallelism in order to test this new behavior.
2018-02-07 10:38:31 -08:00
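The idea of reading the response while the request body is still being sent can be sketched with asyncio. This is an illustration only, not the Flow implementation: `send_with_early_response` and its callback parameters `send_chunk` and `read_response` are hypothetical names, and the transport is left abstract.

```python
import asyncio


async def send_with_early_response(chunks, send_chunk, read_response):
    """Send chunks while concurrently waiting for the server's response.

    Returns (response, chunks_sent). Sending stops as soon as a response
    arrives, so an early error reply does not cost the rest of the upload.
    """
    sent = 0

    async def sender():
        nonlocal sent
        for chunk in chunks:
            await send_chunk(chunk)
            sent += 1

    send_task = asyncio.ensure_future(sender())
    recv_task = asyncio.ensure_future(read_response())
    done, _ = await asyncio.wait(
        {send_task, recv_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if send_task not in done:
        # The server answered before the body was fully sent: stop sending.
        send_task.cancel()
        try:
            await send_task
        except asyncio.CancelledError:
            pass
    else:
        send_task.result()  # re-raise any error from sending
    return await recv_task, sent
```

With a simulated server that replies after a few chunks, `chunks_sent` comes back well below the total, which is the saving the commit describes for large PUT uploads.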
Stephen Atherton
0792d5e3dd
Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
...
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
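A rough sketch of a sum-style merge that tolerates mixed integer and floating point operands and raises on anything it cannot add, rather than silently dropping the error. This is a simplification: the function name `merge_sum` is hypothetical and the real `$sum` operator syntax in the status JSON is not reproduced here.

```python
def merge_sum(a, b):
    """Recursively merge two JSON-like values, adding numbers together."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(a, bool) or isinstance(b, bool):
        raise TypeError("cannot sum booleans")
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b  # int + float is allowed and yields a float
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = merge_sum(out[key], value) if key in out else value
        return out
    # Surface the problem instead of squashing it.
    raise TypeError(f"cannot sum {type(a).__name__} and {type(b).__name__}")
```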
A.J. Beamon
080a454051
fix: getVersionstamp would return broken promise if a transaction was disposed before being set. getAddressesForKey would not return when resetPromise was set.
2018-01-31 13:47:36 -08:00
A.J. Beamon
84f7565d04
MultiVersionApi::createCluster was throwing network_not_setup rather than returning it.
2018-01-31 10:03:09 -08:00
Stephen Atherton
2f291d8955
Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket(). Although it is technically correct as written, it is a dangerous style because it is not obvious that adding a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only one line.
2018-01-29 00:32:41 -08:00
Stephen Atherton
4dec5423f7
Optimization in backup snapshot dispatching. If the next dispatch version is not ahead of the recently known current version then do not set a scheduled time for new range tasks in order to avoid the overhead and delay of task timeout handling. Adjusted task timeout knobs to avoid large transaction warnings. Removed backtrace() from LargeTransaction trace event. Tweaked suppression in backup trace events.
2018-01-24 17:40:02 -08:00
Stephen Atherton
95d4e5520b
Added TraceEvent.
2018-01-24 12:44:37 -08:00
Stephen Atherton
aebbe1dcfd
Changed core_versionspersecond knob to int64_t type to avoid integer overflow. Cleaned up backup TraceEvent suppression. Added backtrace to LargeTransaction TraceEvent to make it easier to find the source of large commits in applications using NativeAPI directly.
2018-01-24 11:59:37 -08:00
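A small worked example of the kind of overflow a wider knob type avoids. It assumes a versions-per-second value of 1,000,000 and a 32-bit signed integer as the narrower type; both are assumptions made for illustration, not values stated in the commit.

```python
import ctypes

VERSIONS_PER_SECOND = 1_000_000  # assumed knob value
SECONDS = 3600                   # one hour, chosen for illustration

product = VERSIONS_PER_SECOND * SECONDS

# ctypes integer types do no overflow checking, so the value wraps the
# same way a fixed-width C integer would.
as_int32 = ctypes.c_int32(product).value
as_int64 = ctypes.c_int64(product).value

print(as_int32)  # wrapped, negative
print(as_int64)  # 3600000000
```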
Stephen Atherton
83409fb067
Bug fix, versionFolderString() was not reducing the precision of the number in the output string. Not technically a 'bug' as the scheme will still work but produces an overly deep and sparse folder structure.
2018-01-24 10:29:37 -08:00
Stephen Atherton
40d38880fe
Changed version-based folder naming scheme to something simpler, a fixed width 0-padded 19 digit number (the longest a Version can be) with /'s inserted to limit the size of each folder level. Comparisons using these folder names ignore the /'s so any future change to the splitting scheme would still be compatible with the current listing/reading logic.
2018-01-23 15:02:15 -08:00
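The naming scheme can be sketched as follows. The commit does not say where the slashes are placed, so the `group_sizes` default below is a hypothetical choice, as are the function names.

```python
def version_folder_string(version, group_sizes=(5, 5, 5, 4)):
    """Format a version as 19 zero-padded digits split into folder levels."""
    if version < 0 or sum(group_sizes) != 19:
        raise ValueError("version must be non-negative and groups must total 19")
    digits = f"{version:019d}"
    if len(digits) != 19:
        raise ValueError("version does not fit in 19 digits")
    parts, pos = [], 0
    for size in group_sizes:
        parts.append(digits[pos:pos + size])
        pos += size
    return "/".join(parts)


def folder_string_to_version(path):
    """Recover the version; slashes are ignored, so any split is accepted."""
    return int(path.replace("/", ""))
```

Because the digits are fixed width, stripping the slashes and comparing the remaining strings gives the same ordering as comparing the versions numerically, regardless of how the groups were split.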
Stephen Atherton
7db7a51440
Changes to backup folder structure in BackupContainerBlobStore at the top level inside the backup bucket. All data files now live under data/<backup_name> and there is an 'index' at backups/<backup_name> which indicates what backups exist. Backup names can now contain '/' characters.
2018-01-23 11:46:16 -08:00
Stephen Atherton
7f0b7311b9
Corrected function name to timeKeeperVersionFromDatetime(). 'Fdbbackup expire' now allows an expire_before version of 0 if explicitly passed by version or by timestamp.
2018-01-23 00:19:51 -08:00
Stephen Atherton
51a1bd9327
Timekeeper lookup improvements, moved both function declarations to BackupContainer.h. VersionFromEpochs() now uses versions/sec to adjust the lookup result to improve accuracy. Conversions in both directions look for the latest record less than the target conversion value, but failing that they will now fall back on any available data point and adjust from there using versions/sec.
2018-01-22 23:57:01 -08:00
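A sketch of the lookup-and-adjust logic, assuming the time-to-version records are available as a sorted list of `(epoch_seconds, version)` pairs and that versions advance at roughly 1,000,000 per second. The record layout, the rate, and the function names are assumptions for illustration.

```python
import bisect

VERSIONS_PER_SECOND = 1_000_000  # assumed rate


def _nearest_index(keys, target):
    """Index of the latest key at or before target, else the first key."""
    i = bisect.bisect_right(keys, target) - 1
    return max(i, 0)  # fall back to the earliest available record


def version_from_epoch(records, epoch, vps=VERSIONS_PER_SECOND):
    """records: list of (epoch_seconds, version), sorted ascending."""
    if not records:
        raise LookupError("no records available")
    t, v = records[_nearest_index([r[0] for r in records], epoch)]
    return v + int((epoch - t) * vps)  # adjust from the chosen data point


def epoch_from_version(records, version, vps=VERSIONS_PER_SECOND):
    if not records:
        raise LookupError("no records available")
    t, v = records[_nearest_index([r[1] for r in records], version)]
    return t + (version - v) / vps
```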
Stephen Atherton
f086ba9d9d
Improved version to timestamp lookup - if there are no older versioned records in the database then the next available record, if any, will be used to calculate a result.
2018-01-22 22:47:57 -08:00
Stephen Atherton
1f59f9ee5b
Reduce restore dispatch transaction size.
2018-01-22 15:04:14 -08:00
A.J. Beamon
16cd0c8f75
Change WaitStorageMetricsPenalty to not be SevWarnAlways unless it happens multiple times in a row (about 2.5 minutes with current knobs).
2018-01-19 16:41:15 -08:00
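One way to express "only escalate after several occurrences in a row" is a consecutive counter that resets on success. The class name, the threshold parameter, and the use of `SevWarn` as the lower severity are assumptions for this sketch.

```python
class EscalatingWarning:
    """Escalate severity only after `threshold` consecutive occurrences."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.consecutive = 0

    def record(self, penalized):
        """Return the severity to log, or None when there is nothing to log."""
        if not penalized:
            self.consecutive = 0  # any success breaks the streak
            return None
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            return "SevWarnAlways"
        return "SevWarn"
```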
Evan Tschannen
0ca8b612c7
fix: do not set onDone until all addTaskFutures have had a chance to join that future
2018-01-19 16:29:42 -08:00
Stephen Atherton
9d5f7bd5ab
Aborting an old incompatible backup did not work if the old backup was never actually initialized, started, or was already aborted.
2018-01-19 15:08:03 -08:00
Stephen Atherton
6e96d3c30c
Bug fix, backup snapshots could take unexpectedly long if the desired snapshot interval is less than the configured snapshotDispatch interval.
2018-01-19 12:14:04 -08:00
Stephen Atherton
da02099c4c
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-19 11:02:48 -08:00
Stephen Atherton
fc16bb94ab
Discontinuing a backup that is already restorable now stops immediately and aborts (via validation key) any tasks scheduled to run later.
2018-01-19 11:02:43 -08:00
Evan Tschannen
570f72ba40
fix: nextDispatchVersion was being set too large if the snapshot interval was small
2018-01-19 10:53:58 -08:00
Stephen Atherton
b6dd06d945
Bug fix in version to timestamp conversion.
2018-01-18 02:54:12 -08:00
Stephen Atherton
307e04c0ad
Updated backup container unit test to match new safer behavior of expireData(). Rewrote BackupContainerLocalDirectory::deleteContainer() to actually delete the whole directory but only if it appears to be a backup with either log or snapshot data.
2018-01-18 00:36:28 -08:00
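The guarded delete can be sketched like this. The marker directory names `logs` and `snapshots` and the function name are assumptions; the commit only says the directory must appear to hold log or snapshot data.

```python
import os
import shutil


def delete_backup_directory(path, markers=("logs", "snapshots")):
    """Remove the whole directory, but only if it looks like a backup."""
    looks_like_backup = any(
        os.path.isdir(os.path.join(path, marker)) for marker in markers
    )
    if not looks_like_backup:
        raise ValueError(f"{path} does not appear to be a backup; refusing to delete")
    shutil.rmtree(path)
```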
Stephen Atherton
cdd1e784dc
Added yields to writing backup snapshot manifests to avoid slow tasks.
2018-01-17 13:28:56 -08:00
Stephen Atherton
f6f0816bc1
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-17 12:12:12 -08:00
Stephen Atherton
8fece71662
Bug fix in backup metadata handling if logEnd becomes less than logBegin, which can happen if an expire is done without logEnd being updated.
2018-01-17 12:12:04 -08:00
Stephen Atherton
d7f8fe218a
Bug fix, resolving versions to timestamps for use in backup descriptions did not work on a locked database. Added some trace events to AtomicRestore to show progress.
2018-01-17 12:03:19 -08:00
A.J. Beamon
4bfbdbf454
Extract getLocalTime to platform.cpp
2018-01-17 11:35:34 -08:00
Stephen Atherton
a6fc30209e
Compile fix on windows, localtime_r is not available there.
2018-01-17 09:55:25 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Stephen Atherton
f955547796
Bug fixes. Local var int i being declared over a state variable, and iterators being initialized incorrectly.
2018-01-16 11:41:49 -08:00
Stephen Atherton
897ff6f676
Added new knob for how many tasks to add per transaction in backup dispatch, instead of using the value for restore which has much lower overhead per task.
2018-01-16 10:45:21 -08:00
Stephen Atherton
02d72ca4b8
Added yields to CPU-heavy operations in FileBackup's snapshot range dispatcher.
2018-01-16 10:31:44 -08:00
Evan Tschannen
645dc5ead6
warmRange needs to get a read version occasionally to prevent it from overwhelming the proxy
...
quietDatabase waits for all data distribution to be completely finished so that databases are cached in a cleaner state
2018-01-14 12:50:52 -08:00
Evan Tschannen
660cee0254
increased the priority of getKeyServersLocations, because once a client gets a read version, answering their reads should be higher priority than starting new transactions
2018-01-12 13:46:20 -08:00
Evan Tschannen
721a891d1f
fix: never request more than 100 shards from the proxy at a time to avoid large packets
2018-01-12 10:51:53 -08:00
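Capping the request size amounts to splitting the work into batches of at most 100. A minimal sketch, with a hypothetical helper name:

```python
def batches(items, limit=100):
    """Yield consecutive slices of at most `limit` items each."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(items), limit):
        yield items[start:start + limit]
```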
A.J. Beamon
2f5073d00f
Some visual studio project cleanup.
2018-01-10 10:07:18 -08:00
Evan Tschannen
022df3b91b
backup and restore sometimes took too long in simulation
2018-01-09 17:26:42 -08:00
Stephen Atherton
96cb06cbc7
Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations.
2018-01-05 23:06:39 -08:00
Stephen Atherton
236799c77f
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-05 14:38:06 -08:00
Alex Miller
f021934792
Fix yet another VersionStamp DR bug.
...
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.
To simplify the problem, if we have two concurrent transaction loops:
    retry {
        if (rand() > .5) tr->set('x', rand());
        if (rand() > .5) tr->set('y', rand());
    }
and
    retry {
        x = tr->get('x')
        y = tr->get('y')
        if (x > y) {
            tr->set('y', x)
        }
        tr->commit();
    }
It is not guaranteed that x <= y in the database after the second transaction
commits. This is because it could read an older snapshot of x and y, in which
x was not greater than y, and thus not invoke set. This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization". If
we add any write conflict range to `tr`, it will then be conflict checked and
committed, which would guarantee that x <= y when it commits.
Replace the first transaction with dumpData, and the second with the version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-05 14:23:11 -08:00
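The effect described in this message can be reproduced with a toy optimistic-concurrency store. This is an in-memory model written for illustration, not the FoundationDB client API: `Database`, `Transaction`, `add_write_conflict_key`, and `run_race` are hypothetical names, and conflict tracking is per key rather than per range. The invariant the second loop tries to establish is that y is at least x.

```python
class Database:
    def __init__(self, initial):
        self.data = dict(initial)
        self.version = 0
        self.last_write = {}  # key -> commit version that last touched it


class Transaction:
    def __init__(self, db):
        self.db = db
        self.read_version = db.version
        self.snapshot = dict(db.data)
        self.reads, self.writes, self.conflict_keys = set(), {}, set()

    def get(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot.get(key))

    def set(self, key, value):
        self.writes[key] = value
        self.conflict_keys.add(key)

    def add_write_conflict_key(self, key):
        self.conflict_keys.add(key)

    def commit(self):
        if not self.writes and not self.conflict_keys:
            return True  # read-only: commit is a no-op, no conflict check
        for key in self.reads:
            if self.db.last_write.get(key, 0) > self.read_version:
                return False  # a read key changed after our snapshot
        self.db.version += 1
        self.db.data.update(self.writes)
        for key in self.conflict_keys:
            self.db.last_write[key] = self.db.version
        return True


def run_race(force_conflict_check):
    """Run the second loop while a writer commits after its first snapshot."""
    db = Database({"x": 1, "y": 2})
    first_attempt = True
    while True:
        tr = Transaction(db)
        if first_attempt:
            writer = Transaction(db)  # stands in for the first loop
            writer.set("x", 10)
            writer.commit()
            first_attempt = False
        x, y = tr.get("x"), tr.get("y")
        if x > y:
            tr.set("y", x)
        if force_conflict_check:
            tr.add_write_conflict_key("y")
        if tr.commit():
            return db.data
```

Without the extra conflict key, the stale read-only attempt "commits" and leaves x above y. With it, the first attempt is rejected, the retry sees the new x, and y is raised to match.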
Stephen Atherton
cbeff0f789
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-05 14:13:27 -08:00
Stephen Atherton
2763713cbc
Bug fix, backup snapshot dispatch was calculating that all shards must be done immediately.
2018-01-05 14:12:00 -08:00
A.J. Beamon
30067d2f53
Whitespace fixes and removal of change to java's AbstractTester
2018-01-05 13:21:54 -08:00
A.J. Beamon
9f2e6bfbd1
Merge branch 'release-5.1' into vexillographer-binding-specific-disables
...
# Conflicts:
# fdbclient/vexillographer/fdb.options
2018-01-05 13:16:41 -08:00
A.J. Beamon
5015119115
Generalize the message that gets displayed in status if a cluster file's contents are incorrect.
2018-01-05 10:29:47 -08:00