Evan Tschannen
cf9d02cdbd
Merge pull request #48 from apple/release-5.2
...
Merge release-5.2 into master
2018-03-08 13:21:26 -08:00
A.J. Beamon
2c92ef8ff8
Merge pull request #47 from apple/release-5.1
...
Merge Release 5.1 into Release 5.2
2018-03-08 13:18:45 -08:00
A.J. Beamon
1aa32291f3
Fix line endings of backported changes.
2018-03-08 13:14:15 -08:00
Balachandar Namasivayam
bb8f48313f
Declare each state variable on its own line...
2018-03-08 12:43:44 -08:00
Balachandar Namasivayam
d136fc8eea
Improve fdbbackup status
...
Additional details added are
  log bytes written
  range bytes written
  last snapshot version and its corresponding timestamp
  last log version and its corresponding timestamp
  If backup is supposed to stop at the next snapshot completion (stopWhenDone)
  version/timestamp of snapshot start and expected end
Backup error reporting format changed to
  Recent Errors (since latest restore point TIME ago)
    <TIME> ago: <error description> on <task name>
  Older Errors
    <TIME> ago: <error description> on <task name>
TIME format:
  <= 59 seconds ago
  <= 59.99 minutes ago
  <= 23.99 hours ago
  N.NN days ago
2018-03-08 12:43:33 -08:00
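The TIME format tiers listed in commit d136fc8eea can be sketched as a small formatter. This is only an illustration, not the fdbbackup implementation: the function name `format_age` is hypothetical, and showing whole seconds with two-decimal minutes, hours, and days is an assumption read off the examples in the message.

```python
def format_age(seconds: float) -> str:
    """Render an elapsed time using the tiers listed in the commit message."""
    # Under a minute: whole seconds (assumed; the message only shows "59").
    if seconds < 60:
        return f"{int(seconds)} seconds ago"
    minutes = seconds / 60
    # Under an hour: minutes with two decimals.
    if minutes < 60:
        return f"{minutes:.2f} minutes ago"
    hours = minutes / 60
    # Under a day: hours with two decimals.
    if hours < 24:
        return f"{hours:.2f} hours ago"
    # Anything longer is reported in days.
    return f"{hours / 24:.2f} days ago"
```

Note that this sketch rounds, so a value just under a tier boundary (for example 3599.9 seconds) prints as "60.00 minutes ago"; an implementation that must never exceed 59.99 would need to truncate instead.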
Balachandar Namasivayam
4f58bca66a
Simple refactor of code...
2018-03-08 11:34:25 -08:00
Balachandar Namasivayam
1c1a497ea2
Refactor getKeyServers to be more readable.
...
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-03-08 11:34:18 -08:00
Stephen Atherton
cb68885328
If backup expiration determines that force is required but the force parameter is not set, it will no longer throw an error unless the backup contains data from prior to the expire_before_version.
2018-03-08 11:27:15 -08:00
Stephen Atherton
dcf5b2e35d
All readCommitted() functions now use Transaction instead of ReadYourWritesTransaction to reduce memory consumption in Backup and DR. Also removed one readCommitted() variant as it is just a special case of another definition.
2018-03-07 13:56:34 -08:00
Alec Grieser
2a2ac56529
Merge pull request #22 from alecgrieser/37844532-expose-append-if-fits
...
Expose APPEND_IF_FITS to clients
2018-03-06 16:31:36 -08:00
A.J. Beamon
f2c804e14f
Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.
2018-03-06 10:15:04 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Balachandar Namasivayam
aea1f7ba21
Add tests for Client Transaction Profiling correctness
2018-03-05 18:55:23 -08:00
A.J. Beamon
94e90454da
Fix line endings
2018-03-05 12:16:37 -08:00
A.J. Beamon
fd4c9d5b54
Fix line endings
2018-03-05 10:37:52 -08:00
A.J. Beamon
b25810711c
Merge branch 'master' into release-5.2
2018-03-05 10:32:57 -08:00
Alec Grieser
541dc1b136
improve verbiage
2018-03-02 17:44:14 -08:00
Alec Grieser
218b7a41e2
add APPEND_IF_FITS to workload and remove guard ; add command to vexillographer
2018-03-02 17:43:39 -08:00
Balachandar Namasivayam
066a9b7e55
Declare each state variable on its own line...
2018-03-02 16:30:13 -08:00
Balachandar Namasivayam
34715ec1fa
Improve fdbbackup status
...
Additional details added are
  log bytes written
  range bytes written
  last snapshot version and its corresponding timestamp
  last log version and its corresponding timestamp
  If backup is supposed to stop at the next snapshot completion (stopWhenDone)
  version/timestamp of snapshot start and expected end
Backup error reporting format changed to
  Recent Errors (since latest restore point TIME ago)
    <TIME> ago: <error description> on <task name>
  Older Errors
    <TIME> ago: <error description> on <task name>
TIME format:
  <= 59 seconds ago
  <= 59.99 minutes ago
  <= 23.99 hours ago
  N.NN days ago
2018-02-28 18:22:45 -08:00
Evan Tschannen
470f5c01f3
changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases
2018-02-26 17:09:09 -08:00
Evan Tschannen
e3c6b66240
fix: do not commit more data after being stopped
...
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen
cfcf98cffc
fix: log router tags were not stored at a best location
2018-02-23 12:26:19 -08:00
Evan Tschannen
719bb5bd0c
Merge pull request #4 from bnamasivayam/getKeyServers-refactor
...
Having 1000 as the limit for Limit for GetKeyServerLocationsRequest s…
2018-02-22 12:39:48 -08:00
Balachandar Namasivayam
2fe2b522d5
Simple refactor of code...
2018-02-22 12:38:14 -08:00
Alec Grieser
e1162e9238
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-22 11:16:12 -08:00
Balachandar Namasivayam
e2030db5a8
Refactor getKeyServers to be more readable.
...
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-02-21 17:11:50 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Alec Grieser
aadc06de99
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-20 14:28:29 -08:00
Evan Tschannen
31b89a638f
added satellite_none and remote_none options to unconfigure from a fearless setup
...
fix: log_router configuration was broken
2018-02-17 13:51:17 -08:00
Stephen Atherton
a9a9590058
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-02-16 19:44:36 -08:00
Stephen Atherton
54fc81b260
Improved backup error reporting in backup status. The most recent error for each error type is reported along with how long ago the error occurred, and errors are divided into two categories based on whether or not they occurred since the most recent backup progress.
2018-02-16 19:38:31 -08:00
Evan Tschannen
dc93759e15
suppressed trace events that are spammy
2018-02-16 16:01:19 -08:00
Bhaskar Muppana
d48417da0d
Merge branch 'release-5.1'
2018-02-16 13:36:25 -08:00
Evan Tschannen
8c53483838
fix: log ranges were not being cleared correctly
2018-02-16 10:27:10 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
Evan Tschannen
42405c78a5
Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Evan Tschannen
fbadcc6eea
changing a storage server’s tag must be the first mutations applied in a version, because privatized mutations applied earlier in the same version will use the old tag
2018-02-09 18:21:29 -08:00
Evan Tschannen
c7b3be5b19
re-enabled better master exists
...
the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited
2018-02-09 16:48:55 -08:00
Stephen Atherton
555dc72353
Merge branch 'master' of github.com:apple/foundationdb
2018-02-07 15:12:13 -08:00
Stephen Atherton
acb876d520
Merge branch 'release-5.1'
2018-02-07 15:11:52 -08:00
A.J. Beamon
e06f80bb6e
Add client logging for number of logical and physical reads at the NativeAPI layer as well as committed mutation counts and bytes.
2018-02-07 11:56:47 -08:00
Stephen Atherton
d8879dc3f3
HTTP::doRequest() now reads responses in parallel with sending requests, so if the server responds before receiving all of the request the client can stop sending the remainder of the request. For PUT requests which upload files, this prevents sending potentially several megabytes of unnecessary bytes if the server responds with an error (such as 429) before the request is completely sent. Updated the backup container unit test to use more parallelism in order to test this new behavior.
2018-02-07 10:38:31 -08:00
Stephen Atherton
3a49211c44
Merge branch 'release-5.1'
2018-02-06 13:58:35 -08:00
Stephen Atherton
0792d5e3dd
Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated.
...
Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types.
Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue.
Cleaned up some of the JSON object merging logic.
Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.
2018-02-06 13:44:04 -08:00
Evan Tschannen
ebd94bb654
removed a separately configurable storage team size for the remote data center, because it did not make sense
...
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen
766964ff48
fix: dest tags were not repopulated when the tag cache was cleared
2018-01-31 17:35:48 -08:00
A.J. Beamon
080a454051
fix: getVersionstamp would return broken promise if a transaction was disposed before being set. getAddressesForKey would not return when resetPromise was set.
2018-01-31 13:47:36 -08:00
A.J. Beamon
0c601d6f85
Purge past version references
2018-01-31 12:05:41 -08:00
A.J. Beamon
41ebc34097
Fix some weird spacing
2018-01-31 11:13:57 -08:00
A.J. Beamon
84f7565d04
MultiVersionApi::createCluster was throwing network_not_setup rather than returning it.
2018-01-31 10:03:09 -08:00
Evan Tschannen
2e3b1d7ab8
Merge commit 'dd6ea70051aef215315e9eb3dea3b67a24778e32' into feature-remote-logs
...
# Conflicts:
# flow/Net2.actor.cpp
2018-01-29 17:11:03 -08:00
Stephen Atherton
2f291d8955
Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket() - Although it is technically correct as written it is a dangerous style because it is not obvious that the addition of a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only 1 line.
2018-01-29 00:32:41 -08:00
Evan Tschannen
29c5d4ad3d
upgrades from 5.X mostly supported, still some remaining correctness problems
2018-01-28 11:52:54 -08:00
Evan Tschannen
79d94214a4
Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs
2018-01-25 10:12:06 -08:00
Stephen Atherton
4dec5423f7
Optimization in backup snapshot dispatching. If the next dispatch version is not ahead of the recently known current version then do not set a scheduled time for new range tasks in order to avoid the overhead and delay of task timeout handling. Adjusted task timeout knobs to avoid large transaction warnings. Removed backtrace() from LargeTransaction trace event. Tweaked suppression in backup trace events.
2018-01-24 17:40:02 -08:00
Stephen Atherton
95d4e5520b
Added TraceEvent.
2018-01-24 12:44:37 -08:00
Stephen Atherton
aebbe1dcfd
Changed core_versionspersecond knob to int64_t type to avoid integer overflow. Cleaned up backup TraceEvent suppression. Added backtrace to LargeTransaction TraceEvent to make it easier to find the source of large commits in applications using NativeAPI directly.
2018-01-24 11:59:37 -08:00
Stephen Atherton
83409fb067
Bug fix, versionFolderString() was not reducing the precision of the number in the output string. Not technically a 'bug' as the scheme will still work but produces an overly deep and sparse folder structure.
2018-01-24 10:29:37 -08:00
Stephen Atherton
40d38880fe
Changed version-based folder naming scheme to something simpler, a fixed width 0-padded 19 digit number (the longest a Version can be) with /'s inserted to limit the size of each folder level. Comparisons using these folder names ignore the /'s so any future change to the splitting scheme would still be compatible with the current listing/reading logic.
2018-01-23 15:02:15 -08:00
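The folder naming scheme described in commit 40d38880fe (a fixed-width, 0-padded 19 digit number with /'s inserted, compared with the /'s ignored) can be sketched roughly as follows. The split points in `GROUP_SIZES` and the helper `folder_key` are hypothetical, since the message does not say where the slashes go; non-negative versions are assumed.

```python
VERSION_DIGITS = 19  # from the commit message: the longest a Version can be

# Hypothetical split points; the commit message does not state where the
# slashes are placed. The sizes must add up to VERSION_DIGITS.
GROUP_SIZES = (4, 5, 5, 5)


def version_folder_string(version: int) -> str:
    """Zero-pad the version to 19 digits and insert '/' between groups."""
    digits = f"{version:0{VERSION_DIGITS}d}"
    parts, pos = [], 0
    for size in GROUP_SIZES:
        parts.append(digits[pos:pos + size])
        pos += size
    return "/".join(parts)


def folder_key(path: str) -> str:
    """Comparison key that ignores the slashes, as the message describes."""
    return path.replace("/", "")
```

Because the key is always the same 19 digits regardless of where the slashes fall, string comparison of keys matches numeric comparison of versions, which is why a later change to the splitting scheme would stay compatible with existing listings.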
Stephen Atherton
7db7a51440
Changes to backup folder structure in BackupContainerBlobStore at the top level inside the backup bucket. All data files now live under data/<backup_name> and there is an 'index' at backups/<backup_name> which indicates what backups exist. Backup names can now contain '/' characters.
2018-01-23 11:46:16 -08:00
Stephen Atherton
7f0b7311b9
Corrected function name to timeKeeperVersionFromDatetime(). 'Fdbbackup expire' now allows an expire_before version of 0 if explicitly passed by version or by timestamp.
2018-01-23 00:19:51 -08:00
Stephen Atherton
51a1bd9327
Timekeeper lookup improvements, moved both function declarations to BackupContainer.h. VersionFromEpochs() now uses versions/sec to adjust the lookup result to improve accuracy. Conversions in both directions look for the latest record less than the target conversion value, but failing that they will now fall back on any available data point and adjust from there using versions/sec.
2018-01-22 23:57:01 -08:00
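The lookup behaviour described in commit 51a1bd9327 (use the latest record below the target, otherwise fall back on any available data point, then adjust using versions/sec) might look roughly like this. It is a sketch under assumptions, not the VersionFromEpochs() code: the record layout (a sorted list of `(epoch_seconds, version)` pairs), the function name, and the rate of 1,000,000 versions per second are all assumed for illustration.

```python
from bisect import bisect_left

# Assumed rate; the log only refers to a core_versionspersecond knob.
VERSIONS_PER_SECOND = 1_000_000


def version_from_epochs(target_epoch, records,
                        versions_per_second=VERSIONS_PER_SECOND):
    """Estimate the version at target_epoch.

    records is a list of (epoch_seconds, version) pairs sorted by epoch.
    """
    if not records:
        raise ValueError("no TimeKeeper records available")
    times = [epoch for epoch, _ in records]
    # Index of the latest record strictly before the target, if any.
    idx = bisect_left(times, target_epoch) - 1
    if idx < 0:
        # No older record: fall back on the earliest available data point.
        idx = 0
    epoch, version = records[idx]
    # Extrapolate from the chosen record using the versions/sec rate.
    return version + (target_epoch - epoch) * versions_per_second
```

For example, with records at epochs 100 and 200, a target of 150 extrapolates forward from the first record, while a target of 50 has no older record and extrapolates backward from it instead.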
Stephen Atherton
f086ba9d9d
Improved version to timestamp lookup - if there are no older versioned records in the database then the next available record, if any, will be used to calculate a result.
2018-01-22 22:47:57 -08:00
Stephen Atherton
1f59f9ee5b
Reduce restore dispatch transaction size.
2018-01-22 15:04:14 -08:00
Evan Tschannen
698ef4117e
Merge branch 'master' into feature-remote-logs
2018-01-20 10:34:30 -08:00
Evan Tschannen
89f0f9318a
decodeServerTagValue decodes tags encoded pre-6.0
2018-01-20 10:33:13 -08:00
A.J. Beamon
16cd0c8f75
Change WaitStorageMetricsPenalty to not be SevWarnAlways unless it happens multiple times in a row (about 2.5 minutes with current knobs).
2018-01-19 16:41:15 -08:00
Evan Tschannen
0ca8b612c7
fix: do not set onDone until all addTaskFutures have had a chance to join that future
2018-01-19 16:29:42 -08:00
Stephen Atherton
9d5f7bd5ab
Aborting an old incompatible backup did not work if the old backup was never actually initialized, started, or was already aborted.
2018-01-19 15:08:03 -08:00
Stephen Atherton
6e96d3c30c
Bug fix, backup snapshots could take unexpectedly long if the desired snapshot interval is less than the configured snapshotDispatch interval.
2018-01-19 12:14:04 -08:00
Stephen Atherton
da02099c4c
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-19 11:02:48 -08:00
Stephen Atherton
fc16bb94ab
Discontinuing a backup that is already restorable now stops immediately and aborts (via validation key) any tasks scheduled to run later.
2018-01-19 11:02:43 -08:00
Evan Tschannen
570f72ba40
fix: nextDispatchVersion was being set too large if the snapshot interval was small
2018-01-19 10:53:58 -08:00
Stephen Atherton
b6dd06d945
Bug fix in version to timestamp conversion.
2018-01-18 02:54:12 -08:00
Stephen Atherton
307e04c0ad
Updated backup container unit test to match new safer behavior of expireData(). Rewrote BackupContainerLocalDirectory::deleteContainer() to actually delete the whole directory but only if it appears to be a backup with either log or snapshot data.
2018-01-18 00:36:28 -08:00
Stephen Atherton
cdd1e784dc
Added yields to writing backup snapshot manifests to avoid slow tasks.
2018-01-17 13:28:56 -08:00
Stephen Atherton
f6f0816bc1
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-17 12:12:12 -08:00
Stephen Atherton
8fece71662
Bug fix in backup metadata handling if logEnd becomes less than logBegin, which can happen if an expire is done without logEnd being updated.
2018-01-17 12:12:04 -08:00
Stephen Atherton
d7f8fe218a
Bug fix, resolving versions to timestamps for use in backup descriptions did not work on a locked database. Added some trace events to AtomicRestore to show progress.
2018-01-17 12:03:19 -08:00
A.J. Beamon
4bfbdbf454
Extract getLocalTime to platform.cpp
2018-01-17 11:35:34 -08:00
Stephen Atherton
a6fc30209e
Compile fix on windows, localtime_r is not available there.
2018-01-17 09:55:25 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Stephen Atherton
f955547796
Bug fixes. Local var int i being declared over a state variable, and iterators being initialized incorrectly.
2018-01-16 11:41:49 -08:00
Stephen Atherton
897ff6f676
Added new knob for how many tasks to add per transaction in backup dispatch, instead of using the value for restore which has much lower overhead per task.
2018-01-16 10:45:21 -08:00
Stephen Atherton
02d72ca4b8
Added yields to CPU-heavy operations in FileBackup's snapshot range dispatcher.
2018-01-16 10:31:44 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen
645dc5ead6
warmRange needs to get a read version occasionally to prevent it from overwhelming the proxy
...
quietDatabase waits for all data distribution to be completely finished so that databases are cached in a cleaner state
2018-01-14 12:50:52 -08:00
Evan Tschannen
660cee0254
increased the priority of getKeyServersLocations, because once a client gets a read version, answering their reads should be higher priority than starting new transactions
2018-01-12 13:46:20 -08:00
Evan Tschannen
721a891d1f
fix: never request more than 100 shards from the proxy at a time to avoid large packets
2018-01-12 10:51:53 -08:00
A.J. Beamon
2f5073d00f
Some visual studio project cleanup.
2018-01-10 10:07:18 -08:00
Evan Tschannen
022df3b91b
backup and restore sometimes took too long in simulation
2018-01-09 17:26:42 -08:00
Evan Tschannen
4e8bc273b3
added a version of getKeyRangeLocations that checks for endpoint failures
...
fix: did not add the cluster controller to id_used in all cases
removed obsolete fixmes
2018-01-07 15:32:43 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen
14e956c4fc
fix: getRange did not check for endpoint failures
...
refactored checking endpoint failures
2018-01-06 13:49:47 -08:00
Stephen Atherton
96cb06cbc7
Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations.
2018-01-05 23:06:39 -08:00
Stephen Atherton
236799c77f
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-05 14:38:06 -08:00
Alex Miller
f021934792
Fix yet another VersionStamp DR bug.
...
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.
To simplify the problem, if we have two concurrent transaction loops:
    retry {
        if (rand() > .5) tr->set('x', rand());
        if (rand() > .5) tr->set('y', rand());
    }
and
    retry {
        x = tr->get('x')
        y = tr->get('y')
        if (x > y) {
            tr->set('y', x)
        }
        tr->commit();
    }
It is not guaranteed that x > y in the database after the second transaction
commits. This is because it could read an older snapshot of x and y, in which
x was greater than y, and thus not invoke set. This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization". If
we add any write conflict range to `tr`, it will then be conflict checked and
committed, which would guarantee that x>y when it commits.
Replace the first transaction with dumpData, and the second with version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-05 14:23:11 -08:00
Evan Tschannen
63751fb0e2
fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from
2018-01-05 14:15:25 -08:00
Stephen Atherton
cbeff0f789
Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1
2018-01-05 14:13:27 -08:00
Stephen Atherton
2763713cbc
Bug fix, backup snapshot dispatch was calculating that all shards must be done immediately.
2018-01-05 14:12:00 -08:00
A.J. Beamon
30067d2f53
Whitespace fixes and removal of change to java's AbstractTester
2018-01-05 13:21:54 -08:00
A.J. Beamon
9f2e6bfbd1
Merge branch 'release-5.1' into vexillographer-binding-specific-disables
...
# Conflicts:
# fdbclient/vexillographer/fdb.options
2018-01-05 13:16:41 -08:00
Evan Tschannen
2a869a4178
fixed merge errors
2018-01-05 12:17:17 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
A.J. Beamon
5015119115
Generalize the message that gets displayed in status if a cluster file's contents are incorrect.
2018-01-05 10:29:47 -08:00
Stephen Atherton
0f20068e82
Renamed all TaskBucket backup tasks to more appropriate names. Created the ability to make task aliases and used this to direct old task names to a task definition which will abort backups created before version 5.1.
2018-01-04 22:53:31 -08:00
Stephen Atherton
d43e80cf48
Bug fix, atomicRestore would fail to get a commit version after a commit_unknown_result where the transaction actually was committed. This would cause the restore to target version -1 so it would use one of the first available restorable versions in the backup instead of the version at which the database was locked.
2018-01-03 16:24:02 -08:00
Stephen Atherton
fd3f3aa647
Increased system key size limit to fix some rare backup use cases.
2018-01-03 12:05:12 -08:00
Stephen Atherton
96c479dc71
Rare bug fix. It turns out that backup log files must be written with unique names, otherwise a re-written >1 block log file overwritten after a restore has begun could read some blocks from before the rewrite and some blocks after, but due to random content ordering this would be incorrect and produce a corrupt restore. This bug is very rare because restore would detect an error unless the rewritten log file has exactly the same size as the original file, but this is unlikely because the random content order affects block padding and therefore usable content bytes per block.
2018-01-02 23:38:01 -08:00
Stephen Atherton
371dee70e6
Improved backup folder structure to be shallower but spread files more uniformly and make each folder's entries lexically sort into version order regardless of numeric length. Improved backup container test to use a random version multiplier on the file versions created in order to test a wide range of versioned folder paths.
2018-01-02 23:22:35 -08:00
Stephen Atherton
f324afc13f
Bug fix in blob store listing when it requires multiple serial requests. Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events.
2017-12-22 17:08:25 -08:00
Stephen Atherton
f2524ffd33
AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description.
2017-12-21 21:15:26 -08:00
Stephen Atherton
aa8b4c52d5
Removed backup URL from trace events.
2017-12-21 18:22:14 -08:00
Stephen Atherton
93e426ccd2
Merge branch 'master' of github.com:apple/foundationdb
2017-12-21 17:21:20 -08:00
Stephen Atherton
e8f9568bbe
Simulation improvement, readCommitted() calls that run for a long time would sometimes go too slow depending on the buggified limit, so now the limit is updated for each fetch loop.
2017-12-21 17:21:05 -08:00
Evan Tschannen
69f7409c37
fix: latestRestorable was incorrect
2017-12-21 17:09:21 -08:00
Evan Tschannen
5ed080721d
fix: atomic restore must wait until the restorable version is greater than the lock version
...
fix: latestRestorableVersion calculation was wrong
2017-12-21 15:45:10 -08:00
Evan Tschannen
95b502e1d7
fix: we did not restore to the target version in all cases
2017-12-21 14:11:44 -08:00
Stephen Atherton
ec28c77353
Merge branch 'master' of github.com:apple/foundationdb
2017-12-21 01:58:47 -08:00
Stephen Atherton
e3aee45a74
Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings.
2017-12-21 01:58:15 -08:00
Alex Miller
f70e3b9fe8
Add or change a bunch of comments to provide descriptions of function contracts.
...
This cleans up a bit of the VersionStamp DR work I did, and leaves hints and
advice for anyone who will be touching mutation applying code in the future.
2017-12-20 16:57:14 -08:00
Evan Tschannen
38cff7d4a5
every transaction which clears applyMutation keys does so on the first proxy
2017-12-20 15:41:47 -08:00
Evan Tschannen
982f0dcb1e
Merge pull request #222 from cie/alexmiller/drtimefix2
...
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Alex Miller
b5a6bc0ab7
Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option.
...
Simulation identified the fact that we can violate the
VersionStamps-are-always-increasing promise via the following series of events:
1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream
2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0.
3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version.
4. The transaction from (3) is committed
5. Transactions from (1) are committed
This is possible because the dumpData transactions have no read conflict
ranges, and thus it's impossible to make them abort due to "conflicting"
transactions. There's also no promise that if client C sends a commit to proxy
A, and later a client D sends a commit to proxy B, that B must log its commit
after A. (We only promise that if C is told it was committed before D is told
it was committed, then A committed before B.)
There was a failed attempt to fix this problem. We tried to add read conflict
ranges to dumpData transactions so that they could be aborted by "conflicting"
transactions. However, this failed because this now means that dumpData
transactions require conflict resolution, and the stale read version that they
use can cause them to be aborted with a transaction_too_old error.
(Transactions that don't have read conflict ranges will never return
transaction_too_old, because with no reads, the read snapshot version is
effectively meaningless.) This was never previously possible, so the existing
code doesn't retry commits, and to make things more complicated, the dumpData
commits must be applied in order. This would require either adding
dependencies to transactions (if A is going to commit then B must also be/have
committed), which would be complicated, or submitting transactions with a fixed
read version, and replaying the failed commits with a higher read version once
we get a transaction_too_old error, which would unacceptably slow down the
maximum throughput of dumpData.
Thus, we've instead elected to add a special transaction option that bypasses
proxy load balancing for commits, and always commits against proxy 0. We can
know for certain that after the transaction from (2) is committed, all of the
dumpData transactions that will be committed have been added to the commit
promise stream on proxy 0. Thus, if we enqueue another transaction against
proxy 0, we can know that it will be placed into the promise stream after all
of the dumpData transactions, thus providing the semantics that we require: no
dumpData transaction can commit after the destination version upgrade
transaction.
2017-12-20 15:04:04 -08:00
Evan Tschannen
07efdc70c8
more fixes for windows compile
2017-12-20 14:39:23 -08:00
Evan Tschannen
c51de3bb88
fixed windows compile issues
2017-12-20 13:48:31 -08:00
Stephen Atherton
c1958b335a
Compile fix on windows, can't access protected parent class member from static function, apparently.
2017-12-20 12:13:25 -08:00
Evan Tschannen
0ab0cf51a3
fix: snapshotDispatch signaled completion after the first snapshot finished
2017-12-20 12:07:35 -08:00
Stephen Atherton
b77276d2f0
First snapshot of a backup should go as fast as possible instead of using the configured snapshot interval.
2017-12-20 01:07:03 -08:00
Stephen Atherton
7caa012fbf
Added snapshot interval option to "fdbbackup start" which defaults to a new knob's value. Added snapshot info to backup status text. Improvements to fdbbackup help.
2017-12-20 00:49:08 -08:00
Stephen Atherton
d87aa521e9
Merge branch 'backup-container-refactor' into continuous-backup
2017-12-19 23:39:00 -08:00
Stephen Atherton
e0d9cea008
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Stephen Atherton
2cd1ff6aae
Bug fix, in restore dispatch the apply lag was being retrieved before updating the apply end version which would make it look like mutations were finished applying early.
2017-12-19 18:11:40 -08:00
Stephen Atherton
61a043ebfa
Added tr->reset() to prevent initial transaction loop attempts from having a higher chance of expiring.
2017-12-19 17:33:45 -08:00
Alex Miller
c7dbd31a1e
Refactoring: Create a common prefixRange and do UID->Key once in backup.
2017-12-19 17:17:50 -08:00
Stephen Atherton
aa5169bd3c
Removed unnecessary trace event.
2017-12-19 15:29:22 -08:00
Stephen Atherton
e28641886d
TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes.
2017-12-19 15:27:04 -08:00
Stephen Atherton
a276985baf
Bug fix, if there are range files in a restore which begin at exactly the restore version they will be repeatedly dispatched forever.
2017-12-18 17:48:18 -08:00
Stephen Atherton
005a4a0706
Restore status bug fix, during restore the apply lag would appear as a large negative number until the first restore batch is completed. Test improvement, snapshot dispatch now chooses a random number of tasks to dispatch per commit.
2017-12-18 15:56:57 -08:00
Stephen Atherton
937fa75bec
Bug fix, if target snapshot end version is at or before the begin version then no progress would be made.
2017-12-18 00:13:25 -08:00
Stephen Atherton
d32a770648
Bug fix, backup never went to differential mode once it was restorable which caused waitBackup to only return once the backup was discontinued.
2017-12-17 23:22:18 -08:00
Stephen Atherton
2b92815e8c
Bug fix. The snapshot dispatch add task retry loop was incorrectly deciding that the second and subsequent transactions of an execution were already committed and therefore skipping them, resulting in missing ranges in the snapshot.
2017-12-17 21:01:31 -08:00
Stephen Atherton
afd2603576
Refactored backup task flow and config to support ongoing snapshots and allow stopping the backup cleanly between snapshots. The previously separate tasks for initial and differential mode log dispatching have been merged into BackupLogsDispatchTask.
2017-12-17 14:29:57 -08:00
Evan Tschannen
1dc9eceb6d
optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups
2017-12-15 20:13:44 -08:00
Stephen Atherton
18305ab326
Bug fixes. Added snapshotBatchSize to backupConfig to enable detecting if a transaction for adding a group of tasks to a batch had already completed. Changed KeyRangeMap usage so that each range value to be dispatched has a unique integer value, enabling more efficient range coalescing and avoiding some iterator invalidation bugs.
2017-12-15 01:39:50 -08:00
Yichi Chiang
50c154fed4
Add fdbbackup interface
2017-12-14 13:54:01 -08:00
Stephen Atherton
33f9f1a95c
Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.
2017-12-14 01:44:38 -08:00
Stephen Atherton
47a9a7ab0e
Finished backup container discovery / listing via base URL.
2017-12-12 17:44:03 -08:00