Commit Graph

65 Commits

Author SHA1 Message Date
Oleg Samarin 4c9df78076 dstonly 2020-07-10 15:13:42 +03:00
Evan Tschannen ced65cd30b finished explicitly versioning everything stored in the database 2020-05-22 17:14:21 -07:00
Evan Tschannen 0420b3e786 fix compile error 2020-04-29 14:05:53 -07:00
Evan Tschannen 7cebe743f9 A number of bug fixes of rare correctness errors 2020-04-29 13:50:13 -07:00
Evan Tschannen a486ec2de0 pipelined fdbdr status 2020-02-25 15:48:00 -08:00
Evan Tschannen daee15cbb5 fix: starting a DR should do the commit on the first proxy to ensure all mutations from previous backups have been flushed 2020-02-25 12:35:24 -08:00
Evan Tschannen 73ad702d14 Clients which fetch status should not disconnect from the coordinators and cluster controller between each retrieval 2020-01-22 15:41:22 -08:00
Steve Atherton 4ff058e86b Backup and DR layer status document generation now uses snapshot reads for all keys read to avoid unnecessary conflicts when read during a status update or cleanup transaction. Since many of the keys read use wrapper functions, all of the underlying functions in BackupAgentBase and its two implementations also required a snapshot mode argument. All snapshot arguments default to false to match the underlying FDB API get/getrange methods. 2019-12-19 00:29:35 -08:00
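A minimal sketch of the snapshot-read pattern this commit applies throughout BackupAgentBase; the status key and the surrounding actor are illustrative, not the real layer-status schema:

    ACTOR Future<Optional<Value>> readStatusField(Database cx, Key statusKey) {
        state ReadYourWritesTransaction tr(cx);
        loop {
            try {
                // snapshot = true: no read conflict range is added, so a concurrent
                // backup/DR task touching this key will not abort the status read.
                Optional<Value> v = wait(tr.get(statusKey, /* snapshot = */ true));
                return v;
            } catch (Error& e) {
                wait(tr.onError(e));
            }
        }
    }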
Evan Tschannen 628b4e0220 added a warning if multiple log ranges exist for the same range 2019-10-02 17:06:19 -07:00
Evan Tschannen 9ec9f41d34 backup and DR would not share mutations if they were started on different versions of FDB 2019-10-01 18:52:07 -07:00
Evan Tschannen ef01ad2ed8 optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per elapsed second
aborting a DR without the --cleanup flag will still attempt to clean up for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs that were stopped without the --cleanup flag
2019-09-27 18:32:27 -07:00
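A hedged sketch of the clearing decision described in the commit above; logRangeForHash and logRangeForSecond are hypothetical helpers standing in for the real backup log key schema in BackupAgentBase:

    void clearLogRange(Transaction& tr, UID logUid, Version beginVersion, Version endVersion) {
        const int64_t versionsPerSecond = 1000000; // FDB advances roughly 1M versions per second
        int64_t elapsedSeconds = (endVersion - beginVersion) / versionsPerSecond + 1;
        if (elapsedSeconds > 256) {
            // Cheaper to issue one clear per possible one-byte hash (256 clears total).
            for (int h = 0; h < 256; ++h)
                tr.clear(logRangeForHash(logUid, h));          // hypothetical helper
        } else {
            // Otherwise issue one clear per elapsed second of log data.
            for (int64_t s = 0; s < elapsedSeconds; ++s)
                tr.clear(logRangeForSecond(logUid, beginVersion + s * versionsPerSecond)); // hypothetical helper
        }
    }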
Jingyu Zhou 3a63d053e9 Address review comments for PR#1725 2019-06-20 14:06:32 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
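An illustrative stand-in for the accessor shape this commit introduces; FDB's actual DeterministicRandom is its own generator type, not std::mt19937_64:

    #include <random>

    // One generator per thread. The deterministic generator must be seeded explicitly
    // on each thread that uses it; the nondeterministic one may self-seed.
    std::mt19937_64& deterministicRandomSketch() {
        thread_local std::mt19937_64 generator;
        return generator;
    }

    void seedDeterministicRandomSketch(uint64_t seed) {
        deterministicRandomSketch().seed(seed);
    }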
Andrew Noyes 781b6ece77 Fix OPEN_FOR_IDE -Wunused-variable warnings
CC #1255, #1173
2019-04-16 15:28:01 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Steve Atherton 8aab719c22 Merge branch 'master' into feature-backup-json 2019-03-12 18:23:16 -07:00
Stephen Atherton 023bbb566f Renamed backup state enums for clarity, added backup state names. Changed Epochs to EpochSeconds in backup JSON along with some other renaming/moving of fields, and added information about snapshot dispatch. Changed timestamp format for input/output in all backup/restore contexts to be a fully qualified time with timezone offset. Added information about the last snapshot dispatch to backup config and status (not yet populated). 2019-03-10 16:00:01 -07:00
Evan Tschannen f1897f3eb6 Merge branch 'master' into feature-metadata-version
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
2019-03-04 21:06:16 -08:00
Balachandar Namasivayam a258df32f6 Skip switchover checks for force option. 2019-03-04 15:58:36 -08:00
Balachandar Namasivayam 7cf71b0931 Add some basic checks before doing an atomic switchover. 2019-03-01 14:49:04 -08:00
Evan Tschannen 3da85f3acd implemented the \xff/metadataVersion key, which can be used by layers to help them cheaply cache metadata and know when their cache is invalid 2019-02-28 17:45:00 -08:00
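A hedged sketch of how a layer might use the new key; the layer's own metadata key and the LayerMetadataCache structure are hypothetical:

    ACTOR Future<Value> getLayerMetadata(Database cx, Reference<LayerMetadataCache> cache) {
        state ReadYourWritesTransaction tr(cx);
        loop {
            try {
                // Reading \xff/metadataVersion is cheap: it arrives with the read
                // version and does not require a storage server round trip.
                state Optional<Value> mv = wait(tr.get(LiteralStringRef("\xff/metadataVersion")));
                if (cache->valid && mv == cache->metadataVersion)
                    return cache->metadata;                    // cache is still valid
                // Cache is stale: re-read the layer's metadata (hypothetical key).
                Optional<Value> metadata = wait(tr.get(LiteralStringRef("/myLayer/metadata")));
                cache->metadata = metadata.orDefault(Value());
                cache->metadataVersion = mv;
                cache->valid = true;
                return cache->metadata;
            } catch (Error& e) {
                wait(tr.onError(e));
            }
        }
    }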
mpilman c1592b4c3a Several minor fixes within fdbclient 2019-02-19 15:16:59 -08:00
mpilman 78dd80ea8a Proper fwd decl in BackupAgent
Also BackupAgent.h -> BackupAgent.actor.h
2019-02-19 15:16:59 -08:00
mpilman 3cb2391b58 use proper fwd declarations in ManagementAPI
Also ManagementAPI.h -> ManagementAPI.actor.h
2019-02-19 15:16:59 -08:00
mpilman 479a4d7c22 Minor fixes in fdbclient for intellisense 2019-02-19 15:16:59 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
A.J. Beamon d4d5740282 * Add Optional.map and ErrorOr.map.
* Rename Optional/ErrorOr cast_to to castTo.
* Make printable(Optional<T>) templated rather than restricted to StringRef types.
* Fixes bug in (unused) ErrorOr.castTo where an ErrorOr that was not set would lose its error.
2019-01-11 09:03:38 -08:00
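A usage sketch under the assumption that the signatures match the description above (flow's Optional; the explicit template argument on map may differ between FDB versions):

    Optional<int> bytes = 4096;
    // map applies the function only when a value is present; an empty Optional stays empty.
    Optional<std::string> text = bytes.map<std::string>([](int b) { return std::to_string(b) + " bytes"; });
    // castTo (formerly cast_to) converts the contained type, staying empty if unset.
    Optional<double> asDouble = bytes.castTo<double>();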
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
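Using the commit's own foo/bar.h example, the change is simply:

    // Before: only resolvable from files under foo/
    #include "bar.h"

    // After: resolvable from anywhere in the tree, relative to the repository root
    #include "foo/bar.h"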
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
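The mechanical rewrite, shown on a representative line (tr->commit() is just an example callee):

    // Before:
    Void _ = wait(tr->commit());
    // After:
    wait(tr->commit());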
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
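An illustrative before/after for the rules above; the event and detail names are made up for the example:

    double elapsed = 1.5; // example value, in seconds
    // Before: lowercase/underscored names, milliseconds in details
    TraceEvent("recovery_duration").detail("elapsed_ms", elapsed * 1000);
    // After: names start uppercase with no underscores, details use seconds
    TraceEvent("RecoveryDuration").detail("ElapsedSeconds", elapsed);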
Evan Tschannen e82985aea2 fix: continue setting beginVersion so that versions between 5.2.0 and 5.2.2 do not crash when decoding tasks created by 5.2.3 2018-06-06 13:34:22 -07:00
Evan Tschannen 4120062bb9 fix: backup initialized its begin version at 1 instead of the read version of the starting transaction
fix: erasing log ranges did not properly divide the work between transactions, so some transactions became too large
2018-06-06 13:05:53 -07:00
Evan Tschannen 8930c2e3db DR upgrade tests now test the durability of the data. 2018-05-09 15:11:05 -07:00
Yichi Chiang c721ab6854 Fix review comments 2018-04-27 13:54:34 -07:00
Yichi Chiang 6bddf8aefa Upgrade DR from 5.1 to 5.2 2018-04-26 17:24:40 -07:00
Evan Tschannen 57d650062a merge 5.1 into 5.2 2018-04-18 20:44:31 -07:00
Evan Tschannen 77d100e1e6 fix: if a DR cluster has a much lower version than the primary database, it would take a long time to process the empty versions. Bump the version of the DR cluster before starting the DR to avoid this problem. 2018-04-18 19:37:24 -07:00
yichic ede5cab192 Merge pull request #89 from yichic/share-log-mutations-5.2
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang ec02e54f64 Refactor EraseLogData() 2018-03-19 11:56:01 -07:00
Yichi Chiang 1f2602d2b3 Fix all review comments 2018-03-19 11:33:33 -07:00
Yichi Chiang d6559b144f Share log mutations between backups and DRs which have the same backup range 2018-03-19 11:32:50 -07:00
Stephen Atherton dcf5b2e35d All readCommitted() functions now use Transaction instead of ReadYourWritesTransaction to reduce memory consumption in Backup and DR. Also removed one readCommitted() variant as it is just a special case of another definition. 2018-03-07 13:56:34 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Alex Miller f021934792 Fix yet another VersionStamp DR bug.
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.

To simplify the problem, if we have two concurrent transaction loops:

    retry {
      if (rand() > .5) tr->set('x', rand());
      if (rand() > .5) tr->set('y', rand());
    }

and

    retry {
      x = tr->get('x')
      y = tr->get('y')
      if (x > y) {
        tr->set('y', x)
      }
      tr->commit();
    }

It is not guaranteed that y >= x in the database after the second transaction
commits.  This is because it could read an older snapshot of x and y, in which
x was not greater than y, and thus not invoke set.  This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization".  If
we add any write conflict range to `tr`, it will then be conflict checked and
committed, which would guarantee that y >= x when it commits.

Replace the first transaction with dumpData, and the second with version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-05 14:23:11 -08:00
Alex Miller b264a98aea Fix yet another VersionStamp DR bug.
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.

To simplify the problem, if we have two concurrent transaction loops:

    retry {
      if (rand() > .5) tr->set('x', rand());
      if (rand() > .5) tr->set('y', rand());
    }

and

    retry {
      x = tr->get('x')
      y = tr->get('y')
      if (x > y) {
        tr->set('y', x)
      }
      tr->commit();
    }

It is not guaranteed that y >= x in the database after the second transaction
commits.  This is because it could read an older snapshot of x and y, in which
x was not greater than y, and thus not invoke set.  This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization".  If
we add any write conflict range to `tr`, it will then be conflict checked and
committed, which would guarantee that y >= x when it commits.

Replace the first transaction with dumpData, and the second with version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-04 17:29:43 -08:00
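As the two messages above explain, the fix amounts to making the second loop carry a write conflict range even on its read-only path; in the same pseudocode register:

    retry {
      x = tr->get('x')
      y = tr->get('y')
      if (x > y) {
        tr->set('y', x)
      } else {
        // Force conflict checking even when nothing is written, so a concurrent
        // change to 'x' or 'y' aborts this transaction instead of letting the
        // stale snapshot slip through.
        tr->addWriteConflictRange('y')
      }
      tr->commit();
    }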
Alex Miller f70e3b9fe8 Add or change a bunch of comments to provide descriptions of function contracts.
This cleans up a bit of the VersionStamp DR work I did, and leaves hints and
advice for anyone who will be touching mutation applying code in the future.
2017-12-20 16:57:14 -08:00
Evan Tschannen 38cff7d4a5 every transaction which clears applyMutation keys does so on the first proxy 2017-12-20 15:41:47 -08:00
Evan Tschannen 982f0dcb1e Merge pull request #222 from cie/alexmiller/drtimefix2
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Alex Miller b5a6bc0ab7 Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option.
Simulation identified the fact that we can violate the
VersionStamps-are-always-increasing promise via the following series of events:

1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream
2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0.
3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version.
4. The transaction from (3) is committed
5. Transactions from (1) are committed

This is possible because the dumpData transactions have no read conflict
ranges, and thus it's impossible to make them abort due to "conflicting"
transactions.  There's also no promise that if client C sends a commit to proxy
A, and later a client D sends a commit to proxy B, that B must log its commit
after A.  (We only promise that if C is told it was committed before D is told
it was committed, then A committed before B.)

There was a failed attempt to fix this problem.  We tried to add read conflict
ranges to dumpData transactions so that they could be aborted by "conflicting"
transactions.  However, this failed because this now means that dumpData
transactions require conflict resolution, and the stale read version that they
use can cause them to be aborted with a transaction_too_old error.
(Transactions that don't have read conflict ranges will never return
transaction_too_old, because with no reads, the read snapshot version is
effectively meaningless.)  This was never previously possible, so the existing
code doesn't retry commits, and to make things more complicated, the dumpData
commits must be applied in order.  This would require either adding
dependencies to transactions (if A is going to commit then B must also be/have
committed), which would be complicated, or submitting transactions with a fixed
read version, and replaying the failed commits with a higher read version once
we get a transaction_too_old error, which would unacceptably slow down the
maximum throughput of dumpData.

Thus, we've instead elected to add a special transaction option that bypasses
proxy load balancing for commits, and always commits against proxy 0.  We can
know for certain that after the transaction from (2) is committed, all of the
dumpData transactions that will be committed have been added to the commit
promise stream on proxy 0.  Thus, if we enqueue another transaction against
proxy 0, we can know that it will be placed into the promise stream after all
of the dumpData transactions, thus providing the semantics that we require:  no
dumpData transaction can commit after the destination version upgrade
transaction.
2017-12-20 15:04:04 -08:00
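A hedged sketch of the resulting usage; destVersionKey and the surrounding actor are illustrative, but the transaction option is the one this commit adds:

    ACTOR Future<Void> upgradeDestinationVersion(Database cx, Key destVersionKey, Version v) {
        state Transaction tr(cx);
        loop {
            try {
                // Bypass proxy load balancing for this commit: it always goes to
                // proxy 0, so it is enqueued after every dumpData commit already
                // in that proxy's commit promise stream.
                tr.setOption(FDBTransactionOptions::COMMIT_ON_FIRST_PROXY);
                tr.set(destVersionKey, BinaryWriter::toValue(v, Unversioned()));
                wait(tr.commit());
                return Void();
            } catch (Error& e) {
                wait(tr.onError(e));
            }
        }
    }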